Model parameters: d_model 896 ffw_size 3584 kv_size 64 n_heads 14 n_layers 18 Megatron-DeepSpeed/pretrain_gpt.py --tensor-model-parallel-size 1 --pipeline-model-parallel-size 1 --num-layers 18 --hidden-size 896 --num-attention-heads 14 --kv-channels 64 --ffn-hidden-size 3584 --seq-length 2048 --max-position-embeddings 2048 --micro-batch-size 4 --global-batch-size 256 --train-samples 29_492_188 --vocab-file gpt2/vocab.json --merge-file gpt2/merges.txt --clip-grad 1.0 --kill-switch-path kill-switch-221m --bf16 --optimizer adam --adam-beta1 0.9 --adam-beta2 0.999 --adam-eps 1e-8 --lr 2e-4 --min-lr 2e-5 --lr-decay-style cosine --lr-decay-samples 29_492_188 --lr-warmup-samples 294_922 --clip-grad 1.0 --weight-decay 1e-1 --log-interval 10 --save-interval 1000 --eval-interval 1000 --eval-iters 1 --tensorboard-dir tensorboard_221m --tensorboard-queue-size 5 --log-timers-to-tensorboard --log-batch-size-to-tensorboard --log-validation-ppl-to-tensorboard --save checkpoints_221m --load checkpoints_221m --data-path /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document --data-impl mmap --split 949,50,1 --deepspeed --deepspeed_config ds_configs/2077145.json --zero-stage 0 START 2077145: Mon Nov 28 12:52:54 EET 2022 0: 0: 0: ======================= ROCm System Management Interface ======================= 0: ================================= Concise Info ================================= 0: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 0: 0 42.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 0: 1 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 0: 2 43.0c 95.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 0: 3 37.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 0: 4 43.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 0: 5 47.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 0: 6 43.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 0: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 0: ================================================================================ 0: ============================= 
End of ROCm SMI Log ============================== 7: 7: 7: ======================= ROCm System Management Interface ======================= 7: ================================= Concise Info ================================= 7: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 7: 0 41.0c 97.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 7: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 7: 2 46.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 7: 3 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 7: 4 43.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 7: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 7: 6 42.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 7: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 7: ================================================================================ 7: ============================= End of ROCm SMI Log ============================== 1: 1: 1: ======================= ROCm System Management Interface ======================= 1: ================================= Concise Info ================================= 1: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 1: 0 46.0c 86.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 1: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 1: 2 36.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 1: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 1: 4 41.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 1: 5 50.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 1: 6 40.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 1: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 1: ================================================================================ 1: ============================= End of ROCm SMI Log ============================== 2: 2: 2: ======================= ROCm System Management Interface ======================= 2: ================================= Concise Info ================================= 2: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 2: 0 47.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2: 1 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2: 2 
35.0c 92.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2: 4 46.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2: 5 40.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2: 6 40.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 2: 7 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 2: ================================================================================ 2: ============================= End of ROCm SMI Log ============================== 4: 4: 4: ======================= ROCm System Management Interface ======================= 4: ================================= Concise Info ================================= 4: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 4: 0 42.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 4: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 4: 2 43.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 4: 3 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 4: 4 46.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 4: 5 41.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 4: 6 36.0c 91.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 4: 7 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 4: ================================================================================ 4: ============================= End of ROCm SMI Log ============================== 5: 5: 5: ======================= ROCm System Management Interface ======================= 5: ================================= Concise Info ================================= 5: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 5: 0 41.0c 94.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 5: 1 43.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 5: 2 42.0c 85.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 5: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 5: 4 39.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 5: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 5: 6 39.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 5: 7 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 5: ================================================================================ 5: 
============================= End of ROCm SMI Log ============================== 3: 3: 3: ======================= ROCm System Management Interface ======================= 3: ================================= Concise Info ================================= 3: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 3: 0 40.0c 96.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 3: 1 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 3: 2 40.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 3: 3 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 3: 4 39.0c 88.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 3: 5 49.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 3: 6 43.0c 81.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 3: 7 46.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 3: ================================================================================ 3: ============================= End of ROCm SMI Log ============================== 6: 6: 6: ======================= ROCm System Management Interface ======================= 6: ================================= Concise Info ================================= 6: GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU% 6: 0 42.0c 89.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 6: 1 45.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 6: 2 44.0c 83.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 6: 3 48.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 6: 4 45.0c 90.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 6: 5 42.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 6: 6 38.0c 93.0W 800Mhz 1600Mhz 0% auto 560.0W 0% 0% 6: 7 44.0c N/A 800Mhz 1600Mhz 0% auto 0.0W 0% 0% 6: ================================================================================ 6: ============================= End of ROCm SMI Log ============================== 7: Launching on nid005079 (7/8), master nid005072 port 9999, GPUs 8, CUDA: True 6: Launching on nid005078 (6/8), master nid005072 port 9999, GPUs 8, CUDA: True 2: Launching on nid005074 (2/8), master nid005072 port 9999, GPUs 8, CUDA: True 0: Launching on nid005072 (0/8), master nid005072 port 9999, GPUs 
8, CUDA: True 3: Launching on nid005075 (3/8), master nid005072 port 9999, GPUs 8, CUDA: True 4: Launching on nid005076 (4/8), master nid005072 port 9999, GPUs 8, CUDA: True 5: Launching on nid005077 (5/8), master nid005072 port 9999, GPUs 8, CUDA: True 1: Launching on nid005073 (1/8), master nid005072 port 9999, GPUs 8, CUDA: True 0: using world size: 64, data-parallel-size: 64, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 0: accumulate and all-reduce gradients in fp32 for bfloat16 data type. 0: using torch.bfloat16 for parameters ... 0: ------------------------ arguments ------------------------ 0: abort_on_unmet_fused_kernel_constraints ......... False 0: accumulate_allreduce_grads_in_fp32 .............. True 0: adam_beta1 ...................................... 0.9 0: adam_beta2 ...................................... 0.999 0: adam_eps ........................................ 1e-08 0: adlr_autoresume ................................. False 0: adlr_autoresume_interval ........................ 1000 0: apply_query_key_layer_scaling ................... True 0: apply_residual_connection_post_layernorm ........ False 0: attention_dropout ............................... 0.1 0: attention_softmax_in_fp32 ....................... False 0: bert_binary_head ................................ True 0: bert_load ....................................... None 0: bf16 ............................................ True 0: bias_dropout_fusion ............................. True 0: bias_gelu_fusion ................................ True 0: biencoder_projection_dim ........................ 0 0: biencoder_shared_query_context_model ............ False 0: block_data_path ................................. None 0: checkpoint_activations .......................... False 0: checkpoint_in_cpu ............................... False 0: checkpoint_num_layers ........................... 1 0: clip_grad ....................................... 
1.0 0: codecarbon_dir .................................. None 0: consumed_train_samples .......................... 0 0: consumed_train_tokens ........................... 0 0: consumed_valid_samples .......................... 0 0: contigious_checkpointing ........................ False 0: cpu_optimizer ................................... False 0: cpu_torch_adam .................................. False 0: curriculum_learning ............................. False 0: data_impl ....................................... mmap 0: data_parallel_size .............................. 64 0: data_path ....................................... ['/scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document'] 0: dataloader_type ................................. single 0: DDP_impl ........................................ local 0: decoder_seq_length .............................. None 0: deepscale ....................................... False 0: deepscale_config ................................ None 0: deepspeed ....................................... True 0: deepspeed_activation_checkpointing .............. False 0: deepspeed_config ................................ ds_configs/2077145.json 0: deepspeed_mpi ................................... False 0: distribute_checkpointed_activations ............. False 0: distributed_backend ............................. nccl 0: embed_layernorm ................................. False 0: embedding_path .................................. None 0: encoder_seq_length .............................. 2048 0: eod_mask_loss ................................... False 0: eval_interval ................................... 1000 0: eval_iters ...................................... 1 0: eval_only ....................................... None 0: evidence_data_path .............................. None 0: exit_duration_in_mins ........................... None 0: exit_interval ................................... 
None 0: ffn_hidden_size ................................. 3584 0: finetune ........................................ False 0: fp16 ............................................ False 0: fp16_lm_cross_entropy ........................... False 0: fp32_residual_connection ........................ False 0: gigaflos_no_embeds .............................. 0 0: global_batch_size ............................... 256 0: glu_activation .................................. None 0: hidden_dropout .................................. 0.1 0: hidden_size ..................................... 896 0: hysteresis ...................................... 2 0: ict_head_size ................................... None 0: ict_load ........................................ None 0: img_dim ......................................... 224 0: indexer_batch_size .............................. 128 0: indexer_log_interval ............................ 1000 0: inference ....................................... False 0: init_method_std ................................. 0.02 0: init_method_xavier_uniform ...................... False 0: initial_loss_scale .............................. 4294967296 0: kill_switch_path ................................ kill-switch-221m 0: kv_channels ..................................... 64 0: layer_norm_fusion ............................... True 0: layernorm_epsilon ............................... 1e-05 0: lazy_mpu_init ................................... None 0: load ............................................ checkpoints_221m 0: local_rank ...................................... None 0: log_batch_size_to_tensorboard ................... True 0: log_interval .................................... 10 0: log_learning_rate_to_tensorboard ................ True 0: log_level ....................................... None 0: log_level_replica ............................... None 0: log_loss_scale_to_tensorboard ................... True 0: log_num_zeros_in_grad ........................... 
False 0: log_params_norm ................................. False 0: log_path ........................................ None 0: log_timers_to_tensorboard ....................... True 0: log_validation_ppl_to_tensorboard ............... True 0: loss_on_targets_only ............................ False 0: loss_scale ...................................... None 0: loss_scale_window ............................... 1000 0: lr .............................................. 0.0002 0: lr_decay_iters .................................. None 0: lr_decay_samples ................................ 29492188 0: lr_decay_style .................................. cosine 0: lr_decay_tokens ................................. None 0: lr_warmup_fraction .............................. None 0: lr_warmup_iters ................................. 0 0: lr_warmup_samples ............................... 294922 0: make_vocab_size_divisible_by .................... 128 0: mask_prob ....................................... 0.15 0: masked_softmax_fusion ........................... True 0: max_position_embeddings ......................... 2048 0: mean_noise_span_length .......................... None 0: memory_centric_tiled_linear ..................... False 0: merge_file ...................................... gpt2/merges.txt 0: micro_batch_size ................................ 4 0: min_loss_scale .................................. 1.0 0: min_lr .......................................... 2e-05 0: mmap_warmup ..................................... False 0: no_load_optim ................................... None 0: no_load_rng ..................................... None 0: no_save_optim ................................... None 0: no_save_rng ..................................... None 0: noise_density ................................... None 0: num_attention_heads ............................. 14 0: num_channels .................................... 3 0: num_classes ..................................... 
1000 0: num_layers ...................................... 18 0: num_layers_per_virtual_pipeline_stage ........... None 0: num_workers ..................................... 2 0: onnx_safe ....................................... None 0: openai_gelu ..................................... False 0: optimizer ....................................... adam 0: optimizer_fusion ................................ True 0: override_lr_scheduler ........................... False 0: pad_vocab_size_to ............................... None 0: params_dtype .................................... torch.bfloat16 0: partition_activations ........................... False 0: patch_dim ....................................... 16 0: pipeline_model_parallel_size .................... 1 0: position_embedding_type ......................... PositionEmbeddingType.absolute 0: pp_partition_method ............................. None 0: profile_backward ................................ False 0: query_in_block_prob ............................. 0.1 0: rampup_batch_size ............................... None 0: rank ............................................ 0 0: remote_device ................................... none 0: reset_attention_mask ............................ False 0: reset_position_ids .............................. False 0: retriever_report_topk_accuracies ................ [] 0: retriever_score_scaling ......................... False 0: retriever_seq_length ............................ 256 0: reweight_loss_based_on_position_frequency ....... False 0: sample_rate ..................................... 1.0 0: save ............................................ checkpoints_221m 0: save_interval ................................... 1000 0: scatter_gather_tensors_in_pipeline .............. True 0: scattered_embeddings ............................ False 0: seed ............................................ 1234 0: seq_length ...................................... 
2048 0: sgd_momentum .................................... 0.9 0: short_seq_prob .................................. 0.1 0: skip_train_iteration_range ...................... None 0: split ........................................... 949,50,1 0: split_transformers .............................. False 0: sync_tp_duplicated_parameters ................... False 0: synchronize_each_layer .......................... False 0: tensor_model_parallel_size ...................... 1 0: tensorboard_dir ................................. tensorboard_221m 0: tensorboard_log_interval ........................ 1 0: tensorboard_queue_size .......................... 5 0: test_weighted_split_names ....................... None 0: test_weighted_split_paths ....................... None 0: test_weighted_split_paths_path .................. None 0: test_weighted_split_splits ...................... None 0: test_weighted_split_weights ..................... None 0: tile_factor ..................................... 1 0: titles_data_path ................................ None 0: tokenizer_name_or_path .......................... None 0: tokenizer_type .................................. GPT2BPETokenizer 0: train_iters ..................................... None 0: train_samples ................................... 29492188 0: train_tokens .................................... None 0: train_weighted_split_paths ...................... None 0: train_weighted_split_paths_path ................. None 0: universal_checkpoint ............................ False 0: use_bnb_optimizer ............................... False 0: use_checkpoint_lr_scheduler ..................... False 0: use_contiguous_buffers_in_ddp ................... True 0: use_cpu_initialization .......................... None 0: use_one_sent_docs ............................... False 0: use_pin_memory .................................. False 0: valid_num_workers ............................... 2 0: valid_weighted_split_names ...................... 
None 0: valid_weighted_split_paths ...................... None 0: valid_weighted_split_paths_path ................. None 0: valid_weighted_split_splits ..................... None 0: valid_weighted_split_weights .................... None 0: virtual_pipeline_model_parallel_size ............ None 0: vocab_extra_ids ................................. 0 0: vocab_file ...................................... gpt2/vocab.json 0: weight_decay .................................... 0.1 0: world_size ...................................... 64 0: zero_allgather_bucket_size ...................... 0.0 0: zero_contigious_gradients ....................... False 0: zero_reduce_bucket_size ......................... 0.0 0: zero_reduce_scatter ............................. False 0: zero_stage ...................................... 0 0: -------------------- end of arguments --------------------- 0: setting number of micro-batches to constant 1 0: > building GPT2BPETokenizer tokenizer ... 0: > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) 0: DeepSpeed general environment info: 0: torch install path ............... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch'] 0: torch version .................... 1.13.0+rocm5.2 0: torch cuda version ............... None 0: torch hip version ................ 5.2.21151-afdc89f8 0: nvcc version ..................... None 0: deepspeed install path ........... ['/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/deepspeed'] 0: deepspeed info ................... 0.7.5, unknown, unknown 0: deepspeed wheel compiled w. ...... torch 1.13, hip 5.1 0: **** Git info for Megatron: git_hash=unknown git_branch=unknown **** 0: > initializing torch distributed ... 0: [2022-11-28 12:53:40,421] [INFO] [comm.py:633:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl 7: > setting tensorboard ... 
0: > initializing tensor model parallel with size 1 0: > initializing pipeline model parallel with size 1 0: > setting random seeds to 1234 ... 0: > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 0: > compiling dataset index builder ... 0: make: Entering directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' 0: make: Nothing to be done for 'default'. 0: make: Leaving directory '/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/data' 0: >>> done with dataset index builder. Compilation time: 0.091 seconds 0: > compiling and loading fused kernels ... 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.cpp [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: 
/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 102 0: [1/1] c++ scaled_masked_softmax_hip.cuda.o scaled_masked_softmax_hip.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o 
scaled_masked_softmax_cuda.so 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_hip_kernel.hip [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/type_shim.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/compat.h [skipped, no changes] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_hip.h [skipped, already hipified] 0: /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h -> /pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_hip.h [skipped, already hipified] 0: Total number of unsupported CUDA function calls: 0 0: 0: 0: Total number of replaced kernel launches: 67 0: [1/1] c++ layer_norm_hip_kernel.cuda.o 
layer_norm_cuda.o -shared -L/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/venv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_hip -ltorch_cpu -ltorch_hip -ltorch -ltorch_python -L/pfs/lustrep2/projappl/project_462000125/samantao-public/rocm/rocm-5.2.3/lib -lamdhip64 -o fused_mix_prec_layer_norm_cuda.so 0: >>> done with compiling and loading fused kernels. Compilation time: 15.984 seconds 0: time to initialize megatron (seconds): 72.432 0: [after megatron is initialized] datetime: 2022-11-28 12:54:00 0: building GPT model ... 0: [2022-11-28 12:54:00,485] [INFO] [utils.py:827:see_memory_usage] Before Building Model 0: [2022-11-28 12:54:00,486] [INFO] [utils.py:828:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB 0: [2022-11-28 12:54:00,486] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 29.23 GB, percent = 5.8% 0: SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None 0: Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pipe=0, data=23, model=0): 23, 
ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=0, data=32, model=0): 32, ProcessCoord(pipe=0, data=33, model=0): 33, ProcessCoord(pipe=0, data=34, model=0): 34, ProcessCoord(pipe=0, data=35, model=0): 35, ProcessCoord(pipe=0, data=36, model=0): 36, ProcessCoord(pipe=0, data=37, model=0): 37, ProcessCoord(pipe=0, data=38, model=0): 38, ProcessCoord(pipe=0, data=39, model=0): 39, ProcessCoord(pipe=0, data=40, model=0): 40, ProcessCoord(pipe=0, data=41, model=0): 41, ProcessCoord(pipe=0, data=42, model=0): 42, ProcessCoord(pipe=0, data=43, model=0): 43, ProcessCoord(pipe=0, data=44, model=0): 44, ProcessCoord(pipe=0, data=45, model=0): 45, ProcessCoord(pipe=0, data=46, model=0): 46, ProcessCoord(pipe=0, data=47, model=0): 47, ProcessCoord(pipe=0, data=48, model=0): 48, ProcessCoord(pipe=0, data=49, model=0): 49, ProcessCoord(pipe=0, data=50, model=0): 50, ProcessCoord(pipe=0, data=51, model=0): 51, ProcessCoord(pipe=0, data=52, model=0): 52, ProcessCoord(pipe=0, data=53, model=0): 53, ProcessCoord(pipe=0, data=54, model=0): 54, ProcessCoord(pipe=0, data=55, model=0): 55, ProcessCoord(pipe=0, data=56, model=0): 56, ProcessCoord(pipe=0, data=57, model=0): 57, ProcessCoord(pipe=0, data=58, model=0): 58, ProcessCoord(pipe=0, data=59, model=0): 59, ProcessCoord(pipe=0, data=60, model=0): 60, ProcessCoord(pipe=0, data=61, model=0): 61, ProcessCoord(pipe=0, data=62, model=0): 62, ProcessCoord(pipe=0, data=63, model=0): 63} 0: [2022-11-28 12:54:02,500] [INFO] [module.py:366:_partition_layers] Partitioning pipeline stages with method type:transformer 0: stage=0 layers=25 0: 0: _to_float16 0: 1: EmbeddingPipe 0: 2: 0: 3: ParallelTransformerLayerPipe 0: 4: 
ParallelTransformerLayerPipe 0: 5: ParallelTransformerLayerPipe 0: 6: ParallelTransformerLayerPipe 0: 7: ParallelTransformerLayerPipe 0: 8: ParallelTransformerLayerPipe 0: 9: ParallelTransformerLayerPipe 0: 10: ParallelTransformerLayerPipe 0: 11: ParallelTransformerLayerPipe 0: 12: ParallelTransformerLayerPipe 0: 13: ParallelTransformerLayerPipe 0: 14: ParallelTransformerLayerPipe 0: 15: ParallelTransformerLayerPipe 0: 16: ParallelTransformerLayerPipe 0: 17: ParallelTransformerLayerPipe 0: 18: ParallelTransformerLayerPipe 0: 19: ParallelTransformerLayerPipe 0: 20: ParallelTransformerLayerPipe 0: 21: undo 0: 22: MixedFusedLayerNorm 0: 23: EmbeddingPipe 0: 24: float16_to_fp32 0: loss: CrossEntropy 0: [2022-11-28 12:54:02,967] [INFO] [utils.py:827:see_memory_usage] After Building Model 0: [2022-11-28 12:54:02,968] [INFO] [utils.py:828:see_memory_usage] MA 0.42 GB Max_MA 0.42 GB CA 0.46 GB Max_CA 0 GB 0: [2022-11-28 12:54:02,968] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 29.25 GB, percent = 5.8% 0: setting training iterations to 115203 0: > learning rate decay style: cosine 0: DeepSpeed is enabled. 
0: [2022-11-28 12:54:02,970] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.7.5, git-hash=unknown, git-branch=unknown 0: [2022-11-28 12:54:15,660] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False 0: [2022-11-28 12:54:15,660] [INFO] [logging.py:68:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer 0: [2022-11-28 12:54:15,660] [INFO] [logging.py:68:log_dist] [Rank 0] Using client Optimizer as basic optimizer 0: [2022-11-28 12:54:15,666] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam 0: [2022-11-28 12:54:15,666] [INFO] [logging.py:68:log_dist] [Rank 0] Creating BF16 optimizer 0: [2022-11-28 12:54:15,706] [INFO] [utils.py:827:see_memory_usage] begin bf16_optimizer 0: [2022-11-28 12:54:15,706] [INFO] [utils.py:828:see_memory_usage] MA 0.41 GB Max_MA 0.42 GB CA 0.46 GB Max_CA 0 GB 0: [2022-11-28 12:54:15,706] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 29.93 GB, percent = 5.9% 0: ninja: no work to do. 0: Time to load utils op: 0.20888638496398926 seconds 0: Time to load utils op: 0.2501096725463867 seconds 0: [2022-11-28 12:54:15,949] [INFO] [utils.py:827:see_memory_usage] before initializing group 0 0: [2022-11-28 12:54:15,950] [INFO] [utils.py:828:see_memory_usage] MA 0.41 GB Max_MA 0.41 GB CA 0.46 GB Max_CA 0 GB 0: [2022-11-28 12:54:15,950] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 29.93 GB, percent = 5.9% 0: ninja: no work to do. 
0: Time to load utils op: 0.11528730392456055 seconds 5: Time to load utils op: 0.11214923858642578 seconds 5: Time to load utils op: 0.11215400695800781 seconds 5: Time to load utils op: 0.11211991310119629 seconds 5: Time to load utils op: 0.11216878890991211 seconds 5: Time to load utils op: 0.11217188835144043 seconds 5: Time to load utils op: 0.11217737197875977 seconds 5: Time to load utils op: 0.11217093467712402 seconds 5: Time to load utils op: 0.1121821403503418 seconds 6: Time to load utils op: 0.11030459403991699 seconds 6: Time to load utils op: 0.1103200912475586 seconds 6: Time to load utils op: 0.11032533645629883 seconds 6: Time to load utils op: 0.11033153533935547 seconds 6: Time to load utils op: 0.11033844947814941 seconds 6: Time to load utils op: 0.11034846305847168 seconds 6: Time to load utils op: 0.11034798622131348 seconds 6: Time to load utils op: 0.11035561561584473 seconds 1: Time to load utils op: 0.4105093479156494 seconds 0: Time to load utils op: 0.20194292068481445 seconds 0: Time to load utils op: 0.20299601554870605 seconds 0: Time to load utils op: 0.20227932929992676 seconds 0: Time to load utils op: 0.2023622989654541 seconds 0: Time to load utils op: 0.2025904655456543 seconds 1: Time to load utils op: 0.20323777198791504 seconds 1: Time to load utils op: 0.2033863067626953 seconds 1: Time to load utils op: 0.20350193977355957 seconds 1: Time to load utils op: 0.20437073707580566 seconds 1: Time to load utils op: 0.20446109771728516 seconds 1: Time to load utils op: 0.204728364944458 seconds 1: Time to load utils op: 0.20425176620483398 seconds 7: Time to load utils op: 0.2101879119873047 seconds 7: Time to load utils op: 0.2114877700805664 seconds 7: Time to load utils op: 0.2103135585784912 seconds 7: Time to load utils op: 0.21120691299438477 seconds 7: Time to load utils op: 0.20984625816345215 seconds 7: Time to load utils op: 0.21050500869750977 seconds 3: Time to load utils op: 0.21088218688964844 seconds 3: Time to load utils op: 0.21088671684265137 seconds 
7: Time to load utils op: 0.21104073524475098 seconds 7: Time to load utils op: 0.2101280689239502 seconds 3: Time to load utils op: 0.21092581748962402 seconds 3: Time to load utils op: 0.21090126037597656 seconds 3: Time to load utils op: 0.21093344688415527 seconds 3: Time to load utils op: 0.2109203338623047 seconds 3: Time to load utils op: 0.21094679832458496 seconds 2: Time to load utils op: 0.21226882934570312 seconds 2: Time to load utils op: 0.2122800350189209 seconds 2: Time to load utils op: 0.21233034133911133 seconds 2: Time to load utils op: 0.21235084533691406 seconds 2: Time to load utils op: 0.21235442161560059 seconds 2: Time to load utils op: 0.21234679222106934 seconds 3: Time to load utils op: 0.21096062660217285 seconds 2: Time to load utils op: 0.2123730182647705 seconds 2: Time to load utils op: 0.21237564086914062 seconds 4: Time to load utils op: 0.21083760261535645 seconds 4: Time to load utils op: 0.2108612060546875 seconds 4: Time to load utils op: 0.21088194847106934 seconds 4: Time to load utils op: 0.21087932586669922 seconds 4: Time to load utils op: 0.21088576316833496 seconds 4: Time to load utils op: 0.210892915725708 seconds 4: Time to load utils op: 0.2108924388885498 seconds 4: Time to load utils op: 0.21090292930603027 seconds 7: Time to load utils op: 0.0005106925964355469 seconds 7: Time to load utils op: 0.0005645751953125 seconds 7: Time to load utils op: 0.0005247592926025391 seconds 7: Time to load utils op: 0.0005753040313720703 seconds 7: Time to load utils op: 0.0005474090576171875 seconds 7: Time to load utils op: 0.0005664825439453125 seconds 7: Time to load utils op: 0.00058746337890625 seconds 7: Time to load utils op: 0.0003437995910644531 seconds 0: Time to load utils op: 0.0006122589111328125 seconds 0: Time to load utils op: 0.0006594657897949219 seconds 0: Time to load utils op: 0.0006558895111083984 seconds 0: Time to load utils op: 0.0006537437438964844 seconds 0: Time to load utils op: 0.0006766319274902344 seconds 
0: Time to load utils op: 0.0005581378936767578 seconds 0: Time to load utils op: 0.0007641315460205078 seconds 2: Time to load utils op: 0.0011365413665771484 seconds 4: Time to load utils op: 0.0007832050323486328 seconds 2: Time to load utils op: 0.0014295578002929688 seconds 2: Time to load utils op: 0.0014727115631103516 seconds 2: Time to load utils op: 0.001447916030883789 seconds 2: Time to load utils op: 0.0014584064483642578 seconds 2: Time to load utils op: 0.0014710426330566406 seconds 2: Time to load utils op: 0.0014729499816894531 seconds 2: Time to load utils op: 0.0014774799346923828 seconds 4: Time to load utils op: 0.0011336803436279297 seconds 4: Time to load utils op: 0.0011229515075683594 seconds 4: Time to load utils op: 0.0010995864868164062 seconds 4: Time to load utils op: 0.001140594482421875 seconds 6: Time to load utils op: 0.0009748935699462891 seconds 4: Time to load utils op: 0.0011146068572998047 seconds 4: Time to load utils op: 0.0011985301971435547 seconds 3: Time to load utils op: 0.0008232593536376953 seconds 4: Time to load utils op: 0.0010991096496582031 seconds 6: Time to load utils op: 0.0012063980102539062 seconds 1: Time to load utils op: 0.0005049705505371094 seconds 1: Time to load utils op: 0.0005242824554443359 seconds 3: Time to load utils op: 0.0009481906890869141 seconds 3: Time to load utils op: 0.0009925365447998047 seconds 1: Time to load utils op: 0.0004706382751464844 seconds 1: Time to load utils op: 0.0004432201385498047 seconds 1: Time to load utils op: 0.0004456043243408203 seconds 1: Time to load utils op: 0.00043463706970214844 seconds 1: Time to load utils op: 0.0004305839538574219 seconds 1: Time to load utils op: 0.0004353523254394531 seconds 6: Time to load utils op: 0.0013260841369628906 seconds 3: Time to load utils op: 0.001279592514038086 seconds 3: Time to load utils op: 0.0012731552124023438 seconds 3: Time to load utils op: 0.0012705326080322266 seconds 
6: Time to load utils op: 0.0013623237609863281 seconds 3: Time to load utils op: 0.001249074935913086 seconds 6: Time to load utils op: 0.0013458728790283203 seconds 3: Time to load utils op: 0.0012583732604980469 seconds 6: Time to load utils op: 0.001399993896484375 seconds 6: Time to load utils op: 0.001434326171875 seconds 6: Time to load utils op: 0.0014190673828125 seconds 5: Time to load utils op: 0.0009279251098632812 seconds 5: Time to load utils op: 0.0009431838989257812 seconds 5: Time to load utils op: 0.0011992454528808594 seconds 5: Time to load utils op: 0.001127004623413086 seconds 5: Time to load utils op: 0.001146554946899414 seconds 5: Time to load utils op: 0.0012352466583251953 seconds 5: Time to load utils op: 0.0011365413665771484 seconds 5: Time to load utils op: 0.0012683868408203125 seconds 0: [2022-11-28 12:54:16,431] [INFO] [utils.py:827:see_memory_usage] after initializing group 0 0: [2022-11-28 12:54:16,432] [INFO] [utils.py:828:see_memory_usage] MA 0.91 GB Max_MA 0.91 GB CA 1.19 GB Max_CA 1 GB 0: [2022-11-28 12:54:16,432] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.03 GB, percent = 6.0% 0: [2022-11-28 12:54:16,464] [INFO] [utils.py:827:see_memory_usage] before initializing group 1 0: [2022-11-28 12:54:16,464] [INFO] [utils.py:828:see_memory_usage] MA 0.91 GB Max_MA 0.91 GB CA 1.19 GB Max_CA 1 GB 0: [2022-11-28 12:54:16,464] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.08 GB, percent = 6.0% 0: [2022-11-28 12:54:16,498] [INFO] [utils.py:827:see_memory_usage] after initializing group 1 0: [2022-11-28 12:54:16,498] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB 0: [2022-11-28 12:54:16,498] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.08 GB, percent = 6.0% 0: [2022-11-28 12:54:16,529] [INFO] [utils.py:827:see_memory_usage] before initializing group 2 0: [2022-11-28 12:54:16,530] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB 
Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB 0: [2022-11-28 12:54:16,530] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.08 GB, percent = 6.0% 0: [2022-11-28 12:54:16,563] [INFO] [utils.py:827:see_memory_usage] after initializing group 2 0: [2022-11-28 12:54:16,564] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB 0: [2022-11-28 12:54:16,564] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.08 GB, percent = 6.0% 0: [2022-11-28 12:54:16,595] [INFO] [utils.py:827:see_memory_usage] before initialize_optimizer 0: [2022-11-28 12:54:16,595] [INFO] [utils.py:828:see_memory_usage] MA 1.25 GB Max_MA 1.25 GB CA 1.69 GB Max_CA 2 GB 0: [2022-11-28 12:54:16,595] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.08 GB, percent = 6.0% 0: [2022-11-28 12:54:16,631] [INFO] [utils.py:827:see_memory_usage] end initialize_optimizer 0: [2022-11-28 12:54:16,632] [INFO] [utils.py:828:see_memory_usage] MA 1.27 GB Max_MA 1.27 GB CA 1.69 GB Max_CA 2 GB 0: [2022-11-28 12:54:16,632] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.08 GB, percent = 6.0% 0: [2022-11-28 12:54:16,663] [INFO] [utils.py:827:see_memory_usage] end bf16_optimizer 0: [2022-11-28 12:54:16,663] [INFO] [utils.py:828:see_memory_usage] MA 1.27 GB Max_MA 1.27 GB CA 1.69 GB Max_CA 2 GB 0: [2022-11-28 12:54:16,663] [INFO] [utils.py:836:see_memory_usage] CPU Virtual Memory: used = 30.08 GB, percent = 6.0% 0: [2022-11-28 12:54:16,664] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam 0: [2022-11-28 12:54:16,664] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed using client LR scheduler 0: [2022-11-28 12:54:16,664] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = 0: [2022-11-28 12:54:16,664] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 0: [2022-11-28 12:54:16,664] [INFO] 
[config.py:1007:print] DeepSpeedEngine configuration: 0: [2022-11-28 12:54:16,664] [INFO] [config.py:1011:print] activation_checkpointing_config { 0: "partition_activations": false, 0: "contiguous_memory_optimization": false, 0: "cpu_checkpointing": false, 0: "number_checkpoints": null, 0: "synchronize_checkpoint_boundary": false, 0: "profile": false 0: } 0: [2022-11-28 12:54:16,664] [INFO] [config.py:1011:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} 0: [2022-11-28 12:54:16,664] [INFO] [config.py:1011:print] amp_enabled .................. False 0: [2022-11-28 12:54:16,664] [INFO] [config.py:1011:print] amp_params ................... False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] autotuning_config ............ { 0: "enabled": false, 0: "start_step": null, 0: "end_step": null, 0: "metric_path": null, 0: "arg_mappings": null, 0: "metric": "throughput", 0: "model_info": null, 0: "results_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_results", 0: "exps_dir": "/pfs/lustrep4/scratch/project_462000119/muennighoff/nov-2022-bettercom/autotuning_exps", 0: "overwrite": true, 0: "fast": true, 0: "start_profile_step": 3, 0: "end_profile_step": 5, 0: "tuner_type": "gridsearch", 0: "tuner_early_stopping": 5, 0: "tuner_num_trials": 50, 0: "model_info_path": null, 0: "mp_size": 1, 0: "max_train_batch_size": null, 0: "min_train_batch_size": 1, 0: "max_train_micro_batch_size_per_gpu": 1.024000e+03, 0: "min_train_micro_batch_size_per_gpu": 1, 0: "num_tuning_micro_batch_sizes": 3 0: } 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] bfloat16_enabled ............. 
True 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] checkpoint_parallel_write_pipeline False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] checkpoint_tag_validation_enabled True 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] checkpoint_tag_validation_fail False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] comms_config ................. 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] communication_data_type ...... None 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] curriculum_enabled ........... False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] curriculum_params ............ False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] dataloader_drop_last ......... 
False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] disable_allgather ............ False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] dump_state ................... False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] dynamic_loss_scale_args ...... None 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] eigenvalue_enabled ........... False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] eigenvalue_gas_boundary_resolution 1 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] eigenvalue_layer_name ........ bert.encoder.layer 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] eigenvalue_layer_num ......... 0 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] eigenvalue_max_iter .......... 100 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] eigenvalue_stability ......... 1e-06 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] eigenvalue_tol ............... 0.01 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] eigenvalue_verbose ........... False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] elasticity_enabled ........... False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] flops_profiler_config ........ { 0: "enabled": false, 0: "profile_step": 1, 0: "module_depth": -1, 0: "top_modules": 1, 0: "detailed": true, 0: "output_file": null 0: } 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] fp16_auto_cast ............... None 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] fp16_enabled ................. False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] fp16_master_weights_and_gradients False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] global_rank .................. 0 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] gradient_accumulation_steps .. 1 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] gradient_clipping ............ 
1.0 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] gradient_predivide_factor .... 1.0 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] initial_dynamic_scale ........ 1 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] load_universal_checkpoint .... False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] loss_scale ................... 1.0 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] memory_breakdown ............. False 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] monitor_config ............... 0: [2022-11-28 12:54:16,665] [INFO] [config.py:1011:print] nebula_config ................ { 0: "enabled": false, 0: "persistent_storage_path": null, 0: "persistent_time_interval": 100, 0: "num_of_version_in_retention": 2, 0: "enable_nebula_load": true, 0: "load_path": null 0: } 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] optimizer_legacy_fusion ...... False 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] optimizer_name ............... None 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] optimizer_params ............. None 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] pld_enabled .................. False 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] pld_params ................... False 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] prescale_gradients ........... False 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] scheduler_name ............... None 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] scheduler_params ............. None 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] sparse_attention ............. None 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] sparse_gradients_enabled ..... 
False 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] steps_per_print .............. 2000 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] train_batch_size ............. 256 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] train_micro_batch_size_per_gpu 4 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] use_node_local_storage ....... False 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] wall_clock_breakdown ......... False 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] world_size ................... 64 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] zero_allow_untested_optimizer False 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500000000 allgather_partitions=True allgather_bucket_size=500000000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50000000 param_persistence_threshold=100000 model_persistence_threshold=9223372036854775807 max_live_parameters=1000000000 max_reuse_distance=1000000000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] zero_enabled ................. False 0: [2022-11-28 12:54:16,666] [INFO] [config.py:1011:print] zero_optimization_stage ...... 
0 0: [2022-11-28 12:54:16,666] [INFO] [config.py:996:print_user_config] json = { 0: "train_micro_batch_size_per_gpu": 4, 0: "train_batch_size": 256, 0: "gradient_clipping": 1.0, 0: "zero_optimization": { 0: "stage": 0 0: }, 0: "bf16": { 0: "enabled": true 0: }, 0: "steps_per_print": 2.000000e+03, 0: "wall_clock_breakdown": false 0: } 0: Time to load utils op: 0.0004050731658935547 seconds 0: [2022-11-28 12:54:16,667] [INFO] [engine.py:87:__init__] CONFIG: micro_batches=1 micro_batch_size=4 0: [2022-11-28 12:54:16,718] [INFO] [engine.py:145:__init__] RANK=0 STAGE=0 LAYERS=25 [0, 25) STAGE_PARAMS=220527104 (220.527M) TOTAL_PARAMS=220527104 (220.527M) UNIQUE_PARAMS=220527104 (220.527M) 0: [2022-11-28 12:54:16,727] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 0: WARNING: could not find the metadata file checkpoints_221m 0: will not load any checkpoints and will start from random 7: [2022-11-28 12:54:16,727] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 7: [2022-11-28 12:54:16,727] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 4: [2022-11-28 12:54:16,727] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
2: [2022-11-28 12:54:16,728] [WARNING] [engine.py:2581:load_checkpoint] Unable to find latest file at checkpoints_221m/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 7: time (ms) | load-checkpoint: 7.95 0: estimated model parameters: 0.220527104 0: estimated model parameters without embeddings: 0.173619712 0: [after model, optimizer, and learning rate scheduler are built] datetime: 2022-11-28 12:54:16 0: > building train, validation, and test datasets ... 0: > datasets target sizes (minimum size): 0: train: 29492188 0: validation: 29696 0: test: 256 0: > building train, validation, and test datasets for GPT ... 0: > building dataset index ... 0: reading sizes... 0: reading pointers... 0: reading document index... 0: creating numpy buffer of mmap... 0: creating memory view of numpy buffer... 
0: > finished creating indexed dataset in 0.007864 seconds 0: number of documents: 210604984 0: > dataset split: 0: train: 0: document indices in [0, 199864130) total of 199864130 documents 0: validation: 0: document indices in [199864130, 210394379) total of 10530249 documents 0: test: 0: document indices in [210394379, 210604984) total of 210605 documents 0: > WARNING: could not find index map files, building the indices on rank 0 ... 0: > only one epoch required, setting separate_last_epoch to False 0: > elasped time to build and save doc-idx mapping (seconds): 14.925816 0: using: 0: number of documents: 199864130 0: number of epochs: 1 0: sequence length: 2048 0: total number of samples: 173377816 0: > elasped time to build and save sample-idx mapping (seconds): 4.163100 0: > building shuffle index with split [0, 173377816) and [173377816, 173377816) ... 0: > elasped time to build and save shuffle-idx mapping (seconds): 10.350663 0: > loading doc-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_train_indexmap_29492188ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_train_indexmap_29492188ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_train_indexmap_29492188ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.239 seconds 0: total number of samples: 173377817 0: total number of epochs: 1 0: > WARNING: could not find index map files, building the indices on rank 0 ... 
0: > only one epoch required, setting separate_last_epoch to False 0: > elasped time to build and save doc-idx mapping (seconds): 0.485431 0: using: 0: number of documents: 10530249 0: number of epochs: 1 0: sequence length: 2048 0: total number of samples: 9118344 0: > elasped time to build and save sample-idx mapping (seconds): 0.213000 0: > building shuffle index with split [0, 9118344) and [9118344, 9118344) ... 0: > elasped time to build and save shuffle-idx mapping (seconds): 0.271851 0: > loading doc-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_valid_indexmap_29696ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_valid_indexmap_29696ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_valid_indexmap_29696ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.065 seconds 0: total number of samples: 9118345 0: total number of epochs: 1 0: > loading doc-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_test_indexmap_256ns_2048sl_1234s_doc_idx.npy 0: > loading sample-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_test_indexmap_256ns_2048sl_1234s_sample_idx.npy 0: > loading shuffle-idx mapping from /scratch/project_462000119/data/pile/megatron_data/meg-gpt2_pile_text_document_test_indexmap_256ns_2048sl_1234s_shuffle_idx.npy 0: loaded indexed file in 0.071 seconds 0: total number of samples: 182928 0: total number of epochs: 1 0: > finished creating GPT datasets ... 0: [after dataloaders are built] datetime: 2022-11-28 12:55:04 0: done with setup ... 0: training ... 
0: Number of parameters: [tensor rank - pipeline rank] w/ and w/o embeddings: 7: time (ms) | model-and-optimizer-setup: 16551.72 | train/valid/test-data-iterators-setup: 46992.90 0: [000-000] 0.2205B / 0.1736B 0: [before the start of training step] datetime: 2022-11-28 12:55:04 0: [Rank 0] (after 10 iterations) memory (MB) | allocated: 3312.30078125 | max allocated: 30164.70654296875 | reserved: 30952.0 | max reserved: 30952.0 7: iteration 10/ 115203 | consumed samples: 2560 | consumed tokens: 5242880 | elapsed time per iteration (s): 1.65 | learning rate: 1.736E-06 | global batch size: 256 | lm loss: 1.074227E+01 | grad norm: 23.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 154.960 | TFLOPs: 8.13 | 7: iteration 20/ 115203 | consumed samples: 5120 | consumed tokens: 10485760 | elapsed time per iteration (s): 0.44 | learning rate: 3.472E-06 | global batch size: 256 | lm loss: 9.701524E+00 | grad norm: 4.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.143 | TFLOPs: 30.23 | 7: iteration 30/ 115203 | consumed samples: 7680 | consumed tokens: 15728640 | elapsed time per iteration (s): 0.44 | learning rate: 5.208E-06 | global batch size: 256 | lm loss: 9.151382E+00 | grad norm: 2.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.706 | TFLOPs: 30.73 | 7: iteration 40/ 115203 | consumed samples: 10240 | consumed tokens: 20971520 | elapsed time per iteration (s): 0.43 | learning rate: 6.944E-06 | global batch size: 256 | lm loss: 8.892981E+00 | grad norm: 1.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.740 | TFLOPs: 30.94 | 7: iteration 50/ 115203 | consumed samples: 12800 | consumed tokens: 26214400 | elapsed time per iteration (s): 0.44 | learning rate: 8.680E-06 | global batch size: 256 | lm loss: 8.710005E+00 | grad norm: 
1.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.425 | TFLOPs: 30.56 | 7: iteration 60/ 115203 | consumed samples: 15360 | consumed tokens: 31457280 | elapsed time per iteration (s): 0.44 | learning rate: 1.042E-05 | global batch size: 256 | lm loss: 8.588592E+00 | grad norm: 1.892 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.849 | TFLOPs: 30.69 | 7: iteration 70/ 115203 | consumed samples: 17920 | consumed tokens: 36700160 | elapsed time per iteration (s): 0.44 | learning rate: 1.215E-05 | global batch size: 256 | lm loss: 8.417719E+00 | grad norm: 2.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.149 | TFLOPs: 30.65 | 7: iteration 80/ 115203 | consumed samples: 20480 | consumed tokens: 41943040 | elapsed time per iteration (s): 0.45 | learning rate: 1.389E-05 | global batch size: 256 | lm loss: 8.228268E+00 | grad norm: 2.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.268 | TFLOPs: 29.97 | 7: iteration 90/ 115203 | consumed samples: 23040 | consumed tokens: 47185920 | elapsed time per iteration (s): 0.46 | learning rate: 1.562E-05 | global batch size: 256 | lm loss: 8.100198E+00 | grad norm: 1.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.791 | TFLOPs: 29.32 | 7: iteration 100/ 115203 | consumed samples: 25600 | consumed tokens: 52428800 | elapsed time per iteration (s): 0.43 | learning rate: 1.736E-05 | global batch size: 256 | lm loss: 7.931541E+00 | grad norm: 1.911 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.224 | TFLOPs: 30.92 | 7: iteration 110/ 115203 | consumed samples: 28160 | consumed tokens: 57671680 | elapsed time per iteration (s): 0.45 | learning rate: 1.910E-05 | global 
batch size: 256 | lm loss: 7.754738E+00 | grad norm: 1.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.992 | TFLOPs: 29.96 | 7: iteration 120/ 115203 | consumed samples: 30720 | consumed tokens: 62914560 | elapsed time per iteration (s): 0.44 | learning rate: 2.083E-05 | global batch size: 256 | lm loss: 7.579410E+00 | grad norm: 1.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.714 | TFLOPs: 30.73 | 7: iteration 130/ 115203 | consumed samples: 33280 | consumed tokens: 68157440 | elapsed time per iteration (s): 0.43 | learning rate: 2.257E-05 | global batch size: 256 | lm loss: 7.370348E+00 | grad norm: 1.565 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.824 | TFLOPs: 31.00 | 7: iteration 140/ 115203 | consumed samples: 35840 | consumed tokens: 73400320 | elapsed time per iteration (s): 0.44 | learning rate: 2.430E-05 | global batch size: 256 | lm loss: 7.223830E+00 | grad norm: 1.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.796 | TFLOPs: 30.84 | 7: iteration 150/ 115203 | consumed samples: 38400 | consumed tokens: 78643200 | elapsed time per iteration (s): 0.46 | learning rate: 2.604E-05 | global batch size: 256 | lm loss: 7.106346E+00 | grad norm: 1.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.139 | TFLOPs: 29.49 | 7: iteration 160/ 115203 | consumed samples: 40960 | consumed tokens: 83886080 | elapsed time per iteration (s): 0.45 | learning rate: 2.778E-05 | global batch size: 256 | lm loss: 6.906526E+00 | grad norm: 2.694 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.351 | TFLOPs: 29.87 | 7: iteration 170/ 115203 | consumed samples: 43520 | consumed tokens: 89128960 | elapsed time per 
iteration (s): 0.45 | learning rate: 2.951E-05 | global batch size: 256 | lm loss: 6.846246E+00 | grad norm: 2.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.857 | TFLOPs: 30.06 | 7: iteration 180/ 115203 | consumed samples: 46080 | consumed tokens: 94371840 | elapsed time per iteration (s): 0.44 | learning rate: 3.125E-05 | global batch size: 256 | lm loss: 6.717236E+00 | grad norm: 1.657 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.277 | TFLOPs: 30.87 | 7: iteration 190/ 115203 | consumed samples: 48640 | consumed tokens: 99614720 | elapsed time per iteration (s): 0.44 | learning rate: 3.298E-05 | global batch size: 256 | lm loss: 6.642082E+00 | grad norm: 1.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.967 | TFLOPs: 30.53 | 7: iteration 200/ 115203 | consumed samples: 51200 | consumed tokens: 104857600 | elapsed time per iteration (s): 0.43 | learning rate: 3.472E-05 | global batch size: 256 | lm loss: 6.531606E+00 | grad norm: 1.951 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.703 | TFLOPs: 31.20 | 7: iteration 210/ 115203 | consumed samples: 53760 | consumed tokens: 110100480 | elapsed time per iteration (s): 0.43 | learning rate: 3.646E-05 | global batch size: 256 | lm loss: 6.457999E+00 | grad norm: 1.836 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.602 | TFLOPs: 31.41 | 7: iteration 220/ 115203 | consumed samples: 56320 | consumed tokens: 115343360 | elapsed time per iteration (s): 0.44 | learning rate: 3.819E-05 | global batch size: 256 | lm loss: 6.374171E+00 | grad norm: 1.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.705 | TFLOPs: 30.57 | 7: iteration 230/ 115203 | consumed 
samples: 58880 | consumed tokens: 120586240 | elapsed time per iteration (s): 0.43 | learning rate: 3.993E-05 | global batch size: 256 | lm loss: 6.330659E+00 | grad norm: 1.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.028 | TFLOPs: 30.96 | 7: iteration 240/ 115203 | consumed samples: 61440 | consumed tokens: 125829120 | elapsed time per iteration (s): 0.44 | learning rate: 4.167E-05 | global batch size: 256 | lm loss: 6.297150E+00 | grad norm: 1.776 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.377 | TFLOPs: 30.87 | 7: iteration 250/ 115203 | consumed samples: 64000 | consumed tokens: 131072000 | elapsed time per iteration (s): 0.44 | learning rate: 4.340E-05 | global batch size: 256 | lm loss: 6.199382E+00 | grad norm: 1.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.365 | TFLOPs: 30.82 | 7: iteration 260/ 115203 | consumed samples: 66560 | consumed tokens: 136314880 | elapsed time per iteration (s): 0.44 | learning rate: 4.514E-05 | global batch size: 256 | lm loss: 6.172031E+00 | grad norm: 2.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.773 | TFLOPs: 30.63 | 7: iteration 270/ 115203 | consumed samples: 69120 | consumed tokens: 141557760 | elapsed time per iteration (s): 0.44 | learning rate: 4.687E-05 | global batch size: 256 | lm loss: 6.092637E+00 | grad norm: 1.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.517 | TFLOPs: 30.35 | 7: iteration 280/ 115203 | consumed samples: 71680 | consumed tokens: 146800640 | elapsed time per iteration (s): 0.45 | learning rate: 4.861E-05 | global batch size: 256 | lm loss: 6.059785E+00 | grad norm: 1.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
570.340 | TFLOPs: 29.92 | 7: iteration 290/ 115203 | consumed samples: 74240 | consumed tokens: 152043520 | elapsed time per iteration (s): 0.43 | learning rate: 5.035E-05 | global batch size: 256 | lm loss: 6.001253E+00 | grad norm: 1.850 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.688 | TFLOPs: 31.20 | 7: iteration 300/ 115203 | consumed samples: 76800 | consumed tokens: 157286400 | elapsed time per iteration (s): 0.43 | learning rate: 5.208E-05 | global batch size: 256 | lm loss: 5.917122E+00 | grad norm: 2.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.905 | TFLOPs: 30.95 | 7: iteration 310/ 115203 | consumed samples: 79360 | consumed tokens: 162529280 | elapsed time per iteration (s): 0.44 | learning rate: 5.382E-05 | global batch size: 256 | lm loss: 5.927438E+00 | grad norm: 1.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.556 | TFLOPs: 30.25 | 7: iteration 320/ 115203 | consumed samples: 81920 | consumed tokens: 167772160 | elapsed time per iteration (s): 0.44 | learning rate: 5.555E-05 | global batch size: 256 | lm loss: 5.894496E+00 | grad norm: 1.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.657 | TFLOPs: 30.52 | 7: iteration 330/ 115203 | consumed samples: 84480 | consumed tokens: 173015040 | elapsed time per iteration (s): 0.45 | learning rate: 5.729E-05 | global batch size: 256 | lm loss: 5.849720E+00 | grad norm: 1.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.325 | TFLOPs: 29.56 | 7: iteration 340/ 115203 | consumed samples: 87040 | consumed tokens: 178257920 | elapsed time per iteration (s): 0.45 | learning rate: 5.903E-05 | global batch size: 256 | lm loss: 5.812771E+00 | grad norm: 1.496 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 569.696 | TFLOPs: 29.89 | 7: iteration 350/ 115203 | consumed samples: 89600 | consumed tokens: 183500800 | elapsed time per iteration (s): 0.44 | learning rate: 6.076E-05 | global batch size: 256 | lm loss: 5.765213E+00 | grad norm: 1.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.586 | TFLOPs: 30.67 | 7: iteration 360/ 115203 | consumed samples: 92160 | consumed tokens: 188743680 | elapsed time per iteration (s): 0.45 | learning rate: 6.250E-05 | global batch size: 256 | lm loss: 5.723219E+00 | grad norm: 2.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.902 | TFLOPs: 29.69 | 7: iteration 370/ 115203 | consumed samples: 94720 | consumed tokens: 193986560 | elapsed time per iteration (s): 0.44 | learning rate: 6.423E-05 | global batch size: 256 | lm loss: 5.649506E+00 | grad norm: 1.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.135 | TFLOPs: 30.54 | 7: iteration 380/ 115203 | consumed samples: 97280 | consumed tokens: 199229440 | elapsed time per iteration (s): 0.45 | learning rate: 6.597E-05 | global batch size: 256 | lm loss: 5.652365E+00 | grad norm: 2.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.571 | TFLOPs: 29.57 | 7: iteration 390/ 115203 | consumed samples: 99840 | consumed tokens: 204472320 | elapsed time per iteration (s): 0.44 | learning rate: 6.771E-05 | global batch size: 256 | lm loss: 5.593541E+00 | grad norm: 1.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.497 | TFLOPs: 30.20 | 7: iteration 400/ 115203 | consumed samples: 102400 | consumed tokens: 209715200 | elapsed time per iteration (s): 0.45 | learning rate: 6.944E-05 | global batch size: 256 | lm loss: 
5.597430E+00 | grad norm: 1.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.855 | TFLOPs: 30.00 | 7: iteration 410/ 115203 | consumed samples: 104960 | consumed tokens: 214958080 | elapsed time per iteration (s): 0.43 | learning rate: 7.118E-05 | global batch size: 256 | lm loss: 5.560117E+00 | grad norm: 2.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.319 | TFLOPs: 31.03 | 7: iteration 420/ 115203 | consumed samples: 107520 | consumed tokens: 220200960 | elapsed time per iteration (s): 0.44 | learning rate: 7.291E-05 | global batch size: 256 | lm loss: 5.589679E+00 | grad norm: 1.648 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.815 | TFLOPs: 30.42 | 7: iteration 430/ 115203 | consumed samples: 110080 | consumed tokens: 225443840 | elapsed time per iteration (s): 0.44 | learning rate: 7.465E-05 | global batch size: 256 | lm loss: 5.507290E+00 | grad norm: 2.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.336 | TFLOPs: 30.55 | 7: iteration 440/ 115203 | consumed samples: 112640 | consumed tokens: 230686720 | elapsed time per iteration (s): 0.44 | learning rate: 7.639E-05 | global batch size: 256 | lm loss: 5.468211E+00 | grad norm: 1.828 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.597 | TFLOPs: 30.73 | 7: iteration 450/ 115203 | consumed samples: 115200 | consumed tokens: 235929600 | elapsed time per iteration (s): 0.44 | learning rate: 7.812E-05 | global batch size: 256 | lm loss: 5.420924E+00 | grad norm: 1.497 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.658 | TFLOPs: 30.57 | 7: iteration 460/ 115203 | consumed samples: 117760 | consumed tokens: 241172480 | elapsed time per iteration (s): 
0.44 | learning rate: 7.986E-05 | global batch size: 256 | lm loss: 5.427560E+00 | grad norm: 1.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.737 | TFLOPs: 30.84 | 7: iteration 470/ 115203 | consumed samples: 120320 | consumed tokens: 246415360 | elapsed time per iteration (s): 0.43 | learning rate: 8.159E-05 | global batch size: 256 | lm loss: 5.399146E+00 | grad norm: 1.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.227 | TFLOPs: 31.18 | 7: iteration 480/ 115203 | consumed samples: 122880 | consumed tokens: 251658240 | elapsed time per iteration (s): 0.44 | learning rate: 8.333E-05 | global batch size: 256 | lm loss: 5.378523E+00 | grad norm: 1.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.565 | TFLOPs: 30.20 | 7: iteration 490/ 115203 | consumed samples: 125440 | consumed tokens: 256901120 | elapsed time per iteration (s): 0.44 | learning rate: 8.507E-05 | global batch size: 256 | lm loss: 5.313163E+00 | grad norm: 1.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.484 | TFLOPs: 30.72 | 7: iteration 500/ 115203 | consumed samples: 128000 | consumed tokens: 262144000 | elapsed time per iteration (s): 0.43 | learning rate: 8.680E-05 | global batch size: 256 | lm loss: 5.297494E+00 | grad norm: 1.867 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.728 | TFLOPs: 31.10 | 7: iteration 510/ 115203 | consumed samples: 130560 | consumed tokens: 267386880 | elapsed time per iteration (s): 0.43 | learning rate: 8.854E-05 | global batch size: 256 | lm loss: 5.280753E+00 | grad norm: 1.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.133 | TFLOPs: 31.17 | 7: iteration 520/ 115203 | consumed samples: 
133120 | consumed tokens: 272629760 | elapsed time per iteration (s): 0.44 | learning rate: 9.027E-05 | global batch size: 256 | lm loss: 5.256369E+00 | grad norm: 1.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.798 | TFLOPs: 30.68 | 7: iteration 530/ 115203 | consumed samples: 135680 | consumed tokens: 277872640 | elapsed time per iteration (s): 0.43 | learning rate: 9.201E-05 | global batch size: 256 | lm loss: 5.189791E+00 | grad norm: 2.021 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.738 | TFLOPs: 30.94 | 7: iteration 540/ 115203 | consumed samples: 138240 | consumed tokens: 283115520 | elapsed time per iteration (s): 0.44 | learning rate: 9.375E-05 | global batch size: 256 | lm loss: 5.241899E+00 | grad norm: 1.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.768 | TFLOPs: 30.63 | 7: iteration 550/ 115203 | consumed samples: 140800 | consumed tokens: 288358400 | elapsed time per iteration (s): 0.43 | learning rate: 9.548E-05 | global batch size: 256 | lm loss: 5.216714E+00 | grad norm: 1.698 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.377 | TFLOPs: 30.92 | 7: iteration 560/ 115203 | consumed samples: 143360 | consumed tokens: 293601280 | elapsed time per iteration (s): 0.44 | learning rate: 9.722E-05 | global batch size: 256 | lm loss: 5.221610E+00 | grad norm: 1.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.682 | TFLOPs: 30.78 | 7: iteration 570/ 115203 | consumed samples: 145920 | consumed tokens: 298844160 | elapsed time per iteration (s): 0.45 | learning rate: 9.895E-05 | global batch size: 256 | lm loss: 5.174030E+00 | grad norm: 1.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
574.264 | TFLOPs: 30.13 | 7: iteration 580/ 115203 | consumed samples: 148480 | consumed tokens: 304087040 | elapsed time per iteration (s): 0.44 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 5.124775E+00 | grad norm: 1.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.341 | TFLOPs: 30.71 | 7: iteration 590/ 115203 | consumed samples: 151040 | consumed tokens: 309329920 | elapsed time per iteration (s): 0.44 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 5.074921E+00 | grad norm: 1.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.344 | TFLOPs: 30.82 | 7: iteration 600/ 115203 | consumed samples: 153600 | consumed tokens: 314572800 | elapsed time per iteration (s): 0.43 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 5.069079E+00 | grad norm: 2.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.435 | TFLOPs: 31.03 | 7: iteration 610/ 115203 | consumed samples: 156160 | consumed tokens: 319815680 | elapsed time per iteration (s): 0.44 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 5.057368E+00 | grad norm: 1.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.423 | TFLOPs: 30.40 | 7: iteration 620/ 115203 | consumed samples: 158720 | consumed tokens: 325058560 | elapsed time per iteration (s): 0.43 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 5.002641E+00 | grad norm: 1.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.829 | TFLOPs: 31.00 | 7: iteration 630/ 115203 | consumed samples: 161280 | consumed tokens: 330301440 | elapsed time per iteration (s): 0.44 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 5.081485E+00 | grad norm: 1.315 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 583.660 | TFLOPs: 30.62 | 7: iteration 640/ 115203 | consumed samples: 163840 | consumed tokens: 335544320 | elapsed time per iteration (s): 0.44 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 4.986636E+00 | grad norm: 1.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.213 | TFLOPs: 30.29 | 7: iteration 650/ 115203 | consumed samples: 166400 | consumed tokens: 340787200 | elapsed time per iteration (s): 0.44 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 5.008085E+00 | grad norm: 1.422 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.044 | TFLOPs: 30.54 | 7: iteration 660/ 115203 | consumed samples: 168960 | consumed tokens: 346030080 | elapsed time per iteration (s): 0.44 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 4.952942E+00 | grad norm: 1.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.197 | TFLOPs: 30.28 | 7: iteration 670/ 115203 | consumed samples: 171520 | consumed tokens: 351272960 | elapsed time per iteration (s): 0.45 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 4.899878E+00 | grad norm: 1.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.968 | TFLOPs: 29.85 | 7: iteration 680/ 115203 | consumed samples: 174080 | consumed tokens: 356515840 | elapsed time per iteration (s): 0.44 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 4.912078E+00 | grad norm: 1.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.539 | TFLOPs: 30.72 | 7: iteration 690/ 115203 | consumed samples: 176640 | consumed tokens: 361758720 | elapsed time per iteration (s): 0.44 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 
4.893460E+00 | grad norm: 1.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.215 | TFLOPs: 30.29 | 7: iteration 700/ 115203 | consumed samples: 179200 | consumed tokens: 367001600 | elapsed time per iteration (s): 0.44 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 4.867017E+00 | grad norm: 1.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.816 | TFLOPs: 30.74 | 7: iteration 710/ 115203 | consumed samples: 181760 | consumed tokens: 372244480 | elapsed time per iteration (s): 0.45 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 4.856622E+00 | grad norm: 1.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.365 | TFLOPs: 30.14 | 7: iteration 720/ 115203 | consumed samples: 184320 | consumed tokens: 377487360 | elapsed time per iteration (s): 0.43 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 4.866807E+00 | grad norm: 1.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.329 | TFLOPs: 31.45 | 7: iteration 730/ 115203 | consumed samples: 186880 | consumed tokens: 382730240 | elapsed time per iteration (s): 0.43 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 4.830151E+00 | grad norm: 1.439 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.949 | TFLOPs: 31.06 | 7: iteration 740/ 115203 | consumed samples: 189440 | consumed tokens: 387973120 | elapsed time per iteration (s): 0.45 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 4.800500E+00 | grad norm: 1.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.549 | TFLOPs: 30.15 | 7: iteration 750/ 115203 | consumed samples: 192000 | consumed tokens: 393216000 | elapsed time per iteration (s): 
0.44 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 4.742624E+00 | grad norm: 1.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.089 | TFLOPs: 30.44 | 7: iteration 760/ 115203 | consumed samples: 194560 | consumed tokens: 398458880 | elapsed time per iteration (s): 0.44 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 4.769480E+00 | grad norm: 1.127 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.040 | TFLOPs: 30.59 | 7: iteration 770/ 115203 | consumed samples: 197120 | consumed tokens: 403701760 | elapsed time per iteration (s): 0.43 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 4.763821E+00 | grad norm: 1.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.847 | TFLOPs: 31.11 | 7: iteration 780/ 115203 | consumed samples: 199680 | consumed tokens: 408944640 | elapsed time per iteration (s): 0.44 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 4.702056E+00 | grad norm: 1.316 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.318 | TFLOPs: 30.66 | 7: iteration 790/ 115203 | consumed samples: 202240 | consumed tokens: 414187520 | elapsed time per iteration (s): 0.44 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 4.674795E+00 | grad norm: 1.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.234 | TFLOPs: 30.23 | 7: iteration 800/ 115203 | consumed samples: 204800 | consumed tokens: 419430400 | elapsed time per iteration (s): 0.45 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 4.658162E+00 | grad norm: 1.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.579 | TFLOPs: 29.88 | 7: iteration 810/ 115203 | consumed samples: 
207360 | consumed tokens: 424673280 | elapsed time per iteration (s): 0.43 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 4.610523E+00 | grad norm: 1.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.287 | TFLOPs: 30.92 | 7: iteration 820/ 115203 | consumed samples: 209920 | consumed tokens: 429916160 | elapsed time per iteration (s): 0.44 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 4.629405E+00 | grad norm: 1.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.943 | TFLOPs: 30.43 | 7: iteration 830/ 115203 | consumed samples: 212480 | consumed tokens: 435159040 | elapsed time per iteration (s): 0.44 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 4.631646E+00 | grad norm: 1.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.433 | TFLOPs: 30.66 | 7: iteration 840/ 115203 | consumed samples: 215040 | consumed tokens: 440401920 | elapsed time per iteration (s): 0.43 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 4.555791E+00 | grad norm: 1.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.123 | TFLOPs: 31.33 | 7: iteration 850/ 115203 | consumed samples: 217600 | consumed tokens: 445644800 | elapsed time per iteration (s): 0.44 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 4.545301E+00 | grad norm: 1.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.959 | TFLOPs: 30.85 | 7: iteration 860/ 115203 | consumed samples: 220160 | consumed tokens: 450887680 | elapsed time per iteration (s): 0.44 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 4.523293E+00 | grad norm: 1.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
585.885 | TFLOPs: 30.74 | 7: iteration 870/ 115203 | consumed samples: 222720 | consumed tokens: 456130560 | elapsed time per iteration (s): 0.44 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 4.538533E+00 | grad norm: 1.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.646 | TFLOPs: 30.57 | 7: iteration 880/ 115203 | consumed samples: 225280 | consumed tokens: 461373440 | elapsed time per iteration (s): 0.44 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 4.511045E+00 | grad norm: 1.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.080 | TFLOPs: 30.49 | 7: iteration 890/ 115203 | consumed samples: 227840 | consumed tokens: 466616320 | elapsed time per iteration (s): 0.44 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 4.446133E+00 | grad norm: 1.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.132 | TFLOPs: 30.60 | 7: iteration 900/ 115203 | consumed samples: 230400 | consumed tokens: 471859200 | elapsed time per iteration (s): 0.43 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 4.419344E+00 | grad norm: 1.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.225 | TFLOPs: 31.44 | 7: iteration 910/ 115203 | consumed samples: 232960 | consumed tokens: 477102080 | elapsed time per iteration (s): 0.43 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 4.419588E+00 | grad norm: 1.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.542 | TFLOPs: 30.88 | 7: iteration 920/ 115203 | consumed samples: 235520 | consumed tokens: 482344960 | elapsed time per iteration (s): 0.43 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 4.402967E+00 | grad norm: 1.370 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 596.384 | TFLOPs: 31.29 | 7: iteration 930/ 115203 | consumed samples: 238080 | consumed tokens: 487587840 | elapsed time per iteration (s): 0.43 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 4.307558E+00 | grad norm: 1.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.225 | TFLOPs: 31.02 | 7: iteration 940/ 115203 | consumed samples: 240640 | consumed tokens: 492830720 | elapsed time per iteration (s): 0.44 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 4.278138E+00 | grad norm: 1.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.430 | TFLOPs: 30.35 | 7: iteration 950/ 115203 | consumed samples: 243200 | consumed tokens: 498073600 | elapsed time per iteration (s): 0.43 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 4.282418E+00 | grad norm: 1.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.953 | TFLOPs: 30.95 | 7: iteration 960/ 115203 | consumed samples: 245760 | consumed tokens: 503316480 | elapsed time per iteration (s): 0.44 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 4.229281E+00 | grad norm: 1.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.723 | TFLOPs: 30.63 | 7: iteration 970/ 115203 | consumed samples: 248320 | consumed tokens: 508559360 | elapsed time per iteration (s): 0.44 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 4.200276E+00 | grad norm: 1.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.850 | TFLOPs: 30.74 | 7: iteration 980/ 115203 | consumed samples: 250880 | consumed tokens: 513802240 | elapsed time per iteration (s): 0.43 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 
4.123679E+00 | grad norm: 1.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.239 | TFLOPs: 31.07 |
7: iteration 990/ 115203 | consumed samples: 253440 | consumed tokens: 519045120 | elapsed time per iteration (s): 0.44 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 4.133325E+00 | grad norm: 1.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.941 | TFLOPs: 30.38 |
7: iteration 1000/ 115203 | consumed samples: 256000 | consumed tokens: 524288000 | elapsed time per iteration (s): 0.43 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 4.075450E+00 | grad norm: 1.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.760 | TFLOPs: 31.36 |
7: ------------------------------------------------------------------------------------------
7: valid loss at iteration 1000 | lm loss value: 3.967811E+00 | lm loss PPL: 5.286869E+01 |
7: ------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 1000 to checkpoints_221m
0: [2022-11-28 13:02:35,557] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step1000 is begin to save!
0: [2022-11-28 13:02:35,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_01-model_00-model_states.pt...
0: [2022-11-28 13:02:35,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_01-model_00-model_states.pt.
0: [2022-11-28 13:02:35,795] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_03-model_00-model_states.pt...
0: [2022-11-28 13:02:35,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_03-model_00-model_states.pt.
0: [2022-11-28 13:02:35,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_04-model_00-model_states.pt... 0: [2022-11-28 13:02:35,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_04-model_00-model_states.pt. 0: [2022-11-28 13:02:35,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_05-model_00-model_states.pt... 0: [2022-11-28 13:02:35,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_05-model_00-model_states.pt. 0: [2022-11-28 13:02:35,864] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_06-model_00-model_states.pt... 0: [2022-11-28 13:02:35,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_06-model_00-model_states.pt. 0: [2022-11-28 13:02:35,887] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_07-model_00-model_states.pt... 0: [2022-11-28 13:02:35,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_07-model_00-model_states.pt. 0: [2022-11-28 13:02:35,912] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_08-model_00-model_states.pt... 0: [2022-11-28 13:02:35,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_08-model_00-model_states.pt. 0: [2022-11-28 13:02:35,935] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_09-model_00-model_states.pt... 0: [2022-11-28 13:02:35,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_09-model_00-model_states.pt. 
0: [2022-11-28 13:02:35,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_10-model_00-model_states.pt... 0: [2022-11-28 13:02:35,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_10-model_00-model_states.pt. 0: [2022-11-28 13:02:35,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_11-model_00-model_states.pt... 0: [2022-11-28 13:02:36,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_11-model_00-model_states.pt. 0: [2022-11-28 13:02:36,005] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_12-model_00-model_states.pt... 0: [2022-11-28 13:02:36,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_12-model_00-model_states.pt. 0: [2022-11-28 13:02:36,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_13-model_00-model_states.pt... 0: [2022-11-28 13:02:36,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_13-model_00-model_states.pt. 0: [2022-11-28 13:02:36,055] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_14-model_00-model_states.pt... 0: [2022-11-28 13:02:36,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_14-model_00-model_states.pt. 0: [2022-11-28 13:02:36,078] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_15-model_00-model_states.pt... 0: [2022-11-28 13:02:36,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_15-model_00-model_states.pt. 
0: [2022-11-28 13:02:36,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_16-model_00-model_states.pt... 0: [2022-11-28 13:02:36,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_16-model_00-model_states.pt. 0: [2022-11-28 13:02:36,125] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_17-model_00-model_states.pt... 0: [2022-11-28 13:02:36,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_17-model_00-model_states.pt. 0: [2022-11-28 13:02:36,149] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_18-model_00-model_states.pt... 0: [2022-11-28 13:02:36,171] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_18-model_00-model_states.pt. 0: [2022-11-28 13:02:36,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_19-model_00-model_states.pt... 0: [2022-11-28 13:02:36,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_19-model_00-model_states.pt. 0: [2022-11-28 13:02:36,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_20-model_00-model_states.pt... 0: [2022-11-28 13:02:36,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_20-model_00-model_states.pt. 0: [2022-11-28 13:02:36,219] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/layer_22-model_00-model_states.pt... 0: [2022-11-28 13:02:36,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/layer_22-model_00-model_states.pt. 
0: [2022-11-28 13:02:36,223] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step1000/mp_rank_00_model_states.pt 0: [2022-11-28 13:02:36,223] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/mp_rank_00_model_states.pt... 0: [2022-11-28 13:02:36,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/mp_rank_00_model_states.pt. 0: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
4: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
1: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
2: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:02:36,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step1000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
0: [2022-11-28 13:02:36,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:02:36,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:02:36,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:02:36,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 13:02:36,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 13:02:36,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2022-11-28 13:02:36,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 13:02:36,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2022-11-28 13:02:36,306] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2022-11-28 13:02:36,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:02:36,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 13:02:36,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
7: [2022-11-28 13:02:36,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:02:36,333] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 13:02:36,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2022-11-28 13:02:36,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:02:36,335] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 13:02:36,335] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 2: [2022-11-28 13:02:36,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:02:36,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 13:02:36,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2022-11-28 13:02:36,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:02:36,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 13:02:36,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
7: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:02:36,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2022-11-28 13:02:36,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 2: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 2: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:02:36,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 2: [2022-11-28 13:02:36,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
7: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:02:36,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:02:36,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 13:02:36,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
6: [2022-11-28 13:02:36,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 13:02:36,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 13:02:36,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 13:02:36,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2022-11-28 13:02:36,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2022-11-28 13:02:36,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2022-11-28 13:02:36,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2022-11-28 13:02:36,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:02:36,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 13:02:36,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2022-11-28 13:02:36,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:02:36,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 13:02:36,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:02:36,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:02:36,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 13:02:36,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 13:02:36,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 13:02:36,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2022-11-28 13:02:36,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2022-11-28 13:02:36,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2022-11-28 13:02:36,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:02:36,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 13:02:36,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2022-11-28 13:02:36,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:02:36,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:02:36,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:02:36,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:02:36,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 13:02:36,341] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2022-11-28 13:02:36,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 13:02:36,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 13:02:36,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 13:02:36,341] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2022-11-28 13:02:36,341] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 6: [2022-11-28 13:02:36,341] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2022-11-28 13:02:36,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 13:02:36,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 13:02:36,341] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2022-11-28 13:02:36,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:02:36,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 13:02:36,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2022-11-28 13:02:36,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:02:36,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 13:02:36,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2022-11-28 13:02:36,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:02:36,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 13:02:36,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2022-11-28 13:02:36,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 13:02:36,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:02:36,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:02:36,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 13:02:36,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2022-11-28 13:02:36,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 13:02:36,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 13:02:36,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2022-11-28 13:02:36,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 4: [2022-11-28 13:02:36,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:02:36,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 13:02:36,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2022-11-28 13:02:36,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:02:36,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:02:36,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:02:36,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:02:36,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 13:02:36,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 13:02:36,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 13:02:36,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 13:02:36,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2022-11-28 13:02:36,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2022-11-28 13:02:36,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2022-11-28 13:02:36,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2022-11-28 13:02:36,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 13:02:36,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 13:02:36,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 7: [2022-11-28 13:02:36,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:02:36,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:02:36,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:02:36,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:02:36,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:02:36,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 5: [2022-11-28 13:02:36,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 7: [2022-11-28 13:02:36,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
5: [2022-11-28 13:02:36,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 13:02:36,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 13:02:36,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 13:02:36,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2022-11-28 13:02:36,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2022-11-28 13:02:36,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 5: [2022-11-28 13:02:36,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 2: [2022-11-28 13:02:36,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:02:36,354] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 13:02:36,354] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 2: [2022-11-28 13:02:36,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 13:02:36,354] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 13:02:36,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:02:36,354] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 2: [2022-11-28 13:02:36,354] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 13:02:36,354] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2022-11-28 13:02:36,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:02:36,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 13:02:36,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2022-11-28 13:02:36,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:02:36,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 13:02:36,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2022-11-28 13:02:36,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-28 13:02:36,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 13:02:36,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2022-11-28 13:02:36,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:02:36,370] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 13:02:36,370] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2022-11-28 13:02:36,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:02:36,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:02:36,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:02:36,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 13:02:36,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 13:02:36,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 13:02:36,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 13:02:36,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 13:02:36,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2022-11-28 13:02:36,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2022-11-28 13:02:36,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 1: [2022-11-28 13:02:36,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 0: [2022-11-28 13:02:36,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 13:02:36,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:02:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 13:02:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
3: [2022-11-28 13:02:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 13:02:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 13:02:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 13:02:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 13:02:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 13:02:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step1000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 3: [2022-11-28 13:02:36,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step1000 is ready now! 
0: successfully saved checkpoint at iteration 1000 to checkpoints_221m 7: time (ms) | save-checkpoint: 844.91 7: iteration 1010/ 115203 | consumed samples: 258560 | consumed tokens: 529530880 | elapsed time per iteration (s): 0.53 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 4.086060E+00 | grad norm: 1.531 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 479.868 | TFLOPs: 25.18 | 7: iteration 1020/ 115203 | consumed samples: 261120 | consumed tokens: 534773760 | elapsed time per iteration (s): 0.46 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 4.042358E+00 | grad norm: 1.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.213 | TFLOPs: 29.34 | 7: iteration 1030/ 115203 | consumed samples: 263680 | consumed tokens: 540016640 | elapsed time per iteration (s): 0.45 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 4.004114E+00 | grad norm: 1.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.073 | TFLOPs: 29.54 | 7: iteration 1040/ 115203 | consumed samples: 266240 | consumed tokens: 545259520 | elapsed time per iteration (s): 0.44 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 4.041526E+00 | grad norm: 1.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.002 | TFLOPs: 30.33 | 7: iteration 1050/ 115203 | consumed samples: 268800 | consumed tokens: 550502400 | elapsed time per iteration (s): 0.43 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 3.993377E+00 | grad norm: 1.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.238 | TFLOPs: 31.34 | 7: iteration 1060/ 115203 | consumed samples: 271360 | consumed tokens: 555745280 | elapsed time per iteration (s): 0.44 | learning rate: 1.840E-04 | global batch 
size: 256 | lm loss: 3.940558E+00 | grad norm: 1.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.024 | TFLOPs: 30.59 | 7: iteration 1070/ 115203 | consumed samples: 273920 | consumed tokens: 560988160 | elapsed time per iteration (s): 0.43 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 3.918330E+00 | grad norm: 1.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.846 | TFLOPs: 31.37 | 7: iteration 1080/ 115203 | consumed samples: 276480 | consumed tokens: 566231040 | elapsed time per iteration (s): 0.44 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 3.842963E+00 | grad norm: 0.970 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.081 | TFLOPs: 30.86 | 7: iteration 1090/ 115203 | consumed samples: 279040 | consumed tokens: 571473920 | elapsed time per iteration (s): 0.44 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 3.864426E+00 | grad norm: 1.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.917 | TFLOPs: 30.58 | 7: iteration 1100/ 115203 | consumed samples: 281600 | consumed tokens: 576716800 | elapsed time per iteration (s): 0.43 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 3.878296E+00 | grad norm: 1.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.178 | TFLOPs: 31.23 | 7: iteration 1110/ 115203 | consumed samples: 284160 | consumed tokens: 581959680 | elapsed time per iteration (s): 0.43 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 3.820939E+00 | grad norm: 0.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.332 | TFLOPs: 31.24 | 7: iteration 1120/ 115203 | consumed samples: 286720 | consumed tokens: 587202560 | elapsed 
time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 3.842894E+00 | grad norm: 1.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.841 | TFLOPs: 30.95 | 7: iteration 1130/ 115203 | consumed samples: 289280 | consumed tokens: 592445440 | elapsed time per iteration (s): 0.44 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 3.829042E+00 | grad norm: 1.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.843 | TFLOPs: 30.48 | 7: iteration 1140/ 115203 | consumed samples: 291840 | consumed tokens: 597688320 | elapsed time per iteration (s): 0.43 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 3.767992E+00 | grad norm: 1.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.466 | TFLOPs: 30.93 | 7: iteration 1150/ 115203 | consumed samples: 294400 | consumed tokens: 602931200 | elapsed time per iteration (s): 0.44 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 3.795569E+00 | grad norm: 1.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.013 | TFLOPs: 30.59 | 7: iteration 1160/ 115203 | consumed samples: 296960 | consumed tokens: 608174080 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.767725E+00 | grad norm: 1.050 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.934 | TFLOPs: 31.27 | 7: iteration 1170/ 115203 | consumed samples: 299520 | consumed tokens: 613416960 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.752434E+00 | grad norm: 0.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.956 | TFLOPs: 30.90 | 7: iteration 1180/ 
115203 | consumed samples: 302080 | consumed tokens: 618659840 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.753293E+00 | grad norm: 0.972 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.128 | TFLOPs: 31.12 | 7: iteration 1190/ 115203 | consumed samples: 304640 | consumed tokens: 623902720 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.746846E+00 | grad norm: 0.823 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.008 | TFLOPs: 30.59 | 7: iteration 1200/ 115203 | consumed samples: 307200 | consumed tokens: 629145600 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.703574E+00 | grad norm: 1.079 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.303 | TFLOPs: 30.66 | 7: iteration 1210/ 115203 | consumed samples: 309760 | consumed tokens: 634388480 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.683684E+00 | grad norm: 0.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.092 | TFLOPs: 31.28 | 7: iteration 1220/ 115203 | consumed samples: 312320 | consumed tokens: 639631360 | elapsed time per iteration (s): 0.45 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.607923E+00 | grad norm: 0.859 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.782 | TFLOPs: 29.69 | 7: iteration 1230/ 115203 | consumed samples: 314880 | consumed tokens: 644874240 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.659595E+00 | grad norm: 0.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 600.376 | TFLOPs: 31.50 | 7: iteration 1240/ 115203 | consumed samples: 317440 | consumed tokens: 650117120 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.625873E+00 | grad norm: 0.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.571 | TFLOPs: 31.20 | 7: iteration 1250/ 115203 | consumed samples: 320000 | consumed tokens: 655360000 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.614579E+00 | grad norm: 0.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.827 | TFLOPs: 30.21 | 7: iteration 1260/ 115203 | consumed samples: 322560 | consumed tokens: 660602880 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.544589E+00 | grad norm: 0.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.734 | TFLOPs: 31.05 | 7: iteration 1270/ 115203 | consumed samples: 325120 | consumed tokens: 665845760 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.602885E+00 | grad norm: 0.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.763 | TFLOPs: 30.84 | 7: iteration 1280/ 115203 | consumed samples: 327680 | consumed tokens: 671088640 | elapsed time per iteration (s): 0.45 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.596269E+00 | grad norm: 0.675 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.296 | TFLOPs: 29.61 | 7: iteration 1290/ 115203 | consumed samples: 330240 | consumed tokens: 676331520 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.462552E+00 | grad norm: 
0.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.251 | TFLOPs: 30.76 | 7: iteration 1300/ 115203 | consumed samples: 332800 | consumed tokens: 681574400 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.544347E+00 | grad norm: 0.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.901 | TFLOPs: 31.11 | 7: iteration 1310/ 115203 | consumed samples: 335360 | consumed tokens: 686817280 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.521351E+00 | grad norm: 0.841 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.661 | TFLOPs: 31.52 | 7: iteration 1320/ 115203 | consumed samples: 337920 | consumed tokens: 692060160 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.513210E+00 | grad norm: 0.804 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.459 | TFLOPs: 31.14 | 7: iteration 1330/ 115203 | consumed samples: 340480 | consumed tokens: 697303040 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.510418E+00 | grad norm: 0.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.227 | TFLOPs: 31.07 | 7: iteration 1340/ 115203 | consumed samples: 343040 | consumed tokens: 702545920 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.514294E+00 | grad norm: 0.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.804 | TFLOPs: 31.37 | 7: iteration 1350/ 115203 | consumed samples: 345600 | consumed tokens: 707788800 | elapsed time per iteration (s): 0.44 | learning rate: 
2.000E-04 | global batch size: 256 | lm loss: 3.487249E+00 | grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.169 | TFLOPs: 30.76 | 7: iteration 1360/ 115203 | consumed samples: 348160 | consumed tokens: 713031680 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.439154E+00 | grad norm: 0.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.710 | TFLOPs: 30.73 | 7: iteration 1370/ 115203 | consumed samples: 350720 | consumed tokens: 718274560 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.478865E+00 | grad norm: 0.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.308 | TFLOPs: 30.19 | 7: iteration 1380/ 115203 | consumed samples: 353280 | consumed tokens: 723517440 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.476558E+00 | grad norm: 0.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.651 | TFLOPs: 31.62 | 7: iteration 1390/ 115203 | consumed samples: 355840 | consumed tokens: 728760320 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.465014E+00 | grad norm: 0.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.852 | TFLOPs: 30.27 | 7: iteration 1400/ 115203 | consumed samples: 358400 | consumed tokens: 734003200 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.504227E+00 | grad norm: 0.722 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.042 | TFLOPs: 31.48 | 7: iteration 1410/ 115203 | consumed samples: 360960 | consumed 
tokens: 739246080 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.455534E+00 | grad norm: 0.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.404 | TFLOPs: 30.98 | 7: iteration 1420/ 115203 | consumed samples: 363520 | consumed tokens: 744488960 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.402195E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.830 | TFLOPs: 31.16 | 7: iteration 1430/ 115203 | consumed samples: 366080 | consumed tokens: 749731840 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.399473E+00 | grad norm: 0.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.494 | TFLOPs: 31.35 | 7: iteration 1440/ 115203 | consumed samples: 368640 | consumed tokens: 754974720 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.427173E+00 | grad norm: 0.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.726 | TFLOPs: 30.36 | 7: iteration 1450/ 115203 | consumed samples: 371200 | consumed tokens: 760217600 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.409597E+00 | grad norm: 0.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.838 | TFLOPs: 30.79 | 7: iteration 1460/ 115203 | consumed samples: 373760 | consumed tokens: 765460480 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.418626E+00 | grad norm: 0.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.007 | TFLOPs: 
31.06 | 7: iteration 1470/ 115203 | consumed samples: 376320 | consumed tokens: 770703360 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.381452E+00 | grad norm: 0.756 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.353 | TFLOPs: 30.82 | 7: iteration 1480/ 115203 | consumed samples: 378880 | consumed tokens: 775946240 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.402695E+00 | grad norm: 0.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.481 | TFLOPs: 31.77 | 7: iteration 1490/ 115203 | consumed samples: 381440 | consumed tokens: 781189120 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.416734E+00 | grad norm: 0.699 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.531 | TFLOPs: 31.19 | 7: iteration 1500/ 115203 | consumed samples: 384000 | consumed tokens: 786432000 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.384963E+00 | grad norm: 0.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.758 | TFLOPs: 31.10 | 7: iteration 1510/ 115203 | consumed samples: 386560 | consumed tokens: 791674880 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.387514E+00 | grad norm: 0.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.467 | TFLOPs: 31.14 | 7: iteration 1520/ 115203 | consumed samples: 389120 | consumed tokens: 796917760 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.385332E+00 | grad norm: 0.903 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 591.645 | TFLOPs: 31.04 | 7: iteration 1530/ 115203 | consumed samples: 391680 | consumed tokens: 802160640 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.366223E+00 | grad norm: 0.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.675 | TFLOPs: 31.04 | 7: iteration 1540/ 115203 | consumed samples: 394240 | consumed tokens: 807403520 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.375396E+00 | grad norm: 0.755 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.311 | TFLOPs: 30.55 | 7: iteration 1550/ 115203 | consumed samples: 396800 | consumed tokens: 812646400 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.341777E+00 | grad norm: 0.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.126 | TFLOPs: 31.91 | 7: iteration 1560/ 115203 | consumed samples: 399360 | consumed tokens: 817889280 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.315507E+00 | grad norm: 0.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.073 | TFLOPs: 30.44 | 7: iteration 1570/ 115203 | consumed samples: 401920 | consumed tokens: 823132160 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.307684E+00 | grad norm: 0.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.275 | TFLOPs: 31.18 | 7: iteration 1580/ 115203 | consumed samples: 404480 | consumed tokens: 828375040 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.332930E+00 
| grad norm: 0.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.524 | TFLOPs: 31.51 | 7: iteration 1590/ 115203 | consumed samples: 407040 | consumed tokens: 833617920 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.334798E+00 | grad norm: 0.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.973 | TFLOPs: 30.64 | 7: iteration 1600/ 115203 | consumed samples: 409600 | consumed tokens: 838860800 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.306874E+00 | grad norm: 0.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.728 | TFLOPs: 30.99 | 7: iteration 1610/ 115203 | consumed samples: 412160 | consumed tokens: 844103680 | elapsed time per iteration (s): 0.45 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.282229E+00 | grad norm: 0.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.427 | TFLOPs: 29.77 | 7: iteration 1620/ 115203 | consumed samples: 414720 | consumed tokens: 849346560 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.277307E+00 | grad norm: 0.573 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.443 | TFLOPs: 31.40 | 7: iteration 1630/ 115203 | consumed samples: 417280 | consumed tokens: 854589440 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.296443E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.131 | TFLOPs: 31.54 | 7: iteration 1640/ 115203 | consumed samples: 419840 | consumed tokens: 859832320 | elapsed time per iteration (s): 0.44 | 
learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.268481E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.306 | TFLOPs: 30.45 | 7: iteration 1650/ 115203 | consumed samples: 422400 | consumed tokens: 865075200 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.330730E+00 | grad norm: 0.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.426 | TFLOPs: 31.24 | 7: iteration 1660/ 115203 | consumed samples: 424960 | consumed tokens: 870318080 | elapsed time per iteration (s): 0.45 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.251434E+00 | grad norm: 0.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.818 | TFLOPs: 29.90 | 7: iteration 1670/ 115203 | consumed samples: 427520 | consumed tokens: 875560960 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.280580E+00 | grad norm: 0.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.374 | TFLOPs: 30.45 | 7: iteration 1680/ 115203 | consumed samples: 430080 | consumed tokens: 880803840 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.252330E+00 | grad norm: 0.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.987 | TFLOPs: 30.85 | 7: iteration 1690/ 115203 | consumed samples: 432640 | consumed tokens: 886046720 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.306590E+00 | grad norm: 0.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.513 | TFLOPs: 31.19 | 7: iteration 1700/ 115203 | consumed samples: 435200 
| consumed tokens: 891289600 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.296378E+00 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.767 | TFLOPs: 31.05 | 7: iteration 1710/ 115203 | consumed samples: 437760 | consumed tokens: 896532480 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.255461E+00 | grad norm: 0.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.943 | TFLOPs: 30.38 | 7: iteration 1720/ 115203 | consumed samples: 440320 | consumed tokens: 901775360 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.248935E+00 | grad norm: 0.514 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.006 | TFLOPs: 31.27 | 7: iteration 1730/ 115203 | consumed samples: 442880 | consumed tokens: 907018240 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.237495E+00 | grad norm: 0.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.789 | TFLOPs: 30.79 | 7: iteration 1740/ 115203 | consumed samples: 445440 | consumed tokens: 912261120 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.278236E+00 | grad norm: 0.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.908 | TFLOPs: 30.48 | 7: iteration 1750/ 115203 | consumed samples: 448000 | consumed tokens: 917504000 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.248366E+00 | grad norm: 0.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.117 
| TFLOPs: 31.54 | 7: iteration 1760/ 115203 | consumed samples: 450560 | consumed tokens: 922746880 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.259323E+00 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.139 | TFLOPs: 30.39 | 7: iteration 1770/ 115203 | consumed samples: 453120 | consumed tokens: 927989760 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.218008E+00 | grad norm: 0.503 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.245 | TFLOPs: 30.34 | 7: iteration 1780/ 115203 | consumed samples: 455680 | consumed tokens: 933232640 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.202979E+00 | grad norm: 0.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.743 | TFLOPs: 31.26 | 7: iteration 1790/ 115203 | consumed samples: 458240 | consumed tokens: 938475520 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.255596E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.200 | TFLOPs: 31.44 | 7: iteration 1800/ 115203 | consumed samples: 460800 | consumed tokens: 943718400 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.197941E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.826 | TFLOPs: 31.10 | 7: iteration 1810/ 115203 | consumed samples: 463360 | consumed tokens: 948961280 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.230540E+00 | grad norm: 0.551 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 584.694 | TFLOPs: 30.68 | 7: iteration 1820/ 115203 | consumed samples: 465920 | consumed tokens: 954204160 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.208003E+00 | grad norm: 0.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.546 | TFLOPs: 31.04 | 7: iteration 1830/ 115203 | consumed samples: 468480 | consumed tokens: 959447040 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.195004E+00 | grad norm: 0.510 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.083 | TFLOPs: 31.07 | 7: iteration 1840/ 115203 | consumed samples: 471040 | consumed tokens: 964689920 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.197840E+00 | grad norm: 0.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.684 | TFLOPs: 31.04 | 7: iteration 1850/ 115203 | consumed samples: 473600 | consumed tokens: 969932800 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.202171E+00 | grad norm: 0.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.506 | TFLOPs: 31.40 | 7: iteration 1860/ 115203 | consumed samples: 476160 | consumed tokens: 975175680 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.155972E+00 | grad norm: 0.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.934 | TFLOPs: 31.01 | 7: iteration 1870/ 115203 | consumed samples: 478720 | consumed tokens: 980418560 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm 
loss: 3.165693E+00 | grad norm: 0.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.824 | TFLOPs: 31.05 | 7: iteration 1880/ 115203 | consumed samples: 481280 | consumed tokens: 985661440 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.150864E+00 | grad norm: 0.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.510 | TFLOPs: 31.40 | 7: iteration 1890/ 115203 | consumed samples: 483840 | consumed tokens: 990904320 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.200504E+00 | grad norm: 0.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.660 | TFLOPs: 31.57 | 7: iteration 1900/ 115203 | consumed samples: 486400 | consumed tokens: 996147200 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.133046E+00 | grad norm: 0.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.730 | TFLOPs: 30.57 | 7: iteration 1910/ 115203 | consumed samples: 488960 | consumed tokens: 1001390080 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.189399E+00 | grad norm: 0.532 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.438 | TFLOPs: 30.51 | 7: iteration 1920/ 115203 | consumed samples: 491520 | consumed tokens: 1006632960 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.148027E+00 | grad norm: 0.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.897 | TFLOPs: 31.37 | 7: iteration 1930/ 115203 | consumed samples: 494080 | consumed tokens: 1011875840 | elapsed time per 
iteration (s): 0.45 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.159749E+00 | grad norm: 0.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.928 | TFLOPs: 30.11 | 7: iteration 1940/ 115203 | consumed samples: 496640 | consumed tokens: 1017118720 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.147897E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.207 | TFLOPs: 31.81 | 7: iteration 1950/ 115203 | consumed samples: 499200 | consumed tokens: 1022361600 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.128331E+00 | grad norm: 0.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.712 | TFLOPs: 30.99 | 7: iteration 1960/ 115203 | consumed samples: 501760 | consumed tokens: 1027604480 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.186192E+00 | grad norm: 0.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.487 | TFLOPs: 31.19 | 7: iteration 1970/ 115203 | consumed samples: 504320 | consumed tokens: 1032847360 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.184815E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.070 | TFLOPs: 31.64 | 7: iteration 1980/ 115203 | consumed samples: 506880 | consumed tokens: 1038090240 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.154357E+00 | grad norm: 0.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.929 | TFLOPs: 31.69 | 7: iteration 1990/ 115203 
| consumed samples: 509440 | consumed tokens: 1043333120 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.182470E+00 | grad norm: 0.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.237 | TFLOPs: 30.97 |
0: [2022-11-28 13:09:50,824] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=0, lr=[0.0001999754506631688, 0.0001999754506631688, 0.0001999754506631688], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 2000/ 115203 | consumed samples: 512000 | consumed tokens: 1048576000 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.170351E+00 | grad norm: 0.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.497 | TFLOPs: 31.45 |
0: steps: 2000 loss: 3.2538 iter time (s): 0.441 samples/sec: 581.124
7: ------------------------------------------------------------------------------------------
7: valid loss at iteration 2000 | lm loss value: 3.042248E+00 | lm loss PPL: 2.095230E+01 |
7: ------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 2000 to checkpoints_221m
0: [2022-11-28 13:09:50,983] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step2000 is begin to save!
0: [2022-11-28 13:09:50,986] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_01-model_00-model_states.pt...
0: [2022-11-28 13:09:51,085] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_01-model_00-model_states.pt.
0: [2022-11-28 13:09:51,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_03-model_00-model_states.pt...
0: [2022-11-28 13:09:51,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_03-model_00-model_states.pt.
0: [2022-11-28 13:09:51,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_04-model_00-model_states.pt...
0: [2022-11-28 13:09:51,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_04-model_00-model_states.pt.
0: [2022-11-28 13:09:51,130] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_05-model_00-model_states.pt...
0: [2022-11-28 13:09:51,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_05-model_00-model_states.pt.
0: [2022-11-28 13:09:51,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_06-model_00-model_states.pt...
0: [2022-11-28 13:09:51,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_06-model_00-model_states.pt.
0: [2022-11-28 13:09:51,176] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_07-model_00-model_states.pt...
0: [2022-11-28 13:09:51,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_07-model_00-model_states.pt.
0: [2022-11-28 13:09:51,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_08-model_00-model_states.pt...
0: [2022-11-28 13:09:51,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_08-model_00-model_states.pt.
0: [2022-11-28 13:09:51,223] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_09-model_00-model_states.pt...
0: [2022-11-28 13:09:51,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_09-model_00-model_states.pt.
0: [2022-11-28 13:09:51,245] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_10-model_00-model_states.pt...
0: [2022-11-28 13:09:51,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_10-model_00-model_states.pt.
0: [2022-11-28 13:09:51,268] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_11-model_00-model_states.pt...
0: [2022-11-28 13:09:51,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_11-model_00-model_states.pt.
0: [2022-11-28 13:09:51,290] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_12-model_00-model_states.pt...
0: [2022-11-28 13:09:51,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_12-model_00-model_states.pt.
0: [2022-11-28 13:09:51,314] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_13-model_00-model_states.pt...
0: [2022-11-28 13:09:51,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_13-model_00-model_states.pt.
0: [2022-11-28 13:09:51,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_14-model_00-model_states.pt...
0: [2022-11-28 13:09:51,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_14-model_00-model_states.pt.
0: [2022-11-28 13:09:51,360] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_15-model_00-model_states.pt...
0: [2022-11-28 13:09:51,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_15-model_00-model_states.pt.
0: [2022-11-28 13:09:51,384] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_16-model_00-model_states.pt...
0: [2022-11-28 13:09:51,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_16-model_00-model_states.pt.
0: [2022-11-28 13:09:51,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_17-model_00-model_states.pt...
0: [2022-11-28 13:09:51,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_17-model_00-model_states.pt.
0: [2022-11-28 13:09:51,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_18-model_00-model_states.pt...
0: [2022-11-28 13:09:51,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_18-model_00-model_states.pt.
0: [2022-11-28 13:09:51,453] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_19-model_00-model_states.pt...
0: [2022-11-28 13:09:51,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_19-model_00-model_states.pt.
0: [2022-11-28 13:09:51,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_20-model_00-model_states.pt...
0: [2022-11-28 13:09:51,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_20-model_00-model_states.pt.
0: [2022-11-28 13:09:51,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/layer_22-model_00-model_states.pt...
0: [2022-11-28 13:09:51,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/layer_22-model_00-model_states.pt.
0: [2022-11-28 13:09:51,504] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step2000/mp_rank_00_model_states.pt
0: [2022-11-28 13:09:51,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/mp_rank_00_model_states.pt...
0: [2022-11-28 13:09:51,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/mp_rank_00_model_states.pt.
0: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
0: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
7: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
7: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt...
7: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
7: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt...
7: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2022-11-28 13:09:51,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step2000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:09:51,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:09:51,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 13:09:51,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 2: [2022-11-28 13:09:51,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:09:51,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 13:09:51,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2022-11-28 13:09:51,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:09:51,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 13:09:51,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2022-11-28 13:09:51,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:09:51,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 13:09:51,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2022-11-28 13:09:51,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:09:51,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 13:09:51,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2022-11-28 13:09:51,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:09:51,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 13:09:51,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2022-11-28 13:09:51,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:09:51,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 13:09:51,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2022-11-28 13:09:51,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:09:51,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 13:09:51,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2022-11-28 13:09:51,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:09:51,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 7: [2022-11-28 13:09:51,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:09:51,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2022-11-28 13:09:51,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 13:09:51,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2022-11-28 13:09:51,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:09:51,577] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-28 13:09:51,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 2: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:09:51,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2022-11-28 13:09:51,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 6: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2022-11-28 13:09:51,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 2: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 13:09:51,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2022-11-28 13:09:51,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:09:51,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 13:09:51,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2022-11-28 13:09:51,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:09:51,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:09:51,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 13:09:51,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 13:09:51,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2022-11-28 13:09:51,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2022-11-28 13:09:51,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:09:51,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 13:09:51,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:09:51,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2022-11-28 13:09:51,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 13:09:51,577] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:09:51,578] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 13:09:51,578] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2022-11-28 13:09:51,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 1: [2022-11-28 13:09:51,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 3: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 3: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:09:51,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 3: [2022-11-28 13:09:51,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 2: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
2: [2022-11-28 13:09:51,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 13:09:51,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 3: [2022-11-28 13:09:51,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 2: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 2: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:09:51,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 3: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:09:51,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 13:09:51,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
3: [2022-11-28 13:09:51,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:09:51,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:09:51,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 13:09:51,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 13:09:51,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 3: [2022-11-28 13:09:51,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2022-11-28 13:09:51,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:09:51,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 13:09:51,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2022-11-28 13:09:51,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:09:51,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 13:09:51,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
7: [2022-11-28 13:09:51,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:09:51,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2022-11-28 13:09:51,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:09:51,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 2: [2022-11-28 13:09:51,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 13:09:51,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2022-11-28 13:09:51,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:09:51,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 13:09:51,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 7: [2022-11-28 13:09:51,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:09:51,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 13:09:51,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
3: [2022-11-28 13:09:51,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:09:51,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 13:09:51,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 3: [2022-11-28 13:09:51,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:09:51,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:09:51,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 13:09:51,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2022-11-28 13:09:51,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:09:51,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 13:09:51,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 2: [2022-11-28 13:09:51,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-28 13:09:51,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 13:09:51,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2022-11-28 13:09:51,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:09:51,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:09:51,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 13:09:51,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 13:09:51,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2022-11-28 13:09:51,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 0: [2022-11-28 13:09:51,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:09:51,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 13:09:51,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 1: [2022-11-28 13:09:51,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 13:09:51,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 13:09:51,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 0: [2022-11-28 13:09:51,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:09:51,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 13:09:51,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 0: [2022-11-28 13:09:51,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:09:51,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 13:09:51,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 0: [2022-11-28 13:09:51,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:09:51,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
0: [2022-11-28 13:09:51,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 4: [2022-11-28 13:09:51,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 0: [2022-11-28 13:09:51,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2022-11-28 13:09:51,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2022-11-28 13:09:51,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:09:51,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:09:51,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:09:51,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 13:09:51,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 13:09:51,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 13:09:51,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 13:09:51,587] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 13:09:51,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2022-11-28 13:09:51,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2022-11-28 13:09:51,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 4: [2022-11-28 13:09:51,587] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2022-11-28 13:09:51,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:09:51,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:09:51,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 13:09:51,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 13:09:51,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 6: [2022-11-28 13:09:51,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2022-11-28 13:09:51,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:09:51,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 13:09:51,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 13:09:51,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 5: [2022-11-28 13:09:51,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 3: [2022-11-28 13:09:51,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:09:51,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 13:09:51,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
0: [2022-11-28 13:09:51,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:09:51,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:09:51,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:09:51,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:09:51,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 13:09:51,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 13:09:51,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 13:09:51,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 0: [2022-11-28 13:09:51,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 0: [2022-11-28 13:09:51,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 0: [2022-11-28 13:09:51,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step2000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 13:09:51,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step2000 is ready now! 
0: successfully saved checkpoint at iteration 2000 to checkpoints_221m 7: time (ms) | save-checkpoint: 653.62 7: iteration 2010/ 115203 | consumed samples: 514560 | consumed tokens: 1053818880 | elapsed time per iteration (s): 0.52 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.178628E+00 | grad norm: 0.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 496.713 | TFLOPs: 26.06 | 7: iteration 2020/ 115203 | consumed samples: 517120 | consumed tokens: 1059061760 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.192919E+00 | grad norm: 0.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.766 | TFLOPs: 30.47 | 7: iteration 2030/ 115203 | consumed samples: 519680 | consumed tokens: 1064304640 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.109102E+00 | grad norm: 0.542 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.572 | TFLOPs: 30.46 | 7: iteration 2040/ 115203 | consumed samples: 522240 | consumed tokens: 1069547520 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.136283E+00 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.191 | TFLOPs: 31.33 | 7: iteration 2050/ 115203 | consumed samples: 524800 | consumed tokens: 1074790400 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.162701E+00 | grad norm: 0.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.323 | TFLOPs: 30.97 | 7: iteration 2060/ 115203 | consumed samples: 527360 | consumed tokens: 1080033280 | elapsed time per iteration (s): 0.45 | learning rate: 2.000E-04 | global 
batch size: 256 | lm loss: 3.130952E+00 | grad norm: 0.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.779 | TFLOPs: 29.58 | 7: iteration 2070/ 115203 | consumed samples: 529920 | consumed tokens: 1085276160 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.105819E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.641 | TFLOPs: 30.78 | 7: iteration 2080/ 115203 | consumed samples: 532480 | consumed tokens: 1090519040 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.125803E+00 | grad norm: 0.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.952 | TFLOPs: 30.38 | 7: iteration 2090/ 115203 | consumed samples: 535040 | consumed tokens: 1095761920 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.118263E+00 | grad norm: 0.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.310 | TFLOPs: 31.60 | 7: iteration 2100/ 115203 | consumed samples: 537600 | consumed tokens: 1101004800 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.108547E+00 | grad norm: 0.479 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.030 | TFLOPs: 31.01 | 7: iteration 2110/ 115203 | consumed samples: 540160 | consumed tokens: 1106247680 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.090353E+00 | grad norm: 0.640 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.297 | TFLOPs: 30.87 | 7: iteration 2120/ 115203 | consumed samples: 542720 | consumed tokens: 
1111490560 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.109618E+00 | grad norm: 0.625 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.299 | TFLOPs: 31.29 | 7: iteration 2130/ 115203 | consumed samples: 545280 | consumed tokens: 1116733440 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.107481E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.653 | TFLOPs: 31.31 | 7: iteration 2140/ 115203 | consumed samples: 547840 | consumed tokens: 1121976320 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.100292E+00 | grad norm: 0.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.613 | TFLOPs: 30.99 | 7: iteration 2150/ 115203 | consumed samples: 550400 | consumed tokens: 1127219200 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.086940E+00 | grad norm: 0.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.381 | TFLOPs: 30.71 | 7: iteration 2160/ 115203 | consumed samples: 552960 | consumed tokens: 1132462080 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.092587E+00 | grad norm: 0.556 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.536 | TFLOPs: 30.88 | 7: iteration 2170/ 115203 | consumed samples: 555520 | consumed tokens: 1137704960 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.111183E+00 | grad norm: 0.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.808 | TFLOPs: 
31.16 | 7: iteration 2180/ 115203 | consumed samples: 558080 | consumed tokens: 1142947840 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.113822E+00 | grad norm: 0.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.276 | TFLOPs: 30.97 | 7: iteration 2190/ 115203 | consumed samples: 560640 | consumed tokens: 1148190720 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.086612E+00 | grad norm: 0.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.915 | TFLOPs: 30.85 | 7: iteration 2200/ 115203 | consumed samples: 563200 | consumed tokens: 1153433600 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.110536E+00 | grad norm: 0.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.271 | TFLOPs: 30.76 | 7: iteration 2210/ 115203 | consumed samples: 565760 | consumed tokens: 1158676480 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.060939E+00 | grad norm: 0.548 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.167 | TFLOPs: 31.49 | 7: iteration 2220/ 115203 | consumed samples: 568320 | consumed tokens: 1163919360 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.129716E+00 | grad norm: 0.466 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.326 | TFLOPs: 31.87 | 7: iteration 2230/ 115203 | consumed samples: 570880 | consumed tokens: 1169162240 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.080178E+00 | grad norm: 0.578 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 592.719 | TFLOPs: 31.10 | 7: iteration 2240/ 115203 | consumed samples: 573440 | consumed tokens: 1174405120 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.106890E+00 | grad norm: 0.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.971 | TFLOPs: 30.38 | 7: iteration 2250/ 115203 | consumed samples: 576000 | consumed tokens: 1179648000 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.064002E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.534 | TFLOPs: 31.40 | 7: iteration 2260/ 115203 | consumed samples: 578560 | consumed tokens: 1184890880 | elapsed time per iteration (s): 0.45 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.028826E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.164 | TFLOPs: 30.13 | 7: iteration 2270/ 115203 | consumed samples: 581120 | consumed tokens: 1190133760 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.029878E+00 | grad norm: 0.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.491 | TFLOPs: 31.35 | 7: iteration 2280/ 115203 | consumed samples: 583680 | consumed tokens: 1195376640 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.055219E+00 | grad norm: 0.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.273 | TFLOPs: 31.65 | 7: iteration 2290/ 115203 | consumed samples: 586240 | consumed tokens: 1200619520 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-04 | global batch size: 256 | 
lm loss: 3.085722E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.549 | TFLOPs: 30.57 | 7: iteration 2300/ 115203 | consumed samples: 588800 | consumed tokens: 1205862400 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.059500E+00 | grad norm: 0.445 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.605 | TFLOPs: 31.20 | 7: iteration 2310/ 115203 | consumed samples: 591360 | consumed tokens: 1211105280 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.083065E+00 | grad norm: 0.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.687 | TFLOPs: 31.41 | 7: iteration 2320/ 115203 | consumed samples: 593920 | consumed tokens: 1216348160 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.074753E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.651 | TFLOPs: 31.36 | 7: iteration 2330/ 115203 | consumed samples: 596480 | consumed tokens: 1221591040 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 2.999009E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.426 | TFLOPs: 31.08 | 7: iteration 2340/ 115203 | consumed samples: 599040 | consumed tokens: 1226833920 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.014937E+00 | grad norm: 0.485 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.978 | TFLOPs: 31.06 | 7: iteration 2350/ 115203 | consumed samples: 601600 | consumed tokens: 1232076800 | elapsed time 
per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.042664E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.718 | TFLOPs: 31.47 | 7: iteration 2360/ 115203 | consumed samples: 604160 | consumed tokens: 1237319680 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-04 | global batch size: 256 | lm loss: 3.061934E+00 | grad norm: 0.480 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.639 | TFLOPs: 30.94 | 7: iteration 2370/ 115203 | consumed samples: 606720 | consumed tokens: 1242562560 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.072346E+00 | grad norm: 0.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.095 | TFLOPs: 31.12 | 7: iteration 2380/ 115203 | consumed samples: 609280 | consumed tokens: 1247805440 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.051615E+00 | grad norm: 0.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.453 | TFLOPs: 30.98 | 7: iteration 2390/ 115203 | consumed samples: 611840 | consumed tokens: 1253048320 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.035826E+00 | grad norm: 0.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.334 | TFLOPs: 30.55 | 7: iteration 2400/ 115203 | consumed samples: 614400 | consumed tokens: 1258291200 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.078647E+00 | grad norm: 0.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.306 | TFLOPs: 30.97 | 7: iteration 2410/ 
115203 | consumed samples: 616960 | consumed tokens: 1263534080 | elapsed time per iteration (s): 0.47 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.042192E+00 | grad norm: 0.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.584 | TFLOPs: 28.31 | 7: iteration 2420/ 115203 | consumed samples: 619520 | consumed tokens: 1268776960 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.030399E+00 | grad norm: 0.488 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.542 | TFLOPs: 30.57 | 7: iteration 2430/ 115203 | consumed samples: 622080 | consumed tokens: 1274019840 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.036325E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.389 | TFLOPs: 30.29 | 7: iteration 2440/ 115203 | consumed samples: 624640 | consumed tokens: 1279262720 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.006706E+00 | grad norm: 0.511 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.392 | TFLOPs: 31.34 | 7: iteration 2450/ 115203 | consumed samples: 627200 | consumed tokens: 1284505600 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.030772E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.735 | TFLOPs: 30.99 | 7: iteration 2460/ 115203 | consumed samples: 629760 | consumed tokens: 1289748480 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.054531E+00 | grad norm: 0.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 601.413 | TFLOPs: 31.56 | 7: iteration 2470/ 115203 | consumed samples: 632320 | consumed tokens: 1294991360 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.025744E+00 | grad norm: 0.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.626 | TFLOPs: 31.41 | 7: iteration 2480/ 115203 | consumed samples: 634880 | consumed tokens: 1300234240 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.003806E+00 | grad norm: 0.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.653 | TFLOPs: 31.31 | 7: iteration 2490/ 115203 | consumed samples: 637440 | consumed tokens: 1305477120 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.997334E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.076 | TFLOPs: 31.64 | 7: iteration 2500/ 115203 | consumed samples: 640000 | consumed tokens: 1310720000 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.024429E+00 | grad norm: 0.617 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.176 | TFLOPs: 31.44 | 7: iteration 2510/ 115203 | consumed samples: 642560 | consumed tokens: 1315962880 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.041109E+00 | grad norm: 0.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.787 | TFLOPs: 31.52 | 7: iteration 2520/ 115203 | consumed samples: 645120 | consumed tokens: 1321205760 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.993586E+00 | grad 
norm: 0.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.947 | TFLOPs: 31.74 | 7: iteration 2530/ 115203 | consumed samples: 647680 | consumed tokens: 1326448640 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.004301E+00 | grad norm: 0.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.063 | TFLOPs: 31.43 | 7: iteration 2540/ 115203 | consumed samples: 650240 | consumed tokens: 1331691520 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.985884E+00 | grad norm: 0.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.090 | TFLOPs: 31.59 | 7: iteration 2550/ 115203 | consumed samples: 652800 | consumed tokens: 1336934400 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.962152E+00 | grad norm: 0.463 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.095 | TFLOPs: 31.01 | 7: iteration 2560/ 115203 | consumed samples: 655360 | consumed tokens: 1342177280 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.977601E+00 | grad norm: 0.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.818 | TFLOPs: 31.21 | 7: iteration 2570/ 115203 | consumed samples: 657920 | consumed tokens: 1347420160 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.992685E+00 | grad norm: 0.492 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.537 | TFLOPs: 31.25 | 7: iteration 2580/ 115203 | consumed samples: 660480 | consumed tokens: 1352663040 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.996023E+00 | grad norm: 0.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.084 | TFLOPs: 31.17 | 7: iteration 2590/ 115203 | consumed samples: 663040 | consumed tokens: 1357905920 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.018552E+00 | grad norm: 0.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.756 | TFLOPs: 30.89 | 7: iteration 2600/ 115203 | consumed samples: 665600 | consumed tokens: 1363148800 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.963552E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.765 | TFLOPs: 30.63 | 7: iteration 2610/ 115203 | consumed samples: 668160 | consumed tokens: 1368391680 | elapsed time per iteration (s): 0.45 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.002444E+00 | grad norm: 0.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.749 | TFLOPs: 30.16 | 7: iteration 2620/ 115203 | consumed samples: 670720 | consumed tokens: 1373634560 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.011171E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.101 | TFLOPs: 31.12 | 7: iteration 2630/ 115203 | consumed samples: 673280 | consumed tokens: 1378877440 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.983814E+00 | grad norm: 0.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.894 | TFLOPs: 31.48 | 7: iteration 2640/ 115203 | consumed samples: 
675840 | consumed tokens: 1384120320 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 3.033112E+00 | grad norm: 0.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.461 | TFLOPs: 31.19 | 7: iteration 2650/ 115203 | consumed samples: 678400 | consumed tokens: 1389363200 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.995167E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.288 | TFLOPs: 30.71 | 7: iteration 2660/ 115203 | consumed samples: 680960 | consumed tokens: 1394606080 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.966984E+00 | grad norm: 0.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.761 | TFLOPs: 30.21 | 7: iteration 2670/ 115203 | consumed samples: 683520 | consumed tokens: 1399848960 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.976024E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.431 | TFLOPs: 31.19 | 7: iteration 2680/ 115203 | consumed samples: 686080 | consumed tokens: 1405091840 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.939242E+00 | grad norm: 0.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.382 | TFLOPs: 31.40 | 7: iteration 2690/ 115203 | consumed samples: 688640 | consumed tokens: 1410334720 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.965035E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 597.011 | TFLOPs: 31.32 | 7: iteration 2700/ 115203 | consumed samples: 691200 | consumed tokens: 1415577600 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.976989E+00 | grad norm: 0.456 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.933 | TFLOPs: 31.63 | 7: iteration 2710/ 115203 | consumed samples: 693760 | consumed tokens: 1420820480 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.935778E+00 | grad norm: 0.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.554 | TFLOPs: 30.99 | 7: iteration 2720/ 115203 | consumed samples: 696320 | consumed tokens: 1426063360 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.995092E+00 | grad norm: 0.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.696 | TFLOPs: 31.52 | 7: iteration 2730/ 115203 | consumed samples: 698880 | consumed tokens: 1431306240 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.952458E+00 | grad norm: 0.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.214 | TFLOPs: 31.18 | 7: iteration 2740/ 115203 | consumed samples: 701440 | consumed tokens: 1436549120 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.925352E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.390 | TFLOPs: 30.29 | 7: iteration 2750/ 115203 | consumed samples: 704000 | consumed tokens: 1441792000 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.937645E+00 | grad norm: 0.431 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.549 | TFLOPs: 30.25 | 7: iteration 2760/ 115203 | consumed samples: 706560 | consumed tokens: 1447034880 | elapsed time per iteration (s): 0.46 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.951062E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.972 | TFLOPs: 29.43 | 7: iteration 2770/ 115203 | consumed samples: 709120 | consumed tokens: 1452277760 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.974105E+00 | grad norm: 0.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.597 | TFLOPs: 31.72 | 7: iteration 2780/ 115203 | consumed samples: 711680 | consumed tokens: 1457520640 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.960794E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.619 | TFLOPs: 31.51 | 7: iteration 2790/ 115203 | consumed samples: 714240 | consumed tokens: 1462763520 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.974939E+00 | grad norm: 0.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.018 | TFLOPs: 31.27 | 7: iteration 2800/ 115203 | consumed samples: 716800 | consumed tokens: 1468006400 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.996698E+00 | grad norm: 0.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.091 | TFLOPs: 31.12 | 7: iteration 2810/ 115203 | consumed samples: 719360 | consumed tokens: 1473249280 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global 
batch size: 256 | lm loss: 2.913669E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.614 | TFLOPs: 31.83 | 7: iteration 2820/ 115203 | consumed samples: 721920 | consumed tokens: 1478492160 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.989477E+00 | grad norm: 0.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.172 | TFLOPs: 31.28 | 7: iteration 2830/ 115203 | consumed samples: 724480 | consumed tokens: 1483735040 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.980926E+00 | grad norm: 0.483 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.757 | TFLOPs: 31.26 | 7: iteration 2840/ 115203 | consumed samples: 727040 | consumed tokens: 1488977920 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.937365E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.153 | TFLOPs: 31.07 | 7: iteration 2850/ 115203 | consumed samples: 729600 | consumed tokens: 1494220800 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.956851E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.639 | TFLOPs: 31.36 | 7: iteration 2860/ 115203 | consumed samples: 732160 | consumed tokens: 1499463680 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.953904E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.966 | TFLOPs: 31.37 | 7: iteration 2870/ 115203 | consumed samples: 734720 | consumed tokens: 
1504706560 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.961889E+00 | grad norm: 0.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.560 | TFLOPs: 31.30 | 7: iteration 2880/ 115203 | consumed samples: 737280 | consumed tokens: 1509949440 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.949150E+00 | grad norm: 0.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.905 | TFLOPs: 30.74 | 7: iteration 2890/ 115203 | consumed samples: 739840 | consumed tokens: 1515192320 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.931499E+00 | grad norm: 0.482 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.270 | TFLOPs: 31.29 | 7: iteration 2900/ 115203 | consumed samples: 742400 | consumed tokens: 1520435200 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.933144E+00 | grad norm: 0.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.102 | TFLOPs: 31.91 | 7: iteration 2910/ 115203 | consumed samples: 744960 | consumed tokens: 1525678080 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.939827E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.743 | TFLOPs: 30.31 | 7: iteration 2920/ 115203 | consumed samples: 747520 | consumed tokens: 1530920960 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.899814E+00 | grad norm: 0.427 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.173 | TFLOPs: 
30.86 | 7: iteration 2930/ 115203 | consumed samples: 750080 | consumed tokens: 1536163840 | elapsed time per iteration (s): 0.44 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.895403E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.134 | TFLOPs: 30.86 | 7: iteration 2940/ 115203 | consumed samples: 752640 | consumed tokens: 1541406720 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.905085E+00 | grad norm: 0.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.267 | TFLOPs: 31.60 | 7: iteration 2950/ 115203 | consumed samples: 755200 | consumed tokens: 1546649600 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.900476E+00 | grad norm: 0.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.200 | TFLOPs: 31.28 | 7: iteration 2960/ 115203 | consumed samples: 757760 | consumed tokens: 1551892480 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.954304E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.584 | TFLOPs: 31.46 | 7: iteration 2970/ 115203 | consumed samples: 760320 | consumed tokens: 1557135360 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.961420E+00 | grad norm: 0.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.763 | TFLOPs: 31.78 | 7: iteration 2980/ 115203 | consumed samples: 762880 | consumed tokens: 1562378240 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.936593E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 602.812 | TFLOPs: 31.63 |
7: iteration 2990/ 115203 | consumed samples: 765440 | consumed tokens: 1567621120 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.923351E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.460 | TFLOPs: 31.72 |
7: iteration 3000/ 115203 | consumed samples: 768000 | consumed tokens: 1572864000 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.925618E+00 | grad norm: 0.502 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.317 | TFLOPs: 31.13 |
7: ------------------------------------------------------------------------------------------
7: valid loss at iteration 3000 | lm loss value: 2.860874E+00 | lm loss PPL: 1.747680E+01 |
7: ------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 3000 to checkpoints_221m
0: [2022-11-28 13:17:04,327] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step3000 is begin to save!
0: [2022-11-28 13:17:04,330] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_01-model_00-model_states.pt...
0: [2022-11-28 13:17:04,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_01-model_00-model_states.pt.
0: [2022-11-28 13:17:04,475] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_03-model_00-model_states.pt...
0: [2022-11-28 13:17:04,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_03-model_00-model_states.pt.
0: [2022-11-28 13:17:04,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_04-model_00-model_states.pt... 0: [2022-11-28 13:17:04,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_04-model_00-model_states.pt. 0: [2022-11-28 13:17:04,530] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_05-model_00-model_states.pt... 0: [2022-11-28 13:17:04,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_05-model_00-model_states.pt. 0: [2022-11-28 13:17:04,553] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_06-model_00-model_states.pt... 0: [2022-11-28 13:17:04,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_06-model_00-model_states.pt. 0: [2022-11-28 13:17:04,576] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_07-model_00-model_states.pt... 0: [2022-11-28 13:17:04,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_07-model_00-model_states.pt. 0: [2022-11-28 13:17:04,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_08-model_00-model_states.pt... 0: [2022-11-28 13:17:04,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_08-model_00-model_states.pt. 0: [2022-11-28 13:17:04,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_09-model_00-model_states.pt... 0: [2022-11-28 13:17:04,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_09-model_00-model_states.pt. 
0: [2022-11-28 13:17:04,644] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_10-model_00-model_states.pt... 0: [2022-11-28 13:17:04,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_10-model_00-model_states.pt. 0: [2022-11-28 13:17:04,665] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_11-model_00-model_states.pt... 0: [2022-11-28 13:17:04,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_11-model_00-model_states.pt. 0: [2022-11-28 13:17:04,688] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_12-model_00-model_states.pt... 0: [2022-11-28 13:17:04,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_12-model_00-model_states.pt. 0: [2022-11-28 13:17:04,711] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_13-model_00-model_states.pt... 0: [2022-11-28 13:17:04,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_13-model_00-model_states.pt. 0: [2022-11-28 13:17:04,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_14-model_00-model_states.pt... 0: [2022-11-28 13:17:04,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_14-model_00-model_states.pt. 0: [2022-11-28 13:17:04,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_15-model_00-model_states.pt... 0: [2022-11-28 13:17:04,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_15-model_00-model_states.pt. 
0: [2022-11-28 13:17:04,778] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_16-model_00-model_states.pt... 0: [2022-11-28 13:17:04,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_16-model_00-model_states.pt. 0: [2022-11-28 13:17:04,801] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_17-model_00-model_states.pt... 0: [2022-11-28 13:17:04,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_17-model_00-model_states.pt. 0: [2022-11-28 13:17:04,823] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_18-model_00-model_states.pt... 0: [2022-11-28 13:17:04,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_18-model_00-model_states.pt. 0: [2022-11-28 13:17:04,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_19-model_00-model_states.pt... 0: [2022-11-28 13:17:04,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_19-model_00-model_states.pt. 0: [2022-11-28 13:17:04,870] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_20-model_00-model_states.pt... 0: [2022-11-28 13:17:04,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_20-model_00-model_states.pt. 0: [2022-11-28 13:17:04,893] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/layer_22-model_00-model_states.pt... 0: [2022-11-28 13:17:04,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/layer_22-model_00-model_states.pt. 
0: [2022-11-28 13:17:04,897] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step3000/mp_rank_00_model_states.pt 0: [2022-11-28 13:17:04,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/mp_rank_00_model_states.pt... 0: [2022-11-28 13:17:04,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/mp_rank_00_model_states.pt. 0: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
1: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:17:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step3000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
7: [2022-11-28 13:17:04,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:17:04,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 13:17:04,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2022-11-28 13:17:04,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:17:04,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 13:17:04,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2022-11-28 13:17:04,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:17:04,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 13:17:04,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2022-11-28 13:17:04,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:17:04,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:17:04,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 13:17:04,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2022-11-28 13:17:04,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 13:17:04,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2022-11-28 13:17:04,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:17:04,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 13:17:04,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2022-11-28 13:17:04,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:17:04,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 13:17:04,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2022-11-28 13:17:04,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:17:04,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 13:17:04,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 7: [2022-11-28 13:17:04,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:17:04,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 13:17:04,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 7: [2022-11-28 13:17:04,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:17:04,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 13:17:04,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2022-11-28 13:17:04,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:17:04,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 13:17:04,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2022-11-28 13:17:04,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 13:17:04,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 13:17:04,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 7: [2022-11-28 13:17:04,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:17:04,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 13:17:04,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2022-11-28 13:17:04,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:17:04,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 13:17:04,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2022-11-28 13:17:04,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:17:04,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 13:17:04,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2022-11-28 13:17:04,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 13:17:04,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 13:17:04,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2022-11-28 13:17:04,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:17:04,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 13:17:04,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2022-11-28 13:17:04,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:17:04,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 13:17:04,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 7: [2022-11-28 13:17:04,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:17:04,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 13:17:04,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 7: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
4: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:17:04,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 4: [2022-11-28 13:17:04,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:17:04,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:17:04,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 13:17:04,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 13:17:04,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 13:17:04,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 13:17:04,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 13:17:04,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:17:04,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:17:04,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2022-11-28 13:17:04,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 13:17:04,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 4: [2022-11-28 13:17:04,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:17:04,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 13:17:04,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2022-11-28 13:17:04,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:17:04,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 13:17:04,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2022-11-28 13:17:04,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:17:04,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 13:17:04,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2022-11-28 13:17:04,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:17:04,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 13:17:04,986] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2022-11-28 13:17:04,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:17:04,986] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 13:17:04,986] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2022-11-28 13:17:04,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-28 13:17:04,986] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 13:17:04,986] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2022-11-28 13:17:04,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:17:04,986] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 13:17:04,986] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2022-11-28 13:17:04,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:17:04,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:17:04,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2022-11-28 13:17:04,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 13:17:04,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 2: [2022-11-28 13:17:04,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2022-11-28 13:17:04,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:17:04,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:17:04,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:17:04,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 13:17:04,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 13:17:04,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 13:17:04,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2022-11-28 13:17:04,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2022-11-28 13:17:04,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2022-11-28 13:17:04,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:17:04,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
3: [2022-11-28 13:17:04,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 13:17:04,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 13:17:04,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2022-11-28 13:17:04,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2022-11-28 13:17:04,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:17:04,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:17:04,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 13:17:04,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 13:17:04,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2022-11-28 13:17:04,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:17:04,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
3: [2022-11-28 13:17:04,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:17:04,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2022-11-28 13:17:04,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 13:17:04,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:17:04,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2022-11-28 13:17:04,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 13:17:04,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:17:04,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:17:04,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
5: [2022-11-28 13:17:04,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 13:17:04,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 13:17:04,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:17:04,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 5: [2022-11-28 13:17:04,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2022-11-28 13:17:04,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:17:04,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:17:04,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 13:17:04,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 13:17:04,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 13:17:04,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2022-11-28 13:17:04,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
3: [2022-11-28 13:17:04,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 3: [2022-11-28 13:17:04,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:17:04,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 13:17:04,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2022-11-28 13:17:04,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:17:04,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 13:17:04,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 1: [2022-11-28 13:17:04,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:17:04,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 13:17:04,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:17:04,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
1: [2022-11-28 13:17:04,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 13:17:04,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 7: [2022-11-28 13:17:04,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:17:04,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 13:17:04,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2022-11-28 13:17:04,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:17:04,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:17:04,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:17:04,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 13:17:04,992] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 13:17:04,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2022-11-28 13:17:04,992] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
6: [2022-11-28 13:17:04,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:17:04,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:17:04,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 13:17:04,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 13:17:04,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 6: [2022-11-28 13:17:04,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 0: [2022-11-28 13:17:05,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step3000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 13:17:05,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step3000 is ready now! 
0: successfully saved checkpoint at iteration 3000 to checkpoints_221m
7: time (ms) | save-checkpoint: 704.30
7: iteration 3010/ 115203 | consumed samples: 770560 | consumed tokens: 1578106880 | elapsed time per iteration (s): 0.51 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.905489E+00 | grad norm: 0.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 498.751 | TFLOPs: 26.17 |
7: iteration 3020/ 115203 | consumed samples: 773120 | consumed tokens: 1583349760 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.923440E+00 | grad norm: 0.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.228 | TFLOPs: 31.13 |
7: iteration 3030/ 115203 | consumed samples: 775680 | consumed tokens: 1588592640 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.877674E+00 | grad norm: 0.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.669 | TFLOPs: 31.25 |
7: iteration 3040/ 115203 | consumed samples: 778240 | consumed tokens: 1593835520 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.918332E+00 | grad norm: 0.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.968 | TFLOPs: 31.11 |
7: iteration 3050/ 115203 | consumed samples: 780800 | consumed tokens: 1599078400 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.928829E+00 | grad norm: 0.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.559 | TFLOPs: 30.88 |
7: iteration 3060/ 115203 | consumed samples: 783360 | consumed tokens: 1604321280 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.942941E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.719 | TFLOPs: 31.26 |
7: iteration 3070/ 115203 | consumed samples: 785920 | consumed tokens: 1609564160 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.945310E+00 | grad norm: 0.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.718 | TFLOPs: 31.20 |
7: iteration 3080/ 115203 | consumed samples: 788480 | consumed tokens: 1614807040 | elapsed time per iteration (s): 0.46 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.910569E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 553.133 | TFLOPs: 29.02 |
7: iteration 3090/ 115203 | consumed samples: 791040 | consumed tokens: 1620049920 | elapsed time per iteration (s): 0.45 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.908571E+00 | grad norm: 0.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.150 | TFLOPs: 30.18 |
7: iteration 3100/ 115203 | consumed samples: 793600 | consumed tokens: 1625292800 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.942826E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.596 | TFLOPs: 30.99 |
7: iteration 3110/ 115203 | consumed samples: 796160 | consumed tokens: 1630535680 | elapsed time per iteration (s): 0.45 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.875755E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.835 | TFLOPs: 29.74 |
7: iteration 3120/ 115203 | consumed samples: 798720 | consumed tokens: 1635778560 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.917117E+00 | grad norm: 0.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.175 | TFLOPs: 31.65 |
7: iteration 3130/ 115203 | consumed samples: 801280 | consumed tokens: 1641021440 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.911350E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.233 | TFLOPs: 32.07 |
7: iteration 3140/ 115203 | consumed samples: 803840 | consumed tokens: 1646264320 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.878060E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.255 | TFLOPs: 31.07 |
7: iteration 3150/ 115203 | consumed samples: 806400 | consumed tokens: 1651507200 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.921364E+00 | grad norm: 0.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.102 | TFLOPs: 31.38 |
7: iteration 3160/ 115203 | consumed samples: 808960 | consumed tokens: 1656750080 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.871679E+00 | grad norm: 0.387 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.922 | TFLOPs: 31.42 |
7: iteration 3170/ 115203 | consumed samples: 811520 | consumed tokens: 1661992960 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.886481E+00 | grad norm: 0.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.350 | TFLOPs: 31.39 |
7: iteration 3180/ 115203 | consumed samples: 814080 | consumed tokens: 1667235840 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.906892E+00 | grad norm: 0.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.562 | TFLOPs: 30.88 |
7: iteration 3190/ 115203 | consumed samples: 816640 | consumed tokens: 1672478720 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.860658E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.832 | TFLOPs: 31.10 |
7: iteration 3200/ 115203 | consumed samples: 819200 | consumed tokens: 1677721600 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.866529E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.210 | TFLOPs: 31.54 |
7: iteration 3210/ 115203 | consumed samples: 821760 | consumed tokens: 1682964480 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.916706E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.450 | TFLOPs: 31.14 |
7: iteration 3220/ 115203 | consumed samples: 824320 | consumed tokens: 1688207360 | elapsed time per iteration (s): 0.42 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.877430E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.762 | TFLOPs: 31.63 |
7: iteration 3230/ 115203 | consumed samples: 826880 | consumed tokens: 1693450240 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.902804E+00 | grad norm: 0.429 | num zeros: 0.0 | number of skipped
iterations: 0 | number of nan iterations: 0 | samples per second: 592.227 | TFLOPs: 31.07 | 7: iteration 3240/ 115203 | consumed samples: 829440 | consumed tokens: 1698693120 | elapsed time per iteration (s): 0.43 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.873359E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.567 | TFLOPs: 31.30 | 7: iteration 3250/ 115203 | consumed samples: 832000 | consumed tokens: 1703936000 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.850133E+00 | grad norm: 0.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.413 | TFLOPs: 31.50 | 7: iteration 3260/ 115203 | consumed samples: 834560 | consumed tokens: 1709178880 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.884170E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.479 | TFLOPs: 31.72 | 7: iteration 3270/ 115203 | consumed samples: 837120 | consumed tokens: 1714421760 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.890481E+00 | grad norm: 0.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.417 | TFLOPs: 30.72 | 7: iteration 3280/ 115203 | consumed samples: 839680 | consumed tokens: 1719664640 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.875777E+00 | grad norm: 0.417 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.038 | TFLOPs: 31.38 | 7: iteration 3290/ 115203 | consumed samples: 842240 | consumed tokens: 1724907520 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | 
lm loss: 2.845601E+00 | grad norm: 0.430 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.318 | TFLOPs: 31.60 | 7: iteration 3300/ 115203 | consumed samples: 844800 | consumed tokens: 1730150400 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.897741E+00 | grad norm: 0.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.993 | TFLOPs: 30.43 | 7: iteration 3310/ 115203 | consumed samples: 847360 | consumed tokens: 1735393280 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.878257E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.436 | TFLOPs: 31.82 | 7: iteration 3320/ 115203 | consumed samples: 849920 | consumed tokens: 1740636160 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.893151E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.754 | TFLOPs: 31.05 | 7: iteration 3330/ 115203 | consumed samples: 852480 | consumed tokens: 1745879040 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.905932E+00 | grad norm: 0.389 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.476 | TFLOPs: 31.14 | 7: iteration 3340/ 115203 | consumed samples: 855040 | consumed tokens: 1751121920 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.899060E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.730 | TFLOPs: 31.83 | 7: iteration 3350/ 115203 | consumed samples: 857600 | consumed tokens: 1756364800 | elapsed time 
per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.886217E+00 | grad norm: 0.420 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.263 | TFLOPs: 31.08 | 7: iteration 3360/ 115203 | consumed samples: 860160 | consumed tokens: 1761607680 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.875811E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.527 | TFLOPs: 31.56 | 7: iteration 3370/ 115203 | consumed samples: 862720 | consumed tokens: 1766850560 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.898431E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.539 | TFLOPs: 31.30 | 7: iteration 3380/ 115203 | consumed samples: 865280 | consumed tokens: 1772093440 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.879206E+00 | grad norm: 0.421 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.858 | TFLOPs: 31.11 | 7: iteration 3390/ 115203 | consumed samples: 867840 | consumed tokens: 1777336320 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.873798E+00 | grad norm: 0.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.958 | TFLOPs: 30.74 | 7: iteration 3400/ 115203 | consumed samples: 870400 | consumed tokens: 1782579200 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.845344E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.310 | TFLOPs: 31.65 | 7: iteration 3410/ 
115203 | consumed samples: 872960 | consumed tokens: 1787822080 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.908882E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.150 | TFLOPs: 31.96 | 7: iteration 3420/ 115203 | consumed samples: 875520 | consumed tokens: 1793064960 | elapsed time per iteration (s): 0.45 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.894098E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.931 | TFLOPs: 30.06 | 7: iteration 3430/ 115203 | consumed samples: 878080 | consumed tokens: 1798307840 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.846842E+00 | grad norm: 0.404 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.278 | TFLOPs: 31.65 | 7: iteration 3440/ 115203 | consumed samples: 880640 | consumed tokens: 1803550720 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.820526E+00 | grad norm: 0.366 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.405 | TFLOPs: 31.66 | 7: iteration 3450/ 115203 | consumed samples: 883200 | consumed tokens: 1808793600 | elapsed time per iteration (s): 0.46 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.886892E+00 | grad norm: 0.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.444 | TFLOPs: 29.41 | 7: iteration 3460/ 115203 | consumed samples: 885760 | consumed tokens: 1814036480 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.855954E+00 | grad norm: 0.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 590.824 | TFLOPs: 31.00 | 7: iteration 3470/ 115203 | consumed samples: 888320 | consumed tokens: 1819279360 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.885485E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.080 | TFLOPs: 31.96 | 7: iteration 3480/ 115203 | consumed samples: 890880 | consumed tokens: 1824522240 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.883920E+00 | grad norm: 0.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.904 | TFLOPs: 31.90 | 7: iteration 3490/ 115203 | consumed samples: 893440 | consumed tokens: 1829765120 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.855298E+00 | grad norm: 0.412 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.535 | TFLOPs: 31.77 | 7: iteration 3500/ 115203 | consumed samples: 896000 | consumed tokens: 1835008000 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.849662E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.064 | TFLOPs: 31.85 | 7: iteration 3510/ 115203 | consumed samples: 898560 | consumed tokens: 1840250880 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.864235E+00 | grad norm: 0.481 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.667 | TFLOPs: 31.10 | 7: iteration 3520/ 115203 | consumed samples: 901120 | consumed tokens: 1845493760 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.838497E+00 | grad 
norm: 0.362 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.996 | TFLOPs: 31.11 | 7: iteration 3530/ 115203 | consumed samples: 903680 | consumed tokens: 1850736640 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.852986E+00 | grad norm: 0.448 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.227 | TFLOPs: 31.28 | 7: iteration 3540/ 115203 | consumed samples: 906240 | consumed tokens: 1855979520 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.886615E+00 | grad norm: 0.515 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.444 | TFLOPs: 31.08 | 7: iteration 3550/ 115203 | consumed samples: 908800 | consumed tokens: 1861222400 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.884492E+00 | grad norm: 0.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.367 | TFLOPs: 31.13 | 7: iteration 3560/ 115203 | consumed samples: 911360 | consumed tokens: 1866465280 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.897045E+00 | grad norm: 0.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.342 | TFLOPs: 31.34 | 7: iteration 3570/ 115203 | consumed samples: 913920 | consumed tokens: 1871708160 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.868634E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.306 | TFLOPs: 31.44 | 7: iteration 3580/ 115203 | consumed samples: 916480 | consumed tokens: 1876951040 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.876685E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.421 | TFLOPs: 31.14 | 7: iteration 3590/ 115203 | consumed samples: 919040 | consumed tokens: 1882193920 | elapsed time per iteration (s): 0.45 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.829526E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.234 | TFLOPs: 29.60 | 7: iteration 3600/ 115203 | consumed samples: 921600 | consumed tokens: 1887436800 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.844676E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.836 | TFLOPs: 30.84 | 7: iteration 3610/ 115203 | consumed samples: 924160 | consumed tokens: 1892679680 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.867494E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.295 | TFLOPs: 30.76 | 7: iteration 3620/ 115203 | consumed samples: 926720 | consumed tokens: 1897922560 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.830484E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.550 | TFLOPs: 31.46 | 7: iteration 3630/ 115203 | consumed samples: 929280 | consumed tokens: 1903165440 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.860843E+00 | grad norm: 0.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.005 | TFLOPs: 30.27 | 7: iteration 3640/ 115203 | consumed samples: 
931840 | consumed tokens: 1908408320 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.871678E+00 | grad norm: 0.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.993 | TFLOPs: 31.06 | 7: iteration 3650/ 115203 | consumed samples: 934400 | consumed tokens: 1913651200 | elapsed time per iteration (s): 0.44 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.830093E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.855 | TFLOPs: 30.74 | 7: iteration 3660/ 115203 | consumed samples: 936960 | consumed tokens: 1918894080 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.832429E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.055 | TFLOPs: 31.48 | 7: iteration 3670/ 115203 | consumed samples: 939520 | consumed tokens: 1924136960 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.828053E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.813 | TFLOPs: 31.21 | 7: iteration 3680/ 115203 | consumed samples: 942080 | consumed tokens: 1929379840 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.863290E+00 | grad norm: 0.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.489 | TFLOPs: 31.56 | 7: iteration 3690/ 115203 | consumed samples: 944640 | consumed tokens: 1934622720 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.858867E+00 | grad norm: 0.367 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 600.972 | TFLOPs: 31.53 | 7: iteration 3700/ 115203 | consumed samples: 947200 | consumed tokens: 1939865600 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.833883E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.193 | TFLOPs: 31.65 | 7: iteration 3710/ 115203 | consumed samples: 949760 | consumed tokens: 1945108480 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.842379E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.826 | TFLOPs: 31.42 | 7: iteration 3720/ 115203 | consumed samples: 952320 | consumed tokens: 1950351360 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.852381E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.105 | TFLOPs: 31.12 | 7: iteration 3730/ 115203 | consumed samples: 954880 | consumed tokens: 1955594240 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.843175E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.779 | TFLOPs: 31.10 | 7: iteration 3740/ 115203 | consumed samples: 957440 | consumed tokens: 1960837120 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.823276E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.483 | TFLOPs: 31.30 | 7: iteration 3750/ 115203 | consumed samples: 960000 | consumed tokens: 1966080000 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.858090E+00 | grad norm: 0.358 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.992 | TFLOPs: 31.69 | 7: iteration 3760/ 115203 | consumed samples: 962560 | consumed tokens: 1971322880 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.805270E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.929 | TFLOPs: 31.37 | 7: iteration 3770/ 115203 | consumed samples: 965120 | consumed tokens: 1976565760 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.832981E+00 | grad norm: 0.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.965 | TFLOPs: 31.64 | 7: iteration 3780/ 115203 | consumed samples: 967680 | consumed tokens: 1981808640 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.829378E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.216 | TFLOPs: 31.39 | 7: iteration 3790/ 115203 | consumed samples: 970240 | consumed tokens: 1987051520 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.840227E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.081 | TFLOPs: 30.91 | 7: iteration 3800/ 115203 | consumed samples: 972800 | consumed tokens: 1992294400 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.841056E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.265 | TFLOPs: 31.81 | 7: iteration 3810/ 115203 | consumed samples: 975360 | consumed tokens: 1997537280 | elapsed time per iteration (s): 0.45 | learning rate: 1.998E-04 | global 
batch size: 256 | lm loss: 2.829333E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.893 | TFLOPs: 29.95 | 7: iteration 3820/ 115203 | consumed samples: 977920 | consumed tokens: 2002780160 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.843335E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.513 | TFLOPs: 31.40 | 7: iteration 3830/ 115203 | consumed samples: 980480 | consumed tokens: 2008023040 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.858100E+00 | grad norm: 0.347 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.428 | TFLOPs: 31.29 | 7: iteration 3840/ 115203 | consumed samples: 983040 | consumed tokens: 2013265920 | elapsed time per iteration (s): 0.42 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.826443E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.339 | TFLOPs: 31.92 | 7: iteration 3850/ 115203 | consumed samples: 985600 | consumed tokens: 2018508800 | elapsed time per iteration (s): 0.43 | learning rate: 1.998E-04 | global batch size: 256 | lm loss: 2.798153E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.266 | TFLOPs: 31.34 | 7: iteration 3860/ 115203 | consumed samples: 988160 | consumed tokens: 2023751680 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.817743E+00 | grad norm: 0.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.330 | TFLOPs: 31.29 | 7: iteration 3870/ 115203 | consumed samples: 990720 | consumed tokens: 
2028994560 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.786418E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.546 | TFLOPs: 31.56 | 7: iteration 3880/ 115203 | consumed samples: 993280 | consumed tokens: 2034237440 | elapsed time per iteration (s): 0.44 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.838961E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.329 | TFLOPs: 30.82 | 7: iteration 3890/ 115203 | consumed samples: 995840 | consumed tokens: 2039480320 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.840067E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.760 | TFLOPs: 31.57 | 7: iteration 3900/ 115203 | consumed samples: 998400 | consumed tokens: 2044723200 | elapsed time per iteration (s): 0.44 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.801536E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.927 | TFLOPs: 30.38 | 7: iteration 3910/ 115203 | consumed samples: 1000960 | consumed tokens: 2049966080 | elapsed time per iteration (s): 0.44 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.821055E+00 | grad norm: 0.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.711 | TFLOPs: 30.47 | 7: iteration 3920/ 115203 | consumed samples: 1003520 | consumed tokens: 2055208960 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.797901E+00 | grad norm: 0.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.178 | TFLOPs: 
31.44 | 7: iteration 3930/ 115203 | consumed samples: 1006080 | consumed tokens: 2060451840 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.845805E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.089 | TFLOPs: 31.43 | 7: iteration 3940/ 115203 | consumed samples: 1008640 | consumed tokens: 2065694720 | elapsed time per iteration (s): 0.44 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.781219E+00 | grad norm: 0.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.591 | TFLOPs: 30.83 | 7: iteration 3950/ 115203 | consumed samples: 1011200 | consumed tokens: 2070937600 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.808230E+00 | grad norm: 0.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.067 | TFLOPs: 31.43 | 7: iteration 3960/ 115203 | consumed samples: 1013760 | consumed tokens: 2076180480 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.779804E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.193 | TFLOPs: 31.60 | 7: iteration 3970/ 115203 | consumed samples: 1016320 | consumed tokens: 2081423360 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.841066E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.981 | TFLOPs: 31.43 | 7: iteration 3980/ 115203 | consumed samples: 1018880 | consumed tokens: 2086666240 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.816817E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 597.506 | TFLOPs: 31.35 |
7: iteration 3990/ 115203 | consumed samples: 1021440 | consumed tokens: 2091909120 | elapsed time per iteration (s): 0.46 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.832246E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 553.270 | TFLOPs: 29.03 |
0: [2022-11-28 13:24:16,065] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=0, lr=[0.00019972320825211248, 0.00019972320825211248, 0.00019972320825211248], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 4000/ 115203 | consumed samples: 1024000 | consumed tokens: 2097152000 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.789504E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.971 | TFLOPs: 31.58 |
0: steps: 4000 loss: 2.7425 iter time (s): 0.429 samples/sec: 596.363
7: ------------------------------------------------------------------------------------------
7: valid loss at iteration 4000 | lm loss value: 2.863212E+00 | lm loss PPL: 1.751770E+01 |
7: ------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 4000 to checkpoints_221m
0: [2022-11-28 13:24:16,224] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step4000 is begin to save!
0: [2022-11-28 13:24:16,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_01-model_00-model_states.pt...
0: [2022-11-28 13:24:16,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_01-model_00-model_states.pt.
0: [2022-11-28 13:24:16,328] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_03-model_00-model_states.pt...
0: [2022-11-28 13:24:16,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_03-model_00-model_states.pt.
0: [2022-11-28 13:24:16,349] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_04-model_00-model_states.pt...
0: [2022-11-28 13:24:16,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_04-model_00-model_states.pt.
0: [2022-11-28 13:24:16,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_05-model_00-model_states.pt...
0: [2022-11-28 13:24:16,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_05-model_00-model_states.pt.
0: [2022-11-28 13:24:16,395] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_06-model_00-model_states.pt...
0: [2022-11-28 13:24:16,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_06-model_00-model_states.pt.
0: [2022-11-28 13:24:16,418] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_07-model_00-model_states.pt...
0: [2022-11-28 13:24:16,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_07-model_00-model_states.pt.
0: [2022-11-28 13:24:16,441] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_08-model_00-model_states.pt...
0: [2022-11-28 13:24:16,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_08-model_00-model_states.pt.
0: [2022-11-28 13:24:16,464] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_09-model_00-model_states.pt...
0: [2022-11-28 13:24:16,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_09-model_00-model_states.pt.
0: [2022-11-28 13:24:16,487] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_10-model_00-model_states.pt...
0: [2022-11-28 13:24:16,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_10-model_00-model_states.pt.
0: [2022-11-28 13:24:16,509] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_11-model_00-model_states.pt...
0: [2022-11-28 13:24:16,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_11-model_00-model_states.pt.
0: [2022-11-28 13:24:16,533] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_12-model_00-model_states.pt...
0: [2022-11-28 13:24:16,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_12-model_00-model_states.pt.
0: [2022-11-28 13:24:16,557] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_13-model_00-model_states.pt...
0: [2022-11-28 13:24:16,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_13-model_00-model_states.pt.
0: [2022-11-28 13:24:16,581] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_14-model_00-model_states.pt...
0: [2022-11-28 13:24:16,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_14-model_00-model_states.pt.
0: [2022-11-28 13:24:16,604] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_15-model_00-model_states.pt...
0: [2022-11-28 13:24:16,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_15-model_00-model_states.pt.
0: [2022-11-28 13:24:16,627] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_16-model_00-model_states.pt...
0: [2022-11-28 13:24:16,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_16-model_00-model_states.pt.
0: [2022-11-28 13:24:16,651] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_17-model_00-model_states.pt...
0: [2022-11-28 13:24:16,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_17-model_00-model_states.pt.
0: [2022-11-28 13:24:16,674] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_18-model_00-model_states.pt...
0: [2022-11-28 13:24:16,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_18-model_00-model_states.pt.
0: [2022-11-28 13:24:16,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_19-model_00-model_states.pt...
0: [2022-11-28 13:24:16,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_19-model_00-model_states.pt.
0: [2022-11-28 13:24:16,720] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_20-model_00-model_states.pt...
0: [2022-11-28 13:24:16,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_20-model_00-model_states.pt.
0: [2022-11-28 13:24:16,743] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/layer_22-model_00-model_states.pt... 0: [2022-11-28 13:24:16,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/layer_22-model_00-model_states.pt. 0: [2022-11-28 13:24:16,749] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step4000/mp_rank_00_model_states.pt 0: [2022-11-28 13:24:16,749] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/mp_rank_00_model_states.pt... 0: [2022-11-28 13:24:16,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/mp_rank_00_model_states.pt. 0: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
1: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
3: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:24:16,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step4000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:24:16,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:24:16,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:24:16,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 13:24:16,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 2: [2022-11-28 13:24:16,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:24:16,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 13:24:16,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 2: [2022-11-28 13:24:16,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:24:16,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 13:24:16,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
7: [2022-11-28 13:24:16,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:24:16,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 13:24:16,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2022-11-28 13:24:16,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:24:16,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 13:24:16,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 4: [2022-11-28 13:24:16,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:24:16,820] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 13:24:16,820] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 2: [2022-11-28 13:24:16,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:24:16,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 13:24:16,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
2: [2022-11-28 13:24:16,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:24:16,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:24:16,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 13:24:16,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2022-11-28 13:24:16,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:24:16,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2022-11-28 13:24:16,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2022-11-28 13:24:16,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:24:16,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2022-11-28 13:24:16,821] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 13:24:16,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 2: [2022-11-28 13:24:16,821] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
0: [2022-11-28 13:24:16,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:24:16,822] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 13:24:16,822] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 4: [2022-11-28 13:24:16,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:24:16,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 13:24:16,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2022-11-28 13:24:16,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:24:16,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:24:16,824] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 1: [2022-11-28 13:24:16,824] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2022-11-28 13:24:16,824] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2022-11-28 13:24:16,824] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
4: [2022-11-28 13:24:16,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:24:16,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 13:24:16,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2022-11-28 13:24:16,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:24:16,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 13:24:16,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:24:16,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2022-11-28 13:24:16,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 13:24:16,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2022-11-28 13:24:16,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:24:16,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
0: [2022-11-28 13:24:16,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 1: [2022-11-28 13:24:16,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2022-11-28 13:24:16,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2022-11-28 13:24:16,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2022-11-28 13:24:16,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:24:16,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 13:24:16,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 2: [2022-11-28 13:24:16,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:24:16,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 13:24:16,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 2: [2022-11-28 13:24:16,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 13:24:16,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 13:24:16,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2022-11-28 13:24:16,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:24:16,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 13:24:16,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2022-11-28 13:24:16,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:24:16,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2022-11-28 13:24:16,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:24:16,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 2: [2022-11-28 13:24:16,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 7: [2022-11-28 13:24:16,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:24:16,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
2: [2022-11-28 13:24:16,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2022-11-28 13:24:16,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:24:16,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 13:24:16,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 13:24:16,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:24:16,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 13:24:16,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2022-11-28 13:24:16,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2022-11-28 13:24:16,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 13:24:16,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2022-11-28 13:24:16,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 4: [2022-11-28 13:24:16,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 13:24:16,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 13:24:16,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2022-11-28 13:24:16,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:24:16,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 13:24:16,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 7: [2022-11-28 13:24:16,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:24:16,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 13:24:16,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2022-11-28 13:24:16,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:24:16,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 13:24:16,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 13:24:16,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 13:24:16,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 1: [2022-11-28 13:24:16,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 4: [2022-11-28 13:24:16,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:24:16,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:24:16,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 13:24:16,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 13:24:16,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 4: [2022-11-28 13:24:16,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2022-11-28 13:24:16,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:24:16,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:24:16,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 13:24:16,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2022-11-28 13:24:16,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 13:24:16,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2022-11-28 13:24:16,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:24:16,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 13:24:16,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2022-11-28 13:24:16,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:24:16,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 13:24:16,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2022-11-28 13:24:16,838] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:24:16,838] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:24:16,838] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:24:16,838] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:24:16,838] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 13:24:16,838] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 13:24:16,838] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 13:24:16,838] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 13:24:16,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2022-11-28 13:24:16,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2022-11-28 13:24:16,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 6: [2022-11-28 13:24:16,838] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2022-11-28 13:24:16,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:24:16,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:24:16,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 13:24:16,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2022-11-28 13:24:16,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:24:16,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 13:24:16,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2022-11-28 13:24:16,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:24:16,824] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 13:24:16,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 13:24:16,824] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2022-11-28 13:24:16,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2022-11-28 13:24:16,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:24:16,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:24:16,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:24:16,824] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 13:24:16,824] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 13:24:16,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 13:24:16,824] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2022-11-28 13:24:16,824] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2022-11-28 13:24:16,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2022-11-28 13:24:16,826] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:24:16,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:24:16,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 13:24:16,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:24:16,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:24:16,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:24:16,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2022-11-28 13:24:16,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 13:24:16,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 13:24:16,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 13:24:16,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 13:24:16,835] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:24:16,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2022-11-28 13:24:16,835] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 13:24:16,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2022-11-28 13:24:16,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 5: [2022-11-28 13:24:16,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2022-11-28 13:24:16,835] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
3: [2022-11-28 13:24:16,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:24:16,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 13:24:16,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2022-11-28 13:24:16,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:24:16,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 13:24:16,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 3: [2022-11-28 13:24:16,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:24:16,836] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 13:24:16,836] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2022-11-28 13:24:16,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:24:16,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 13:24:16,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 
0: [2022-11-28 13:24:16,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:24:16,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 13:24:16,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: [2022-11-28 13:24:16,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step4000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 13:24:16,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step4000 is ready now! 0: successfully saved checkpoint at iteration 4000 to checkpoints_221m 7: time (ms) | save-checkpoint: 659.93 7: iteration 4010/ 115203 | consumed samples: 1026560 | consumed tokens: 2102394880 | elapsed time per iteration (s): 0.51 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.801737E+00 | grad norm: 0.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 499.111 | TFLOPs: 26.19 | 7: iteration 4020/ 115203 | consumed samples: 1029120 | consumed tokens: 2107637760 | elapsed time per iteration (s): 0.44 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.832436E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.320 | TFLOPs: 30.40 | 7: iteration 4030/ 115203 | consumed samples: 1031680 | consumed tokens: 2112880640 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.799988E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.764 | TFLOPs: 31.94 | 7: iteration 4040/ 115203 | consumed samples: 1034240 | consumed 
tokens: 2118123520 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.771996E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.412 | TFLOPs: 31.35 | 7: iteration 4050/ 115203 | consumed samples: 1036800 | consumed tokens: 2123366400 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.781097E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.268 | TFLOPs: 30.97 | 7: iteration 4060/ 115203 | consumed samples: 1039360 | consumed tokens: 2128609280 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.822338E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.059 | TFLOPs: 31.48 | 7: iteration 4070/ 115203 | consumed samples: 1041920 | consumed tokens: 2133852160 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.830044E+00 | grad norm: 0.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.890 | TFLOPs: 31.27 | 7: iteration 4080/ 115203 | consumed samples: 1044480 | consumed tokens: 2139095040 | elapsed time per iteration (s): 0.44 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.800573E+00 | grad norm: 0.352 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.933 | TFLOPs: 30.80 | 7: iteration 4090/ 115203 | consumed samples: 1047040 | consumed tokens: 2144337920 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.807100E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.489 
| TFLOPs: 31.61 | 7: iteration 4100/ 115203 | consumed samples: 1049600 | consumed tokens: 2149580800 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.799848E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.242 | TFLOPs: 31.60 | 7: iteration 4110/ 115203 | consumed samples: 1052160 | consumed tokens: 2154823680 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.763374E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.396 | TFLOPs: 31.03 | 7: iteration 4120/ 115203 | consumed samples: 1054720 | consumed tokens: 2160066560 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.757761E+00 | grad norm: 0.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.141 | TFLOPs: 31.12 | 7: iteration 4130/ 115203 | consumed samples: 1057280 | consumed tokens: 2165309440 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.763015E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.241 | TFLOPs: 30.97 | 7: iteration 4140/ 115203 | consumed samples: 1059840 | consumed tokens: 2170552320 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.781772E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.660 | TFLOPs: 31.20 | 7: iteration 4150/ 115203 | consumed samples: 1062400 | consumed tokens: 2175795200 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.796418E+00 | grad norm: 0.363 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.008 | TFLOPs: 31.59 | 7: iteration 4160/ 115203 | consumed samples: 1064960 | consumed tokens: 2181038080 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.795251E+00 | grad norm: 0.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.211 | TFLOPs: 31.33 | 7: iteration 4170/ 115203 | consumed samples: 1067520 | consumed tokens: 2186280960 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.769121E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.306 | TFLOPs: 30.97 | 7: iteration 4180/ 115203 | consumed samples: 1070080 | consumed tokens: 2191523840 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.787965E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.078 | TFLOPs: 31.64 | 7: iteration 4190/ 115203 | consumed samples: 1072640 | consumed tokens: 2196766720 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.755243E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.309 | TFLOPs: 31.97 | 7: iteration 4200/ 115203 | consumed samples: 1075200 | consumed tokens: 2202009600 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.790174E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.953 | TFLOPs: 32.11 | 7: iteration 4210/ 115203 | consumed samples: 1077760 | consumed tokens: 2207252480 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch 
size: 256 | lm loss: 2.787665E+00 | grad norm: 0.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.689 | TFLOPs: 31.41 | 7: iteration 4220/ 115203 | consumed samples: 1080320 | consumed tokens: 2212495360 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.807791E+00 | grad norm: 0.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.307 | TFLOPs: 31.23 | 7: iteration 4230/ 115203 | consumed samples: 1082880 | consumed tokens: 2217738240 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.786514E+00 | grad norm: 0.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.847 | TFLOPs: 31.63 | 7: iteration 4240/ 115203 | consumed samples: 1085440 | consumed tokens: 2222981120 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.785241E+00 | grad norm: 0.348 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.653 | TFLOPs: 31.73 | 7: iteration 4250/ 115203 | consumed samples: 1088000 | consumed tokens: 2228224000 | elapsed time per iteration (s): 0.44 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.764282E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.028 | TFLOPs: 30.43 | 7: iteration 4260/ 115203 | consumed samples: 1090560 | consumed tokens: 2233466880 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.791033E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.013 | TFLOPs: 31.90 | 7: iteration 4270/ 115203 | consumed samples: 1093120 | consumed tokens: 
2238709760 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.748320E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.549 | TFLOPs: 31.14 | 7: iteration 4280/ 115203 | consumed samples: 1095680 | consumed tokens: 2243952640 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.786442E+00 | grad norm: 0.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.570 | TFLOPs: 30.99 | 7: iteration 4290/ 115203 | consumed samples: 1098240 | consumed tokens: 2249195520 | elapsed time per iteration (s): 0.42 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.769810E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.122 | TFLOPs: 31.80 | 7: iteration 4300/ 115203 | consumed samples: 1100800 | consumed tokens: 2254438400 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.784660E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.275 | TFLOPs: 31.13 | 7: iteration 4310/ 115203 | consumed samples: 1103360 | consumed tokens: 2259681280 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.776217E+00 | grad norm: 0.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.647 | TFLOPs: 30.89 | 7: iteration 4320/ 115203 | consumed samples: 1105920 | consumed tokens: 2264924160 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.774000E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.820 | 
TFLOPs: 31.00 | 7: iteration 4330/ 115203 | consumed samples: 1108480 | consumed tokens: 2270167040 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.747018E+00 | grad norm: 0.386 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.566 | TFLOPs: 31.30 | 7: iteration 4340/ 115203 | consumed samples: 1111040 | consumed tokens: 2275409920 | elapsed time per iteration (s): 0.44 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.809897E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.243 | TFLOPs: 30.65 | 7: iteration 4350/ 115203 | consumed samples: 1113600 | consumed tokens: 2280652800 | elapsed time per iteration (s): 0.43 | learning rate: 1.997E-04 | global batch size: 256 | lm loss: 2.780736E+00 | grad norm: 0.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.605 | TFLOPs: 31.25 | 7: iteration 4360/ 115203 | consumed samples: 1116160 | consumed tokens: 2285895680 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.772656E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.651 | TFLOPs: 31.20 | 7: iteration 4370/ 115203 | consumed samples: 1118720 | consumed tokens: 2291138560 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.806369E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.667 | TFLOPs: 31.46 | 7: iteration 4380/ 115203 | consumed samples: 1121280 | consumed tokens: 2296381440 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.768608E+00 | grad norm: 0.312 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.298 | TFLOPs: 31.60 | 7: iteration 4390/ 115203 | consumed samples: 1123840 | consumed tokens: 2301624320 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.800318E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.079 | TFLOPs: 31.54 | 7: iteration 4400/ 115203 | consumed samples: 1126400 | consumed tokens: 2306867200 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.779862E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.738 | TFLOPs: 31.57 | 7: iteration 4410/ 115203 | consumed samples: 1128960 | consumed tokens: 2312110080 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.753806E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.248 | TFLOPs: 31.02 | 7: iteration 4420/ 115203 | consumed samples: 1131520 | consumed tokens: 2317352960 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.766032E+00 | grad norm: 0.341 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.524 | TFLOPs: 32.03 | 7: iteration 4430/ 115203 | consumed samples: 1134080 | consumed tokens: 2322595840 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.740452E+00 | grad norm: 0.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.158 | TFLOPs: 31.70 | 7: iteration 4440/ 115203 | consumed samples: 1136640 | consumed tokens: 2327838720 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch 
size: 256 | lm loss: 2.778880E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.850 | TFLOPs: 31.47 | 7: iteration 4450/ 115203 | consumed samples: 1139200 | consumed tokens: 2333081600 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.780821E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.656 | TFLOPs: 31.41 | 7: iteration 4460/ 115203 | consumed samples: 1141760 | consumed tokens: 2338324480 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.712804E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.378 | TFLOPs: 31.40 | 7: iteration 4470/ 115203 | consumed samples: 1144320 | consumed tokens: 2343567360 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.755960E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.683 | TFLOPs: 31.62 | 7: iteration 4480/ 115203 | consumed samples: 1146880 | consumed tokens: 2348810240 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.776805E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.414 | TFLOPs: 31.82 | 7: iteration 4490/ 115203 | consumed samples: 1149440 | consumed tokens: 2354053120 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.728566E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.244 | TFLOPs: 31.39 | 7: iteration 4500/ 115203 | consumed samples: 1152000 | consumed tokens: 
2359296000 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.763918E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.167 | TFLOPs: 31.70 | 7: iteration 4510/ 115203 | consumed samples: 1154560 | consumed tokens: 2364538880 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.741960E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.619 | TFLOPs: 30.88 | 7: iteration 4520/ 115203 | consumed samples: 1157120 | consumed tokens: 2369781760 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.765877E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.735 | TFLOPs: 31.36 | 7: iteration 4530/ 115203 | consumed samples: 1159680 | consumed tokens: 2375024640 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.789571E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.081 | TFLOPs: 32.06 | 7: iteration 4540/ 115203 | consumed samples: 1162240 | consumed tokens: 2380267520 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.769872E+00 | grad norm: 0.363 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.841 | TFLOPs: 31.37 | 7: iteration 4550/ 115203 | consumed samples: 1164800 | consumed tokens: 2385510400 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.743062E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.952 | 
TFLOPs: 31.48 | 7: iteration 4560/ 115203 | consumed samples: 1167360 | consumed tokens: 2390753280 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.759711E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.426 | TFLOPs: 31.66 | 7: iteration 4570/ 115203 | consumed samples: 1169920 | consumed tokens: 2395996160 | elapsed time per iteration (s): 0.44 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.744142E+00 | grad norm: 0.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.005 | TFLOPs: 30.27 | 7: iteration 4580/ 115203 | consumed samples: 1172480 | consumed tokens: 2401239040 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.765374E+00 | grad norm: 0.317 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.057 | TFLOPs: 31.48 | 7: iteration 4590/ 115203 | consumed samples: 1175040 | consumed tokens: 2406481920 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.716467E+00 | grad norm: 0.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.955 | TFLOPs: 31.79 | 7: iteration 4600/ 115203 | consumed samples: 1177600 | consumed tokens: 2411724800 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.782841E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.248 | TFLOPs: 31.28 | 7: iteration 4610/ 115203 | consumed samples: 1180160 | consumed tokens: 2416967680 | elapsed time per iteration (s): 0.44 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.752913E+00 | grad norm: 0.351 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.705 | TFLOPs: 30.73 | 7: iteration 4620/ 115203 | consumed samples: 1182720 | consumed tokens: 2422210560 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.760322E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.900 | TFLOPs: 31.53 | 7: iteration 4630/ 115203 | consumed samples: 1185280 | consumed tokens: 2427453440 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.776772E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.368 | TFLOPs: 31.92 | 7: iteration 4640/ 115203 | consumed samples: 1187840 | consumed tokens: 2432696320 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.786764E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.675 | TFLOPs: 31.67 | 7: iteration 4650/ 115203 | consumed samples: 1190400 | consumed tokens: 2437939200 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.743411E+00 | grad norm: 0.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.466 | TFLOPs: 31.82 | 7: iteration 4660/ 115203 | consumed samples: 1192960 | consumed tokens: 2443182080 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.742096E+00 | grad norm: 0.339 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.219 | TFLOPs: 31.23 | 7: iteration 4670/ 115203 | consumed samples: 1195520 | consumed tokens: 2448424960 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch 
size: 256 | lm loss: 2.722978E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.622 | TFLOPs: 31.36 | 7: iteration 4680/ 115203 | consumed samples: 1198080 | consumed tokens: 2453667840 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.741644E+00 | grad norm: 0.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.085 | TFLOPs: 31.38 | 7: iteration 4690/ 115203 | consumed samples: 1200640 | consumed tokens: 2458910720 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.767754E+00 | grad norm: 0.342 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.617 | TFLOPs: 31.41 | 7: iteration 4700/ 115203 | consumed samples: 1203200 | consumed tokens: 2464153600 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.749236E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.060 | TFLOPs: 31.33 | 7: iteration 4710/ 115203 | consumed samples: 1205760 | consumed tokens: 2469396480 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.726210E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.570 | TFLOPs: 31.35 | 7: iteration 4720/ 115203 | consumed samples: 1208320 | consumed tokens: 2474639360 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.745448E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.773 | TFLOPs: 31.68 | 7: iteration 4730/ 115203 | consumed samples: 1210880 | consumed tokens: 
2479882240 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.720413E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.204 | TFLOPs: 31.54 | 7: iteration 4740/ 115203 | consumed samples: 1213440 | consumed tokens: 2485125120 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.741885E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.187 | TFLOPs: 31.81 | 7: iteration 4750/ 115203 | consumed samples: 1216000 | consumed tokens: 2490368000 | elapsed time per iteration (s): 0.43 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.735183E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.218 | TFLOPs: 31.28 | 7: iteration 4760/ 115203 | consumed samples: 1218560 | consumed tokens: 2495610880 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.743006E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.181 | TFLOPs: 31.91 | 7: iteration 4770/ 115203 | consumed samples: 1221120 | consumed tokens: 2500853760 | elapsed time per iteration (s): 0.44 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.770939E+00 | grad norm: 0.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.296 | TFLOPs: 30.71 | 7: iteration 4780/ 115203 | consumed samples: 1223680 | consumed tokens: 2506096640 | elapsed time per iteration (s): 0.42 | learning rate: 1.996E-04 | global batch size: 256 | lm loss: 2.726019E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.428 | 
TFLOPs: 31.66 | 7: iteration 4790/ 115203 | consumed samples: 1226240 | consumed tokens: 2511339520 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.748365E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.719 | TFLOPs: 31.10 | 7: iteration 4800/ 115203 | consumed samples: 1228800 | consumed tokens: 2516582400 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.763309E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.880 | TFLOPs: 32.26 | 7: iteration 4810/ 115203 | consumed samples: 1231360 | consumed tokens: 2521825280 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.749315E+00 | grad norm: 0.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.410 | TFLOPs: 31.14 | 7: iteration 4820/ 115203 | consumed samples: 1233920 | consumed tokens: 2527068160 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.734975E+00 | grad norm: 0.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.209 | TFLOPs: 31.07 | 7: iteration 4830/ 115203 | consumed samples: 1236480 | consumed tokens: 2532311040 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.723812E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.809 | TFLOPs: 31.73 | 7: iteration 4840/ 115203 | consumed samples: 1239040 | consumed tokens: 2537553920 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.699832E+00 | grad norm: 0.311 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.033 | TFLOPs: 31.59 | 7: iteration 4850/ 115203 | consumed samples: 1241600 | consumed tokens: 2542796800 | elapsed time per iteration (s): 0.44 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.733060E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.494 | TFLOPs: 30.46 | 7: iteration 4860/ 115203 | consumed samples: 1244160 | consumed tokens: 2548039680 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.766406E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.838 | TFLOPs: 31.42 | 7: iteration 4870/ 115203 | consumed samples: 1246720 | consumed tokens: 2553282560 | elapsed time per iteration (s): 0.44 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.723220E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.023 | TFLOPs: 30.80 | 7: iteration 4880/ 115203 | consumed samples: 1249280 | consumed tokens: 2558525440 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.768686E+00 | grad norm: 0.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.260 | TFLOPs: 31.76 | 7: iteration 4890/ 115203 | consumed samples: 1251840 | consumed tokens: 2563768320 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.742889E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.472 | TFLOPs: 31.66 | 7: iteration 4900/ 115203 | consumed samples: 1254400 | consumed tokens: 2569011200 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch 
size: 256 | lm loss: 2.713692E+00 | grad norm: 0.356 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.548 | TFLOPs: 31.72 | 7: iteration 4910/ 115203 | consumed samples: 1256960 | consumed tokens: 2574254080 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.701981E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.223 | TFLOPs: 31.07 | 7: iteration 4920/ 115203 | consumed samples: 1259520 | consumed tokens: 2579496960 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.750944E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.150 | TFLOPs: 31.44 | 7: iteration 4930/ 115203 | consumed samples: 1262080 | consumed tokens: 2584739840 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.726027E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.024 | TFLOPs: 31.74 | 7: iteration 4940/ 115203 | consumed samples: 1264640 | consumed tokens: 2589982720 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.736222E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.766 | TFLOPs: 31.15 | 7: iteration 4950/ 115203 | consumed samples: 1267200 | consumed tokens: 2595225600 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.760794E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.644 | TFLOPs: 31.41 | 7: iteration 4960/ 115203 | consumed samples: 1269760 | consumed tokens: 
2600468480 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.729772E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.909 | TFLOPs: 31.69 | 7: iteration 4970/ 115203 | consumed samples: 1272320 | consumed tokens: 2605711360 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.722445E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.025 | TFLOPs: 31.74 | 7: iteration 4980/ 115203 | consumed samples: 1274880 | consumed tokens: 2610954240 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.736620E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.439 | TFLOPs: 31.82 | 7: iteration 4990/ 115203 | consumed samples: 1277440 | consumed tokens: 2616197120 | elapsed time per iteration (s): 0.45 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.713703E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.128 | TFLOPs: 30.12 | 7: iteration 5000/ 115203 | consumed samples: 1280000 | consumed tokens: 2621440000 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.711087E+00 | grad norm: 0.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.174 | TFLOPs: 31.07 | 7: ------------------------------------------------------------------------------------------ 7: valid loss at iteration 5000 | lm loss value: 2.734989E+00 | lm loss PPL: 1.540958E+01 | 7: ------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 5000 to 
checkpoints_221m 0: [2022-11-28 13:31:25,174] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step5000 is begin to save! 0: [2022-11-28 13:31:25,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_01-model_00-model_states.pt... 0: [2022-11-28 13:31:25,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_01-model_00-model_states.pt. 0: [2022-11-28 13:31:25,321] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_03-model_00-model_states.pt... 0: [2022-11-28 13:31:25,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_03-model_00-model_states.pt. 0: [2022-11-28 13:31:25,350] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_04-model_00-model_states.pt... 0: [2022-11-28 13:31:25,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_04-model_00-model_states.pt. 0: [2022-11-28 13:31:25,381] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_05-model_00-model_states.pt... 0: [2022-11-28 13:31:25,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_05-model_00-model_states.pt. 0: [2022-11-28 13:31:25,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_06-model_00-model_states.pt... 0: [2022-11-28 13:31:25,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_06-model_00-model_states.pt. 0: [2022-11-28 13:31:25,443] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_07-model_00-model_states.pt... 
0: [2022-11-28 13:31:25,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_07-model_00-model_states.pt. 0: [2022-11-28 13:31:25,474] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_08-model_00-model_states.pt... 0: [2022-11-28 13:31:25,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_08-model_00-model_states.pt. 0: [2022-11-28 13:31:25,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_09-model_00-model_states.pt... 0: [2022-11-28 13:31:25,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_09-model_00-model_states.pt. 0: [2022-11-28 13:31:25,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_10-model_00-model_states.pt... 0: [2022-11-28 13:31:25,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_10-model_00-model_states.pt. 0: [2022-11-28 13:31:25,568] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_11-model_00-model_states.pt... 0: [2022-11-28 13:31:25,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_11-model_00-model_states.pt. 0: [2022-11-28 13:31:25,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_12-model_00-model_states.pt... 0: [2022-11-28 13:31:25,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_12-model_00-model_states.pt. 0: [2022-11-28 13:31:25,631] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_13-model_00-model_states.pt... 
0: [2022-11-28 13:31:25,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_13-model_00-model_states.pt. 0: [2022-11-28 13:31:25,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_14-model_00-model_states.pt... 0: [2022-11-28 13:31:25,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_14-model_00-model_states.pt. 0: [2022-11-28 13:31:25,696] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_15-model_00-model_states.pt... 0: [2022-11-28 13:31:25,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_15-model_00-model_states.pt. 0: [2022-11-28 13:31:25,728] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_16-model_00-model_states.pt... 0: [2022-11-28 13:31:25,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_16-model_00-model_states.pt. 0: [2022-11-28 13:31:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_17-model_00-model_states.pt... 0: [2022-11-28 13:31:25,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_17-model_00-model_states.pt. 0: [2022-11-28 13:31:25,793] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_18-model_00-model_states.pt... 0: [2022-11-28 13:31:25,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_18-model_00-model_states.pt. 0: [2022-11-28 13:31:25,823] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_19-model_00-model_states.pt... 
0: [2022-11-28 13:31:25,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_19-model_00-model_states.pt. 0: [2022-11-28 13:31:25,856] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_20-model_00-model_states.pt... 0: [2022-11-28 13:31:25,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_20-model_00-model_states.pt. 0: [2022-11-28 13:31:25,888] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/layer_22-model_00-model_states.pt... 0: [2022-11-28 13:31:25,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/layer_22-model_00-model_states.pt. 0: [2022-11-28 13:31:25,892] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step5000/mp_rank_00_model_states.pt 0: [2022-11-28 13:31:25,892] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/mp_rank_00_model_states.pt... 0: [2022-11-28 13:31:25,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/mp_rank_00_model_states.pt. 0: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
7: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
5: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
3: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:31:25,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step5000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:31:25,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:31:25,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 13:31:25,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2022-11-28 13:31:25,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:31:25,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 13:31:25,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2022-11-28 13:31:25,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 13:31:25,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 6: [2022-11-28 13:31:25,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:31:25,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2022-11-28 13:31:25,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2022-11-28 13:31:25,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2022-11-28 13:31:25,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:31:25,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:31:25,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 13:31:25,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 13:31:25,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 3: [2022-11-28 13:31:25,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2022-11-28 13:31:25,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:31:25,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 13:31:25,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2022-11-28 13:31:25,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:31:25,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 13:31:25,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2022-11-28 13:31:25,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:31:25,965] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2022-11-28 13:31:25,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:31:25,965] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2022-11-28 13:31:25,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:31:25,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
0: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:31:25,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 6: [2022-11-28 13:31:25,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:31:25,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 13:31:25,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
2: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:31:25,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 13:31:25,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2022-11-28 13:31:25,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:31:25,966] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 13:31:25,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2022-11-28 13:31:25,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:31:25,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 13:31:25,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
2: [2022-11-28 13:31:25,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:31:25,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2022-11-28 13:31:25,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:31:25,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2022-11-28 13:31:25,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 13:31:25,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2022-11-28 13:31:25,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:31:25,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 13:31:25,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2022-11-28 13:31:25,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:31:25,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 13:31:25,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
1: [2022-11-28 13:31:25,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:31:25,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 13:31:25,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2022-11-28 13:31:25,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:31:25,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 13:31:25,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:31:25,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:31:25,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
5: [2022-11-28 13:31:25,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:31:25,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:31:25,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 13:31:25,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2022-11-28 13:31:25,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:31:25,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 13:31:25,969] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 13:31:25,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2022-11-28 13:31:25,969] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 3: [2022-11-28 13:31:25,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
3: [2022-11-28 13:31:25,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 13:31:25,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 13:31:25,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:31:25,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 13:31:25,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
0: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:31:25,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 0: [2022-11-28 13:31:25,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 13:31:25,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 13:31:25,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
7: [2022-11-28 13:31:25,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 13:31:25,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 4: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 4: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:31:25,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 13:31:25,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2022-11-28 13:31:25,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:31:25,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 13:31:25,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
4: [2022-11-28 13:31:25,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:31:25,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 13:31:25,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2022-11-28 13:31:25,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:31:25,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 13:31:25,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2022-11-28 13:31:25,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:31:25,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:31:25,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 13:31:25,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 7: [2022-11-28 13:31:25,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 13:31:25,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
7: [2022-11-28 13:31:25,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:31:25,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 13:31:25,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 2: [2022-11-28 13:31:25,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:31:25,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 13:31:25,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 5: [2022-11-28 13:31:25,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 13:31:25,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2022-11-28 13:31:25,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:31:25,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 13:31:25,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 13:31:25,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 13:31:25,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2022-11-28 13:31:25,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 1: [2022-11-28 13:31:25,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:31:25,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:31:25,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:31:25,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:31:25,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 13:31:25,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
1: [2022-11-28 13:31:25,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 13:31:25,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 13:31:25,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 13:31:25,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 1: [2022-11-28 13:31:25,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 1: [2022-11-28 13:31:25,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 1: [2022-11-28 13:31:25,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:31:25,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:31:25,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 13:31:25,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:31:25,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
1: [2022-11-28 13:31:25,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 13:31:25,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 13:31:25,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 1: [2022-11-28 13:31:25,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 6: [2022-11-28 13:31:25,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:31:25,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 13:31:25,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 3: [2022-11-28 13:31:25,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:31:25,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 13:31:25,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 3: [2022-11-28 13:31:25,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:31:25,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2022-11-28 13:31:25,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:31:25,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:31:25,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 13:31:25,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 13:31:25,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 13:31:25,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 13:31:25,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 3: [2022-11-28 13:31:25,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 3: [2022-11-28 13:31:25,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 3: [2022-11-28 13:31:25,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 0: [2022-11-28 13:31:26,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step5000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 13:31:26,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step5000 is ready now! 
0: successfully saved checkpoint at iteration 5000 to checkpoints_221m
7: time (ms) | save-checkpoint: 859.83
7: iteration 5010/ 115203 | consumed samples: 1282560 | consumed tokens: 2626682880 | elapsed time per iteration (s): 0.52 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.720555E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 492.377 | TFLOPs: 25.83 |
7: iteration 5020/ 115203 | consumed samples: 1285120 | consumed tokens: 2631925760 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.699023E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.388 | TFLOPs: 31.76 |
7: iteration 5030/ 115203 | consumed samples: 1287680 | consumed tokens: 2637168640 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.714726E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.326 | TFLOPs: 31.97 |
7: iteration 5040/ 115203 | consumed samples: 1290240 | consumed tokens: 2642411520 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.693968E+00 | grad norm: 0.324 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.299 | TFLOPs: 31.60 |
7: iteration 5050/ 115203 | consumed samples: 1292800 | consumed tokens: 2647654400 | elapsed time per iteration (s): 0.44 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.687783E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.472 | TFLOPs: 30.56 |
7: iteration 5060/ 115203 | consumed samples: 1295360 | consumed tokens: 2652897280 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.694592E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.193 | TFLOPs: 31.12 |
7: iteration 5070/ 115203 | consumed samples: 1297920 | consumed tokens: 2658140160 | elapsed time per iteration (s): 0.44 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.716644E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.895 | TFLOPs: 30.64 |
7: iteration 5080/ 115203 | consumed samples: 1300480 | consumed tokens: 2663383040 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.738864E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.433 | TFLOPs: 31.03 |
7: iteration 5090/ 115203 | consumed samples: 1303040 | consumed tokens: 2668625920 | elapsed time per iteration (s): 0.47 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.716520E+00 | grad norm: 0.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 548.958 | TFLOPs: 28.80 |
7: iteration 5100/ 115203 | consumed samples: 1305600 | consumed tokens: 2673868800 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.745990E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.364 | TFLOPs: 31.92 |
7: iteration 5110/ 115203 | consumed samples: 1308160 | consumed tokens: 2679111680 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.724078E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.466 | TFLOPs: 31.56 |
7: iteration 5120/ 115203 | consumed samples: 1310720 | consumed tokens: 2684354560 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.730044E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.056 | TFLOPs: 31.90 |
7: iteration 5130/ 115203 | consumed samples: 1313280 | consumed tokens: 2689597440 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.728725E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.481 | TFLOPs: 32.03 |
7: iteration 5140/ 115203 | consumed samples: 1315840 | consumed tokens: 2694840320 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.714914E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.326 | TFLOPs: 31.92 |
7: iteration 5150/ 115203 | consumed samples: 1318400 | consumed tokens: 2700083200 | elapsed time per iteration (s): 0.43 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.728773E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.081 | TFLOPs: 31.33 |
7: iteration 5160/ 115203 | consumed samples: 1320960 | consumed tokens: 2705326080 | elapsed time per iteration (s): 0.42 | learning rate: 1.995E-04 | global batch size: 256 | lm loss: 2.729730E+00 | grad norm: 0.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.558 | TFLOPs: 31.72 |
7: iteration 5170/ 115203 | consumed samples: 1323520 | consumed tokens: 2710568960 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.714911E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.613 
| TFLOPs: 31.83 | 7: iteration 5180/ 115203 | consumed samples: 1326080 | consumed tokens: 2715811840 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.718858E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.493 | TFLOPs: 31.56 | 7: iteration 5190/ 115203 | consumed samples: 1328640 | consumed tokens: 2721054720 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.695822E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.184 | TFLOPs: 31.96 | 7: iteration 5200/ 115203 | consumed samples: 1331200 | consumed tokens: 2726297600 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.708167E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.858 | TFLOPs: 31.63 | 7: iteration 5210/ 115203 | consumed samples: 1333760 | consumed tokens: 2731540480 | elapsed time per iteration (s): 0.44 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.688043E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.667 | TFLOPs: 30.62 | 7: iteration 5220/ 115203 | consumed samples: 1336320 | consumed tokens: 2736783360 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.697889E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.339 | TFLOPs: 30.92 | 7: iteration 5230/ 115203 | consumed samples: 1338880 | consumed tokens: 2742026240 | elapsed time per iteration (s): 0.44 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.737921E+00 | grad norm: 0.415 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.332 | TFLOPs: 30.34 | 7: iteration 5240/ 115203 | consumed samples: 1341440 | consumed tokens: 2747269120 | elapsed time per iteration (s): 0.44 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.693915E+00 | grad norm: 0.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.007 | TFLOPs: 30.43 | 7: iteration 5250/ 115203 | consumed samples: 1344000 | consumed tokens: 2752512000 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.719698E+00 | grad norm: 0.411 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.666 | TFLOPs: 31.99 | 7: iteration 5260/ 115203 | consumed samples: 1346560 | consumed tokens: 2757754880 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.736667E+00 | grad norm: 0.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.840 | TFLOPs: 31.68 | 7: iteration 5270/ 115203 | consumed samples: 1349120 | consumed tokens: 2762997760 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.673276E+00 | grad norm: 0.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.571 | TFLOPs: 30.88 | 7: iteration 5280/ 115203 | consumed samples: 1351680 | consumed tokens: 2768240640 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.725698E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.316 | TFLOPs: 31.66 | 7: iteration 5290/ 115203 | consumed samples: 1354240 | consumed tokens: 2773483520 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch 
size: 256 | lm loss: 2.687893E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.652 | TFLOPs: 30.99 | 7: iteration 5300/ 115203 | consumed samples: 1356800 | consumed tokens: 2778726400 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.691542E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.615 | TFLOPs: 31.78 | 7: iteration 5310/ 115203 | consumed samples: 1359360 | consumed tokens: 2783969280 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.730233E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.849 | TFLOPs: 32.10 | 7: iteration 5320/ 115203 | consumed samples: 1361920 | consumed tokens: 2789212160 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.701970E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.677 | TFLOPs: 31.52 | 7: iteration 5330/ 115203 | consumed samples: 1364480 | consumed tokens: 2794455040 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.707352E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.504 | TFLOPs: 31.04 | 7: iteration 5340/ 115203 | consumed samples: 1367040 | consumed tokens: 2799697920 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.730512E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.138 | TFLOPs: 31.75 | 7: iteration 5350/ 115203 | consumed samples: 1369600 | consumed tokens: 
2804940800 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.677471E+00 | grad norm: 0.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.556 | TFLOPs: 31.51 | 7: iteration 5360/ 115203 | consumed samples: 1372160 | consumed tokens: 2810183680 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.732915E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.396 | TFLOPs: 31.40 | 7: iteration 5370/ 115203 | consumed samples: 1374720 | consumed tokens: 2815426560 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.718034E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.201 | TFLOPs: 32.02 | 7: iteration 5380/ 115203 | consumed samples: 1377280 | consumed tokens: 2820669440 | elapsed time per iteration (s): 0.45 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.676296E+00 | grad norm: 0.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.729 | TFLOPs: 29.95 | 7: iteration 5390/ 115203 | consumed samples: 1379840 | consumed tokens: 2825912320 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.713352E+00 | grad norm: 0.338 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.878 | TFLOPs: 31.05 | 7: iteration 5400/ 115203 | consumed samples: 1382400 | consumed tokens: 2831155200 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.673900E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.337 | 
TFLOPs: 31.71 | 7: iteration 5410/ 115203 | consumed samples: 1384960 | consumed tokens: 2836398080 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.709321E+00 | grad norm: 0.315 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.404 | TFLOPs: 31.08 | 7: iteration 5420/ 115203 | consumed samples: 1387520 | consumed tokens: 2841640960 | elapsed time per iteration (s): 0.44 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.697056E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.741 | TFLOPs: 30.68 | 7: iteration 5430/ 115203 | consumed samples: 1390080 | consumed tokens: 2846883840 | elapsed time per iteration (s): 0.45 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.718823E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.032 | TFLOPs: 30.12 | 7: iteration 5440/ 115203 | consumed samples: 1392640 | consumed tokens: 2852126720 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.701167E+00 | grad norm: 0.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.599 | TFLOPs: 32.14 | 7: iteration 5450/ 115203 | consumed samples: 1395200 | consumed tokens: 2857369600 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.707339E+00 | grad norm: 0.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.724 | TFLOPs: 31.15 | 7: iteration 5460/ 115203 | consumed samples: 1397760 | consumed tokens: 2862612480 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.725565E+00 | grad norm: 0.294 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.462 | TFLOPs: 31.77 | 7: iteration 5470/ 115203 | consumed samples: 1400320 | consumed tokens: 2867855360 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.701438E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.613 | TFLOPs: 31.67 | 7: iteration 5480/ 115203 | consumed samples: 1402880 | consumed tokens: 2873098240 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.698350E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.585 | TFLOPs: 31.88 | 7: iteration 5490/ 115203 | consumed samples: 1405440 | consumed tokens: 2878341120 | elapsed time per iteration (s): 0.43 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.670302E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.122 | TFLOPs: 31.02 | 7: iteration 5500/ 115203 | consumed samples: 1408000 | consumed tokens: 2883584000 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.731922E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.936 | TFLOPs: 31.95 | 7: iteration 5510/ 115203 | consumed samples: 1410560 | consumed tokens: 2888826880 | elapsed time per iteration (s): 0.42 | learning rate: 1.994E-04 | global batch size: 256 | lm loss: 2.698251E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.879 | TFLOPs: 31.95 | 7: iteration 5520/ 115203 | consumed samples: 1413120 | consumed tokens: 2894069760 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch 
size: 256 | lm loss: 2.728746E+00 | grad norm: 0.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.501 | TFLOPs: 31.09 | 7: iteration 5530/ 115203 | consumed samples: 1415680 | consumed tokens: 2899312640 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.697376E+00 | grad norm: 0.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.494 | TFLOPs: 31.35 | 7: iteration 5540/ 115203 | consumed samples: 1418240 | consumed tokens: 2904555520 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.738158E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.605 | TFLOPs: 31.04 | 7: iteration 5550/ 115203 | consumed samples: 1420800 | consumed tokens: 2909798400 | elapsed time per iteration (s): 0.44 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.703959E+00 | grad norm: 0.319 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.721 | TFLOPs: 30.42 | 7: iteration 5560/ 115203 | consumed samples: 1423360 | consumed tokens: 2915041280 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.693525E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.970 | TFLOPs: 31.48 | 7: iteration 5570/ 115203 | consumed samples: 1425920 | consumed tokens: 2920284160 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.719495E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.261 | TFLOPs: 31.02 | 7: iteration 5580/ 115203 | consumed samples: 1428480 | consumed tokens: 
2925527040 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.679785E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.274 | TFLOPs: 31.29 | 7: iteration 5590/ 115203 | consumed samples: 1431040 | consumed tokens: 2930769920 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.691611E+00 | grad norm: 0.318 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.484 | TFLOPs: 31.56 | 7: iteration 5600/ 115203 | consumed samples: 1433600 | consumed tokens: 2936012800 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.676710E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.253 | TFLOPs: 31.60 | 7: iteration 5610/ 115203 | consumed samples: 1436160 | consumed tokens: 2941255680 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.730878E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.191 | TFLOPs: 31.60 | 7: iteration 5620/ 115203 | consumed samples: 1438720 | consumed tokens: 2946498560 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.679606E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.515 | TFLOPs: 31.56 | 7: iteration 5630/ 115203 | consumed samples: 1441280 | consumed tokens: 2951741440 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.729412E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.707 | 
TFLOPs: 31.10 | 7: iteration 5640/ 115203 | consumed samples: 1443840 | consumed tokens: 2956984320 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.648170E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.609 | TFLOPs: 31.57 | 7: iteration 5650/ 115203 | consumed samples: 1446400 | consumed tokens: 2962227200 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.698568E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.260 | TFLOPs: 30.97 | 7: iteration 5660/ 115203 | consumed samples: 1448960 | consumed tokens: 2967470080 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.681496E+00 | grad norm: 0.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.428 | TFLOPs: 31.03 | 7: iteration 5670/ 115203 | consumed samples: 1451520 | consumed tokens: 2972712960 | elapsed time per iteration (s): 0.44 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.718910E+00 | grad norm: 0.345 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.851 | TFLOPs: 30.69 | 7: iteration 5680/ 115203 | consumed samples: 1454080 | consumed tokens: 2977955840 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.700417E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.682 | TFLOPs: 31.62 | 7: iteration 5690/ 115203 | consumed samples: 1456640 | consumed tokens: 2983198720 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.742452E+00 | grad norm: 0.287 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.722 | TFLOPs: 31.68 | 7: iteration 5700/ 115203 | consumed samples: 1459200 | consumed tokens: 2988441600 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.716525E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.180 | TFLOPs: 31.18 | 7: iteration 5710/ 115203 | consumed samples: 1461760 | consumed tokens: 2993684480 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.686621E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.669 | TFLOPs: 31.36 | 7: iteration 5720/ 115203 | consumed samples: 1464320 | consumed tokens: 2998927360 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.669016E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.971 | TFLOPs: 31.43 | 7: iteration 5730/ 115203 | consumed samples: 1466880 | consumed tokens: 3004170240 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.715472E+00 | grad norm: 0.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.140 | TFLOPs: 31.75 | 7: iteration 5740/ 115203 | consumed samples: 1469440 | consumed tokens: 3009413120 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.686048E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.304 | TFLOPs: 31.44 | 7: iteration 5750/ 115203 | consumed samples: 1472000 | consumed tokens: 3014656000 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch 
size: 256 | lm loss: 2.646225E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.938 | TFLOPs: 31.11 | 7: iteration 5760/ 115203 | consumed samples: 1474560 | consumed tokens: 3019898880 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.683411E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.349 | TFLOPs: 31.55 | 7: iteration 5770/ 115203 | consumed samples: 1477120 | consumed tokens: 3025141760 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.676312E+00 | grad norm: 0.305 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.459 | TFLOPs: 31.30 | 7: iteration 5780/ 115203 | consumed samples: 1479680 | consumed tokens: 3030384640 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.676854E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.388 | TFLOPs: 31.03 | 7: iteration 5790/ 115203 | consumed samples: 1482240 | consumed tokens: 3035627520 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.690416E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.003 | TFLOPs: 31.59 | 7: iteration 5800/ 115203 | consumed samples: 1484800 | consumed tokens: 3040870400 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.709655E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.416 | TFLOPs: 31.40 | 7: iteration 5810/ 115203 | consumed samples: 1487360 | consumed tokens: 
3046113280 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.675404E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.226 | TFLOPs: 31.18 | 7: iteration 5820/ 115203 | consumed samples: 1489920 | consumed tokens: 3051356160 | elapsed time per iteration (s): 0.42 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.706892E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.232 | TFLOPs: 32.07 | 7: iteration 5830/ 115203 | consumed samples: 1492480 | consumed tokens: 3056599040 | elapsed time per iteration (s): 0.44 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.673211E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.375 | TFLOPs: 30.82 | 7: iteration 5840/ 115203 | consumed samples: 1495040 | consumed tokens: 3061841920 | elapsed time per iteration (s): 0.43 | learning rate: 1.993E-04 | global batch size: 256 | lm loss: 2.691093E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.315 | TFLOPs: 31.39 | 7: iteration 5850/ 115203 | consumed samples: 1497600 | consumed tokens: 3067084800 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.697670E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.762 | TFLOPs: 31.42 | 7: iteration 5860/ 115203 | consumed samples: 1500160 | consumed tokens: 3072327680 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.687530E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.277 | 
TFLOPs: 31.02 | 7: iteration 5870/ 115203 | consumed samples: 1502720 | consumed tokens: 3077570560 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.689827E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.863 | TFLOPs: 31.74 | 7: iteration 5880/ 115203 | consumed samples: 1505280 | consumed tokens: 3082813440 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.685918E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.051 | TFLOPs: 31.75 | 7: iteration 5890/ 115203 | consumed samples: 1507840 | consumed tokens: 3088056320 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.646182E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.746 | TFLOPs: 31.00 | 7: iteration 5900/ 115203 | consumed samples: 1510400 | consumed tokens: 3093299200 | elapsed time per iteration (s): 0.45 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.658310E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.060 | TFLOPs: 29.86 | 7: iteration 5910/ 115203 | consumed samples: 1512960 | consumed tokens: 3098542080 | elapsed time per iteration (s): 0.44 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.679378E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.907 | TFLOPs: 30.64 | 7: iteration 5920/ 115203 | consumed samples: 1515520 | consumed tokens: 3103784960 | elapsed time per iteration (s): 0.44 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.698341E+00 | grad norm: 0.284 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.460 | TFLOPs: 30.40 | 7: iteration 5930/ 115203 | consumed samples: 1518080 | consumed tokens: 3109027840 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.676973E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.565 | TFLOPs: 32.04 | 7: iteration 5940/ 115203 | consumed samples: 1520640 | consumed tokens: 3114270720 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.681586E+00 | grad norm: 0.308 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.258 | TFLOPs: 31.70 | 7: iteration 5950/ 115203 | consumed samples: 1523200 | consumed tokens: 3119513600 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.689439E+00 | grad norm: 0.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.407 | TFLOPs: 31.08 | 7: iteration 5960/ 115203 | consumed samples: 1525760 | consumed tokens: 3124756480 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.686522E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.825 | TFLOPs: 31.79 | 7: iteration 5970/ 115203 | consumed samples: 1528320 | consumed tokens: 3129999360 | elapsed time per iteration (s): 0.44 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.725084E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.072 | TFLOPs: 30.70 | 7: iteration 5980/ 115203 | consumed samples: 1530880 | consumed tokens: 3135242240 | elapsed time per iteration (s): 0.44 | learning rate: 1.992E-04 | global batch 
size: 256 | lm loss: 2.686209E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.349 | TFLOPs: 30.40 | 7: iteration 5990/ 115203 | consumed samples: 1533440 | consumed tokens: 3140485120 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.692246E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.579 | TFLOPs: 32.04 | 0: [2022-11-28 13:38:35,053] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=0, lr=[0.00019919872690019844, 0.00019919872690019844, 0.00019919872690019844], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 6000/ 115203 | consumed samples: 1536000 | consumed tokens: 3145728000 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.680109E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.716 | TFLOPs: 31.20 | 0: steps: 6000 loss: 2.6911 iter time (s): 0.426 samples/sec: 601.032 7: ------------------------------------------------------------------------------------------ 7: valid loss at iteration 6000 | lm loss value: 2.690692E+00 | lm loss PPL: 1.474187E+01 | 7: ------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 6000 to checkpoints_221m 0: [2022-11-28 13:38:35,248] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step6000 is begin to save! 0: [2022-11-28 13:38:35,252] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_01-model_00-model_states.pt... 0: [2022-11-28 13:38:35,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 13:38:35,354] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_03-model_00-model_states.pt... 0: [2022-11-28 13:38:35,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_03-model_00-model_states.pt. 0: [2022-11-28 13:38:35,376] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_04-model_00-model_states.pt... 0: [2022-11-28 13:38:35,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_04-model_00-model_states.pt. 0: [2022-11-28 13:38:35,397] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_05-model_00-model_states.pt... 0: [2022-11-28 13:38:35,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_05-model_00-model_states.pt. 0: [2022-11-28 13:38:35,420] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_06-model_00-model_states.pt... 0: [2022-11-28 13:38:35,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_06-model_00-model_states.pt. 0: [2022-11-28 13:38:35,443] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_07-model_00-model_states.pt... 0: [2022-11-28 13:38:35,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_07-model_00-model_states.pt. 0: [2022-11-28 13:38:35,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_08-model_00-model_states.pt... 0: [2022-11-28 13:38:35,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 13:38:35,489] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_09-model_00-model_states.pt... 0: [2022-11-28 13:38:35,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_09-model_00-model_states.pt. 0: [2022-11-28 13:38:35,511] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_10-model_00-model_states.pt... 0: [2022-11-28 13:38:35,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_10-model_00-model_states.pt. 0: [2022-11-28 13:38:35,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_11-model_00-model_states.pt... 0: [2022-11-28 13:38:35,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_11-model_00-model_states.pt. 0: [2022-11-28 13:38:35,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_12-model_00-model_states.pt... 0: [2022-11-28 13:38:35,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_12-model_00-model_states.pt. 0: [2022-11-28 13:38:35,583] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_13-model_00-model_states.pt... 0: [2022-11-28 13:38:35,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_13-model_00-model_states.pt. 0: [2022-11-28 13:38:35,605] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_14-model_00-model_states.pt... 0: [2022-11-28 13:38:35,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 13:38:35,629] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_15-model_00-model_states.pt... 0: [2022-11-28 13:38:35,653] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_15-model_00-model_states.pt. 0: [2022-11-28 13:38:35,653] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_16-model_00-model_states.pt... 0: [2022-11-28 13:38:35,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_16-model_00-model_states.pt. 0: [2022-11-28 13:38:35,676] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_17-model_00-model_states.pt... 0: [2022-11-28 13:38:35,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_17-model_00-model_states.pt. 0: [2022-11-28 13:38:35,699] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_18-model_00-model_states.pt... 0: [2022-11-28 13:38:35,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_18-model_00-model_states.pt. 0: [2022-11-28 13:38:35,723] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_19-model_00-model_states.pt... 0: [2022-11-28 13:38:35,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_19-model_00-model_states.pt. 0: [2022-11-28 13:38:35,747] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_20-model_00-model_states.pt... 0: [2022-11-28 13:38:35,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 13:38:35,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/layer_22-model_00-model_states.pt... 0: [2022-11-28 13:38:35,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/layer_22-model_00-model_states.pt. 0: [2022-11-28 13:38:35,775] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step6000/mp_rank_00_model_states.pt 0: [2022-11-28 13:38:35,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/mp_rank_00_model_states.pt... 0: [2022-11-28 13:38:35,777] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/mp_rank_00_model_states.pt. 0: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
0: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
4: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
7: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:38:35,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step6000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:38:35,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:38:35,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 13:38:35,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2022-11-28 13:38:35,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:38:35,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 13:38:35,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2022-11-28 13:38:35,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:38:35,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 13:38:35,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2022-11-28 13:38:35,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 13:38:35,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 13:38:35,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2022-11-28 13:38:35,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:38:35,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 13:38:35,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2022-11-28 13:38:35,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:38:35,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 13:38:35,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:38:35,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2022-11-28 13:38:35,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 13:38:35,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2022-11-28 13:38:35,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 13:38:35,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 13:38:35,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:38:35,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:38:35,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:38:35,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2022-11-28 13:38:35,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 2: [2022-11-28 13:38:35,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 13:38:35,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2022-11-28 13:38:35,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:38:35,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:38:35,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:38:35,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
2: [2022-11-28 13:38:35,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2022-11-28 13:38:35,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2022-11-28 13:38:35,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 2: [2022-11-28 13:38:35,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 3: [2022-11-28 13:38:35,842] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 13:38:35,842] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2022-11-28 13:38:35,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2022-11-28 13:38:35,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2022-11-28 13:38:35,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:38:35,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:38:35,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 13:38:35,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 13:38:35,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
5: [2022-11-28 13:38:35,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2022-11-28 13:38:35,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:38:35,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 13:38:35,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2022-11-28 13:38:35,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:38:35,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 13:38:35,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2022-11-28 13:38:35,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:38:35,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 13:38:35,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 6: [2022-11-28 13:38:35,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:38:35,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:38:35,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 7: [2022-11-28 13:38:35,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 6: [2022-11-28 13:38:35,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2022-11-28 13:38:35,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2022-11-28 13:38:35,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:38:35,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 13:38:35,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 6: [2022-11-28 13:38:35,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:38:35,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 13:38:35,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 6: [2022-11-28 13:38:35,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:38:35,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 13:38:35,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2022-11-28 13:38:35,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:38:35,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 1: [2022-11-28 13:38:35,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:38:35,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2022-11-28 13:38:35,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2022-11-28 13:38:35,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:38:35,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 6: [2022-11-28 13:38:35,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 13:38:35,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2022-11-28 13:38:35,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
2: [2022-11-28 13:38:35,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:38:35,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 2: [2022-11-28 13:38:35,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2022-11-28 13:38:35,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2022-11-28 13:38:35,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2022-11-28 13:38:35,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:38:35,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:38:35,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 5: [2022-11-28 13:38:35,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2022-11-28 13:38:35,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2022-11-28 13:38:35,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2022-11-28 13:38:35,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:38:35,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 13:38:35,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2022-11-28 13:38:35,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:38:35,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 6: [2022-11-28 13:38:35,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:38:35,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 6: [2022-11-28 13:38:35,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 13:38:35,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2022-11-28 13:38:35,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:38:35,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
1: [2022-11-28 13:38:35,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 3: [2022-11-28 13:38:35,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 1: [2022-11-28 13:38:35,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2022-11-28 13:38:35,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2022-11-28 13:38:35,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:38:35,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 13:38:35,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2022-11-28 13:38:35,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:38:35,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:38:35,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 4: [2022-11-28 13:38:35,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 3: [2022-11-28 13:38:35,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
4: [2022-11-28 13:38:35,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 6: [2022-11-28 13:38:35,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:38:35,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 13:38:35,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2022-11-28 13:38:35,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:38:35,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:38:35,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 13:38:35,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 13:38:35,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2022-11-28 13:38:35,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2022-11-28 13:38:35,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 13:38:35,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 13:38:35,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 5: [2022-11-28 13:38:35,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:38:35,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 13:38:35,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 7: [2022-11-28 13:38:35,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:38:35,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 13:38:35,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2022-11-28 13:38:35,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:38:35,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:38:35,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
1: [2022-11-28 13:38:35,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 13:38:35,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2022-11-28 13:38:35,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 1: [2022-11-28 13:38:35,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 1: [2022-11-28 13:38:35,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 2: [2022-11-28 13:38:35,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2022-11-28 13:38:35,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:38:35,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:38:35,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 13:38:35,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 13:38:35,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2022-11-28 13:38:35,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
6: [2022-11-28 13:38:35,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:38:35,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 13:38:35,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2022-11-28 13:38:35,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:38:35,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:38:35,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 13:38:35,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2022-11-28 13:38:35,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:38:35,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 13:38:35,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 13:38:35,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2022-11-28 13:38:35,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
3: [2022-11-28 13:38:35,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:38:35,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 13:38:35,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 3: [2022-11-28 13:38:35,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:38:35,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 13:38:35,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2022-11-28 13:38:35,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:38:35,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:38:35,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 13:38:35,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 13:38:35,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 13:38:35,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 13:38:35,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2022-11-28 13:38:35,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:38:35,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2022-11-28 13:38:35,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2022-11-28 13:38:35,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 13:38:35,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 4: [2022-11-28 13:38:35,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:38:35,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 0: [2022-11-28 13:38:35,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 13:38:35,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:38:35,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2022-11-28 13:38:35,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 13:38:35,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 13:38:35,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2022-11-28 13:38:35,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 0: [2022-11-28 13:38:35,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:38:35,902] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step6000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 13:38:35,902] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step6000 is ready now! 
0: successfully saved checkpoint at iteration 6000 to checkpoints_221m 7: time (ms) | save-checkpoint: 658.81 7: iteration 6010/ 115203 | consumed samples: 1538560 | consumed tokens: 3150970880 | elapsed time per iteration (s): 0.51 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.703827E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 503.012 | TFLOPs: 26.39 | 7: iteration 6020/ 115203 | consumed samples: 1541120 | consumed tokens: 3156213760 | elapsed time per iteration (s): 0.44 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.668984E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.986 | TFLOPs: 30.59 | 7: iteration 6030/ 115203 | consumed samples: 1543680 | consumed tokens: 3161456640 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.664137E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.991 | TFLOPs: 31.38 | 7: iteration 6040/ 115203 | consumed samples: 1546240 | consumed tokens: 3166699520 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.708683E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.537 | TFLOPs: 31.19 | 7: iteration 6050/ 115203 | consumed samples: 1548800 | consumed tokens: 3171942400 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.652887E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.197 | TFLOPs: 31.02 | 7: iteration 6060/ 115203 | consumed samples: 1551360 | consumed tokens: 3177185280 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | 
global batch size: 256 | lm loss: 2.663381E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.386 | TFLOPs: 31.87 | 7: iteration 6070/ 115203 | consumed samples: 1553920 | consumed tokens: 3182428160 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.699043E+00 | grad norm: 0.327 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.965 | TFLOPs: 31.69 | 7: iteration 6080/ 115203 | consumed samples: 1556480 | consumed tokens: 3187671040 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.689996E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.558 | TFLOPs: 31.04 | 7: iteration 6090/ 115203 | consumed samples: 1559040 | consumed tokens: 3192913920 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.667496E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.008 | TFLOPs: 31.53 | 7: iteration 6100/ 115203 | consumed samples: 1561600 | consumed tokens: 3198156800 | elapsed time per iteration (s): 0.44 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.692229E+00 | grad norm: 0.302 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.840 | TFLOPs: 30.84 | 7: iteration 6110/ 115203 | consumed samples: 1564160 | consumed tokens: 3203399680 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.659524E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.012 | TFLOPs: 31.80 | 7: iteration 6120/ 115203 | consumed samples: 1566720 | consumed 
tokens: 3208642560 | elapsed time per iteration (s): 0.42 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.639627E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.359 | TFLOPs: 31.81 | 7: iteration 6130/ 115203 | consumed samples: 1569280 | consumed tokens: 3213885440 | elapsed time per iteration (s): 0.43 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.675424E+00 | grad norm: 0.312 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.306 | TFLOPs: 30.92 | 7: iteration 6140/ 115203 | consumed samples: 1571840 | consumed tokens: 3219128320 | elapsed time per iteration (s): 0.44 | learning rate: 1.992E-04 | global batch size: 256 | lm loss: 2.660894E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.853 | TFLOPs: 30.84 | 7: iteration 6150/ 115203 | consumed samples: 1574400 | consumed tokens: 3224371200 | elapsed time per iteration (s): 0.44 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.663518E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.206 | TFLOPs: 30.34 | 7: iteration 6160/ 115203 | consumed samples: 1576960 | consumed tokens: 3229614080 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.653296E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.396 | TFLOPs: 31.61 | 7: iteration 6170/ 115203 | consumed samples: 1579520 | consumed tokens: 3234856960 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.662345E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.182 
| TFLOPs: 31.49 | 7: iteration 6180/ 115203 | consumed samples: 1582080 | consumed tokens: 3240099840 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.650802E+00 | grad norm: 0.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.349 | TFLOPs: 31.18 | 7: iteration 6190/ 115203 | consumed samples: 1584640 | consumed tokens: 3245342720 | elapsed time per iteration (s): 0.44 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.684167E+00 | grad norm: 0.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.709 | TFLOPs: 30.84 | 7: iteration 6200/ 115203 | consumed samples: 1587200 | consumed tokens: 3250585600 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.670848E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.941 | TFLOPs: 31.79 | 7: iteration 6210/ 115203 | consumed samples: 1589760 | consumed tokens: 3255828480 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.704718E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.726 | TFLOPs: 31.83 | 7: iteration 6220/ 115203 | consumed samples: 1592320 | consumed tokens: 3261071360 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.680444E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.719 | TFLOPs: 31.05 | 7: iteration 6230/ 115203 | consumed samples: 1594880 | consumed tokens: 3266314240 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.698092E+00 | grad norm: 0.283 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.804 | TFLOPs: 31.79 | 7: iteration 6240/ 115203 | consumed samples: 1597440 | consumed tokens: 3271557120 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.695464E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.574 | TFLOPs: 31.56 | 7: iteration 6250/ 115203 | consumed samples: 1600000 | consumed tokens: 3276800000 | elapsed time per iteration (s): 0.45 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.653498E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.454 | TFLOPs: 29.98 | 7: iteration 6260/ 115203 | consumed samples: 1602560 | consumed tokens: 3282042880 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.688695E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.136 | TFLOPs: 31.17 | 7: iteration 6270/ 115203 | consumed samples: 1605120 | consumed tokens: 3287285760 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.659893E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.153 | TFLOPs: 32.12 | 7: iteration 6280/ 115203 | consumed samples: 1607680 | consumed tokens: 3292528640 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.655217E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.928 | TFLOPs: 31.58 | 7: iteration 6290/ 115203 | consumed samples: 1610240 | consumed tokens: 3297771520 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch 
size: 256 | lm loss: 2.649277E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.257 | TFLOPs: 31.81 | 7: iteration 6300/ 115203 | consumed samples: 1612800 | consumed tokens: 3303014400 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.658443E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.294 | TFLOPs: 31.92 | 7: iteration 6310/ 115203 | consumed samples: 1615360 | consumed tokens: 3308257280 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.645121E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.406 | TFLOPs: 31.08 | 7: iteration 6320/ 115203 | consumed samples: 1617920 | consumed tokens: 3313500160 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.666929E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.381 | TFLOPs: 31.40 | 7: iteration 6330/ 115203 | consumed samples: 1620480 | consumed tokens: 3318743040 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.697141E+00 | grad norm: 0.290 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.178 | TFLOPs: 31.75 | 7: iteration 6340/ 115203 | consumed samples: 1623040 | consumed tokens: 3323985920 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.625522E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.059 | TFLOPs: 31.54 | 7: iteration 6350/ 115203 | consumed samples: 1625600 | consumed tokens: 
3329228800 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.664733E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.888 | TFLOPs: 30.90 | 7: iteration 6360/ 115203 | consumed samples: 1628160 | consumed tokens: 3334471680 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.655094E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.111 | TFLOPs: 31.64 | 7: iteration 6370/ 115203 | consumed samples: 1630720 | consumed tokens: 3339714560 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.680215E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.603 | TFLOPs: 31.09 | 7: iteration 6380/ 115203 | consumed samples: 1633280 | consumed tokens: 3344957440 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.644618E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.203 | TFLOPs: 31.07 | 7: iteration 6390/ 115203 | consumed samples: 1635840 | consumed tokens: 3350200320 | elapsed time per iteration (s): 0.45 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.655099E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.689 | TFLOPs: 29.89 | 7: iteration 6400/ 115203 | consumed samples: 1638400 | consumed tokens: 3355443200 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.656065E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.855 | 
TFLOPs: 31.42 | 7: iteration 6410/ 115203 | consumed samples: 1640960 | consumed tokens: 3360686080 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.679561E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.414 | TFLOPs: 31.50 | 7: iteration 6420/ 115203 | consumed samples: 1643520 | consumed tokens: 3365928960 | elapsed time per iteration (s): 0.42 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.634914E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.653 | TFLOPs: 32.09 | 7: iteration 6430/ 115203 | consumed samples: 1646080 | consumed tokens: 3371171840 | elapsed time per iteration (s): 0.43 | learning rate: 1.991E-04 | global batch size: 256 | lm loss: 2.656168E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.475 | TFLOPs: 31.14 | 7: iteration 6440/ 115203 | consumed samples: 1648640 | consumed tokens: 3376414720 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.661645E+00 | grad norm: 0.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.045 | TFLOPs: 31.80 | 7: iteration 6450/ 115203 | consumed samples: 1651200 | consumed tokens: 3381657600 | elapsed time per iteration (s): 0.44 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.654392E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.960 | TFLOPs: 30.69 | 7: iteration 6460/ 115203 | consumed samples: 1653760 | consumed tokens: 3386900480 | elapsed time per iteration (s): 0.43 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.617538E+00 | grad norm: 0.275 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.015 | TFLOPs: 31.38 | 7: iteration 6470/ 115203 | consumed samples: 1656320 | consumed tokens: 3392143360 | elapsed time per iteration (s): 0.45 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.635867E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.066 | TFLOPs: 29.91 | 7: iteration 6480/ 115203 | consumed samples: 1658880 | consumed tokens: 3397386240 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.665616E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.556 | TFLOPs: 31.62 | 7: iteration 6490/ 115203 | consumed samples: 1661440 | consumed tokens: 3402629120 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.645813E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.364 | TFLOPs: 31.76 | 7: iteration 6500/ 115203 | consumed samples: 1664000 | consumed tokens: 3407872000 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.624338E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.534 | TFLOPs: 32.30 | 7: iteration 6510/ 115203 | consumed samples: 1666560 | consumed tokens: 3413114880 | elapsed time per iteration (s): 0.43 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.656061E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.503 | TFLOPs: 31.09 | 7: iteration 6520/ 115203 | consumed samples: 1669120 | consumed tokens: 3418357760 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch 
size: 256 | lm loss: 2.667853E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.773 | TFLOPs: 31.73 | 7: iteration 6530/ 115203 | consumed samples: 1671680 | consumed tokens: 3423600640 | elapsed time per iteration (s): 0.43 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.629967E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.754 | TFLOPs: 31.26 | 7: iteration 6540/ 115203 | consumed samples: 1674240 | consumed tokens: 3428843520 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.637601E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.193 | TFLOPs: 31.70 | 7: iteration 6550/ 115203 | consumed samples: 1676800 | consumed tokens: 3434086400 | elapsed time per iteration (s): 0.43 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.667686E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.650 | TFLOPs: 31.41 | 7: iteration 6560/ 115203 | consumed samples: 1679360 | consumed tokens: 3439329280 | elapsed time per iteration (s): 0.43 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.651985E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.768 | TFLOPs: 31.52 | 7: iteration 6570/ 115203 | consumed samples: 1681920 | consumed tokens: 3444572160 | elapsed time per iteration (s): 0.43 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.652929E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.858 | TFLOPs: 30.90 | 7: iteration 6580/ 115203 | consumed samples: 1684480 | consumed tokens: 
3449815040 | elapsed time per iteration (s): 0.43 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.661971E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.158 | TFLOPs: 31.38 | 7: iteration 6590/ 115203 | consumed samples: 1687040 | consumed tokens: 3455057920 | elapsed time per iteration (s): 0.45 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.649723E+00 | grad norm: 0.296 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.259 | TFLOPs: 29.87 | 7: iteration 6600/ 115203 | consumed samples: 1689600 | consumed tokens: 3460300800 | elapsed time per iteration (s): 0.44 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.649987E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.678 | TFLOPs: 30.26 | 7: iteration 6610/ 115203 | consumed samples: 1692160 | consumed tokens: 3465543680 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.652278E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.089 | TFLOPs: 31.96 | 7: iteration 6620/ 115203 | consumed samples: 1694720 | consumed tokens: 3470786560 | elapsed time per iteration (s): 0.43 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.626271E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.298 | TFLOPs: 31.18 | 7: iteration 6630/ 115203 | consumed samples: 1697280 | consumed tokens: 3476029440 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.657384E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.669 | 
TFLOPs: 31.62 | 7: iteration 6640/ 115203 | consumed samples: 1699840 | consumed tokens: 3481272320 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.628877E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.163 | TFLOPs: 31.65 | 7: iteration 6650/ 115203 | consumed samples: 1702400 | consumed tokens: 3486515200 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.615055E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.104 | TFLOPs: 32.06 | 7: iteration 6660/ 115203 | consumed samples: 1704960 | consumed tokens: 3491758080 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.669502E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.288 | TFLOPs: 31.65 | 7: iteration 6670/ 115203 | consumed samples: 1707520 | consumed tokens: 3497000960 | elapsed time per iteration (s): 0.42 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.633158E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.140 | TFLOPs: 31.91 | 7: iteration 6680/ 115203 | consumed samples: 1710080 | consumed tokens: 3502243840 | elapsed time per iteration (s): 0.43 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.626515E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.558 | TFLOPs: 31.25 | 7: iteration 6690/ 115203 | consumed samples: 1712640 | consumed tokens: 3507486720 | elapsed time per iteration (s): 0.43 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.644501E+00 | grad norm: 0.277 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.608 | TFLOPs: 31.46 | 7: iteration 6700/ 115203 | consumed samples: 1715200 | consumed tokens: 3512729600 | elapsed time per iteration (s): 0.43 | learning rate: 1.990E-04 | global batch size: 256 | lm loss: 2.652216E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.513 | TFLOPs: 31.51 | 7: iteration 6710/ 115203 | consumed samples: 1717760 | consumed tokens: 3517972480 | elapsed time per iteration (s): 0.43 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.667801E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.622 | TFLOPs: 31.57 | 7: iteration 6720/ 115203 | consumed samples: 1720320 | consumed tokens: 3523215360 | elapsed time per iteration (s): 0.43 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.623304E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.313 | TFLOPs: 31.34 | 7: iteration 6730/ 115203 | consumed samples: 1722880 | consumed tokens: 3528458240 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.667866E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.591 | TFLOPs: 31.67 | 7: iteration 6740/ 115203 | consumed samples: 1725440 | consumed tokens: 3533701120 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.644641E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.851 | TFLOPs: 31.95 | 7: iteration 6750/ 115203 | consumed samples: 1728000 | consumed tokens: 3538944000 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch 
size: 256 | lm loss: 2.621532E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.327 | TFLOPs: 31.81 | 7: iteration 6760/ 115203 | consumed samples: 1730560 | consumed tokens: 3544186880 | elapsed time per iteration (s): 0.43 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.641610E+00 | grad norm: 0.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.802 | TFLOPs: 31.58 | 7: iteration 6770/ 115203 | consumed samples: 1733120 | consumed tokens: 3549429760 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.603174E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.422 | TFLOPs: 31.66 | 7: iteration 6780/ 115203 | consumed samples: 1735680 | consumed tokens: 3554672640 | elapsed time per iteration (s): 0.43 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.616517E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.950 | TFLOPs: 31.37 | 7: iteration 6790/ 115203 | consumed samples: 1738240 | consumed tokens: 3559915520 | elapsed time per iteration (s): 0.43 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.639474E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.843 | TFLOPs: 31.58 | 7: iteration 6800/ 115203 | consumed samples: 1740800 | consumed tokens: 3565158400 | elapsed time per iteration (s): 0.43 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.688919E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.160 | TFLOPs: 31.49 | 7: iteration 6810/ 115203 | consumed samples: 1743360 | consumed tokens: 
3570401280 | elapsed time per iteration (s): 0.44 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.631459E+00 | grad norm: 0.346 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.615 | TFLOPs: 30.36 | 7: iteration 6820/ 115203 | consumed samples: 1745920 | consumed tokens: 3575644160 | elapsed time per iteration (s): 0.43 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.634695E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.724 | TFLOPs: 31.52 | 7: iteration 6830/ 115203 | consumed samples: 1748480 | consumed tokens: 3580887040 | elapsed time per iteration (s): 0.44 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.650022E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.310 | TFLOPs: 30.45 | 7: iteration 6840/ 115203 | consumed samples: 1751040 | consumed tokens: 3586129920 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.639548E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.518 | TFLOPs: 32.03 | 7: iteration 6850/ 115203 | consumed samples: 1753600 | consumed tokens: 3591372800 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.635647E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.329 | TFLOPs: 32.02 | 7: iteration 6860/ 115203 | consumed samples: 1756160 | consumed tokens: 3596615680 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.617899E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.760 | 
TFLOPs: 31.68 | 7: iteration 6870/ 115203 | consumed samples: 1758720 | consumed tokens: 3601858560 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.668957E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.301 | TFLOPs: 31.81 | 7: iteration 6880/ 115203 | consumed samples: 1761280 | consumed tokens: 3607101440 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.654554E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 616.108 | TFLOPs: 32.33 | 7: iteration 6890/ 115203 | consumed samples: 1763840 | consumed tokens: 3612344320 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.589101E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.578 | TFLOPs: 31.83 | 7: iteration 6900/ 115203 | consumed samples: 1766400 | consumed tokens: 3617587200 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.597697E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.307 | TFLOPs: 31.76 | 7: iteration 6910/ 115203 | consumed samples: 1768960 | consumed tokens: 3622830080 | elapsed time per iteration (s): 0.43 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.634653E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.788 | TFLOPs: 31.47 | 7: iteration 6920/ 115203 | consumed samples: 1771520 | consumed tokens: 3628072960 | elapsed time per iteration (s): 0.43 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.630124E+00 | grad norm: 0.263 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.036 | TFLOPs: 31.22 | 7: iteration 6930/ 115203 | consumed samples: 1774080 | consumed tokens: 3633315840 | elapsed time per iteration (s): 0.44 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.624314E+00 | grad norm: 0.288 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.989 | TFLOPs: 30.85 | 7: iteration 6940/ 115203 | consumed samples: 1776640 | consumed tokens: 3638558720 | elapsed time per iteration (s): 0.43 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.645567E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.954 | TFLOPs: 31.37 | 7: iteration 6950/ 115203 | consumed samples: 1779200 | consumed tokens: 3643801600 | elapsed time per iteration (s): 0.42 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.620126E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.632 | TFLOPs: 31.72 | 7: iteration 6960/ 115203 | consumed samples: 1781760 | consumed tokens: 3649044480 | elapsed time per iteration (s): 0.44 | learning rate: 1.989E-04 | global batch size: 256 | lm loss: 2.663812E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.983 | TFLOPs: 30.38 | 7: iteration 6970/ 115203 | consumed samples: 1784320 | consumed tokens: 3654287360 | elapsed time per iteration (s): 0.43 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.618614E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.477 | TFLOPs: 31.35 | 7: iteration 6980/ 115203 | consumed samples: 1786880 | consumed tokens: 3659530240 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch 
size: 256 | lm loss: 2.599312E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.441 | TFLOPs: 31.98 |
7: iteration 6990/ 115203 | consumed samples: 1789440 | consumed tokens: 3664773120 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.656082E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.915 | TFLOPs: 31.63 |
7: iteration 7000/ 115203 | consumed samples: 1792000 | consumed tokens: 3670016000 | elapsed time per iteration (s): 0.43 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.642059E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.892 | TFLOPs: 31.37 |
7: ------------------------------------------------------------------------------------------
7: valid loss at iteration 7000 | lm loss value: 2.587309E+00 | lm loss PPL: 1.329394E+01 |
7: ------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 7000 to checkpoints_221m
0: [2022-11-28 13:45:44,152] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step7000 is begin to save!
0: [2022-11-28 13:45:44,156] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_01-model_00-model_states.pt...
0: [2022-11-28 13:45:44,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_01-model_00-model_states.pt.
0: [2022-11-28 13:45:44,253] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_03-model_00-model_states.pt...
0: [2022-11-28 13:45:44,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_03-model_00-model_states.pt.
0: [2022-11-28 13:45:44,274] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_04-model_00-model_states.pt... 0: [2022-11-28 13:45:44,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_04-model_00-model_states.pt. 0: [2022-11-28 13:45:44,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_05-model_00-model_states.pt... 0: [2022-11-28 13:45:44,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_05-model_00-model_states.pt. 0: [2022-11-28 13:45:44,321] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_06-model_00-model_states.pt... 0: [2022-11-28 13:45:44,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_06-model_00-model_states.pt. 0: [2022-11-28 13:45:44,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_07-model_00-model_states.pt... 0: [2022-11-28 13:45:44,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_07-model_00-model_states.pt. 0: [2022-11-28 13:45:44,369] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_08-model_00-model_states.pt... 0: [2022-11-28 13:45:44,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_08-model_00-model_states.pt. 0: [2022-11-28 13:45:44,393] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_09-model_00-model_states.pt... 0: [2022-11-28 13:45:44,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_09-model_00-model_states.pt. 
0: [2022-11-28 13:45:44,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_10-model_00-model_states.pt... 0: [2022-11-28 13:45:44,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_10-model_00-model_states.pt. 0: [2022-11-28 13:45:44,439] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_11-model_00-model_states.pt... 0: [2022-11-28 13:45:44,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_11-model_00-model_states.pt. 0: [2022-11-28 13:45:44,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_12-model_00-model_states.pt... 0: [2022-11-28 13:45:44,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_12-model_00-model_states.pt. 0: [2022-11-28 13:45:44,487] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_13-model_00-model_states.pt... 0: [2022-11-28 13:45:44,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_13-model_00-model_states.pt. 0: [2022-11-28 13:45:44,510] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_14-model_00-model_states.pt... 0: [2022-11-28 13:45:44,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_14-model_00-model_states.pt. 0: [2022-11-28 13:45:44,533] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_15-model_00-model_states.pt... 0: [2022-11-28 13:45:44,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_15-model_00-model_states.pt. 
0: [2022-11-28 13:45:44,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_16-model_00-model_states.pt... 0: [2022-11-28 13:45:44,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_16-model_00-model_states.pt. 0: [2022-11-28 13:45:44,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_17-model_00-model_states.pt... 0: [2022-11-28 13:45:44,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_17-model_00-model_states.pt. 0: [2022-11-28 13:45:44,602] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_18-model_00-model_states.pt... 0: [2022-11-28 13:45:44,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_18-model_00-model_states.pt. 0: [2022-11-28 13:45:44,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_19-model_00-model_states.pt... 0: [2022-11-28 13:45:44,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_19-model_00-model_states.pt. 0: [2022-11-28 13:45:44,648] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_20-model_00-model_states.pt... 0: [2022-11-28 13:45:44,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_20-model_00-model_states.pt. 0: [2022-11-28 13:45:44,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/layer_22-model_00-model_states.pt... 0: [2022-11-28 13:45:44,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/layer_22-model_00-model_states.pt. 
0: [2022-11-28 13:45:44,677] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step7000/mp_rank_00_model_states.pt 0: [2022-11-28 13:45:44,677] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/mp_rank_00_model_states.pt... 0: [2022-11-28 13:45:44,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/mp_rank_00_model_states.pt. 0: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
0: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
6: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
2: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:45:44,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step7000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
2: [2022-11-28 13:45:44,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:45:44,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 13:45:44,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2022-11-28 13:45:44,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:45:44,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 13:45:44,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2022-11-28 13:45:44,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:45:44,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 13:45:44,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 7: [2022-11-28 13:45:44,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:45:44,747] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 13:45:44,747] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
2: [2022-11-28 13:45:44,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:45:44,747] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 13:45:44,747] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:45:44,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 1: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:45:44,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 1: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:45:44,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
1: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 7: [2022-11-28 13:45:44,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:45:44,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:45:44,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 13:45:44,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2022-11-28 13:45:44,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:45:44,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 13:45:44,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2022-11-28 13:45:44,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:45:44,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 13:45:44,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2022-11-28 13:45:44,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:45:44,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 13:45:44,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2022-11-28 13:45:44,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:45:44,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 13:45:44,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2022-11-28 13:45:44,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:45:44,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 13:45:44,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2022-11-28 13:45:44,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:45:44,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 13:45:44,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2022-11-28 13:45:44,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:45:44,751] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 13:45:44,751] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 7: [2022-11-28 13:45:44,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 13:45:44,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:45:44,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 13:45:44,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 13:45:44,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 7: [2022-11-28 13:45:44,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2022-11-28 13:45:44,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:45:44,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 13:45:44,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2022-11-28 13:45:44,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:45:44,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:45:44,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 13:45:44,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 13:45:44,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 13:45:44,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 13:45:44,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2022-11-28 13:45:44,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2022-11-28 13:45:44,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2022-11-28 13:45:44,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:45:44,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 13:45:44,753] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2022-11-28 13:45:44,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:45:44,753] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
6: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:45:44,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:45:44,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:45:44,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 13:45:44,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 13:45:44,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 6: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:45:44,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 13:45:44,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2022-11-28 13:45:44,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:45:44,751] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 13:45:44,751] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
2: [2022-11-28 13:45:44,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:45:44,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 13:45:44,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2022-11-28 13:45:44,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:45:44,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 13:45:44,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 7: [2022-11-28 13:45:44,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:45:44,758] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 13:45:44,758] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 2: [2022-11-28 13:45:44,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:45:44,759] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 13:45:44,759] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
1: [2022-11-28 13:45:44,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:45:44,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:45:44,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:45:44,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:45:44,759] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 13:45:44,759] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 13:45:44,759] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2022-11-28 13:45:44,759] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 1: [2022-11-28 13:45:44,759] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 1: [2022-11-28 13:45:44,759] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 1: [2022-11-28 13:45:44,759] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 7: [2022-11-28 13:45:44,759] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
7: [2022-11-28 13:45:44,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:45:44,760] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 13:45:44,760] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 7: [2022-11-28 13:45:44,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:45:44,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 13:45:44,761] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2022-11-28 13:45:44,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:45:44,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:45:44,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:45:44,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 13:45:44,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 13:45:44,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 13:45:44,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 13:45:44,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 13:45:44,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2022-11-28 13:45:44,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2022-11-28 13:45:44,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 4: [2022-11-28 13:45:44,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 1: [2022-11-28 13:45:44,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:45:44,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 13:45:44,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 1: [2022-11-28 13:45:44,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 13:45:44,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 13:45:44,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 1: [2022-11-28 13:45:44,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:45:44,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 13:45:44,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2022-11-28 13:45:44,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:45:44,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 13:45:44,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2022-11-28 13:45:44,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:45:44,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2022-11-28 13:45:44,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 13:45:44,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 13:45:44,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 3: [2022-11-28 13:45:44,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:45:44,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 13:45:44,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 13:45:44,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 13:45:44,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2022-11-28 13:45:44,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
0: [2022-11-28 13:45:44,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 13:45:44,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2022-11-28 13:45:44,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2022-11-28 13:45:44,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:45:44,781] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 13:45:44,781] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 5: [2022-11-28 13:45:44,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:45:44,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 13:45:44,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 0: [2022-11-28 13:45:44,808] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step7000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 13:45:44,808] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step7000 is ready now! 
0: successfully saved checkpoint at iteration 7000 to checkpoints_221m 7: time (ms) | save-checkpoint: 659.48 7: iteration 7010/ 115203 | consumed samples: 1794560 | consumed tokens: 3675258880 | elapsed time per iteration (s): 0.52 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.615719E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 496.607 | TFLOPs: 26.06 | 7: iteration 7020/ 115203 | consumed samples: 1797120 | consumed tokens: 3680501760 | elapsed time per iteration (s): 0.43 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.667915E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.134 | TFLOPs: 31.49 | 7: iteration 7030/ 115203 | consumed samples: 1799680 | consumed tokens: 3685744640 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.605930E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.258 | TFLOPs: 32.12 | 7: iteration 7040/ 115203 | consumed samples: 1802240 | consumed tokens: 3690987520 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.645922E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.833 | TFLOPs: 31.89 | 7: iteration 7050/ 115203 | consumed samples: 1804800 | consumed tokens: 3696230400 | elapsed time per iteration (s): 0.43 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.645990E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.796 | TFLOPs: 31.21 | 7: iteration 7060/ 115203 | consumed samples: 1807360 | consumed tokens: 3701473280 | elapsed time per iteration (s): 0.43 | learning rate: 1.988E-04 | 
global batch size: 256 | lm loss: 2.616170E+00 | grad norm: 0.307 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.264 | TFLOPs: 31.49 | 7: iteration 7070/ 115203 | consumed samples: 1809920 | consumed tokens: 3706716160 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.633380E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.464 | TFLOPs: 32.03 | 7: iteration 7080/ 115203 | consumed samples: 1812480 | consumed tokens: 3711959040 | elapsed time per iteration (s): 0.43 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.614907E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.573 | TFLOPs: 31.30 | 7: iteration 7090/ 115203 | consumed samples: 1815040 | consumed tokens: 3717201920 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.638923E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.607 | TFLOPs: 31.67 | 7: iteration 7100/ 115203 | consumed samples: 1817600 | consumed tokens: 3722444800 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.600189E+00 | grad norm: 0.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.948 | TFLOPs: 31.85 | 7: iteration 7110/ 115203 | consumed samples: 1820160 | consumed tokens: 3727687680 | elapsed time per iteration (s): 0.43 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.659567E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.071 | TFLOPs: 31.38 | 7: iteration 7120/ 115203 | consumed samples: 1822720 | consumed 
tokens: 3732930560 | elapsed time per iteration (s): 0.44 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.623235E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.301 | TFLOPs: 30.34 | 7: iteration 7130/ 115203 | consumed samples: 1825280 | consumed tokens: 3738173440 | elapsed time per iteration (s): 0.43 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.643924E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.937 | TFLOPs: 31.53 | 7: iteration 7140/ 115203 | consumed samples: 1827840 | consumed tokens: 3743416320 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.621994E+00 | grad norm: 0.283 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.850 | TFLOPs: 31.74 | 7: iteration 7150/ 115203 | consumed samples: 1830400 | consumed tokens: 3748659200 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.628886E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.596 | TFLOPs: 31.83 | 7: iteration 7160/ 115203 | consumed samples: 1832960 | consumed tokens: 3753902080 | elapsed time per iteration (s): 0.43 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.653344E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.085 | TFLOPs: 30.96 | 7: iteration 7170/ 115203 | consumed samples: 1835520 | consumed tokens: 3759144960 | elapsed time per iteration (s): 0.43 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.654769E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.299 
| TFLOPs: 31.44 | 7: iteration 7180/ 115203 | consumed samples: 1838080 | consumed tokens: 3764387840 | elapsed time per iteration (s): 0.44 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.636500E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.614 | TFLOPs: 30.57 | 7: iteration 7190/ 115203 | consumed samples: 1840640 | consumed tokens: 3769630720 | elapsed time per iteration (s): 0.43 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.608593E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.278 | TFLOPs: 31.44 | 7: iteration 7200/ 115203 | consumed samples: 1843200 | consumed tokens: 3774873600 | elapsed time per iteration (s): 0.42 | learning rate: 1.988E-04 | global batch size: 256 | lm loss: 2.615466E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.962 | TFLOPs: 31.79 | 7: iteration 7210/ 115203 | consumed samples: 1845760 | consumed tokens: 3780116480 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.630380E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.832 | TFLOPs: 32.00 | 7: iteration 7220/ 115203 | consumed samples: 1848320 | consumed tokens: 3785359360 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.624465E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.848 | TFLOPs: 31.68 | 7: iteration 7230/ 115203 | consumed samples: 1850880 | consumed tokens: 3790602240 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.634650E+00 | grad norm: 0.258 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.457 | TFLOPs: 31.51 | 7: iteration 7240/ 115203 | consumed samples: 1853440 | consumed tokens: 3795845120 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.643941E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.082 | TFLOPs: 31.38 | 7: iteration 7250/ 115203 | consumed samples: 1856000 | consumed tokens: 3801088000 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.658106E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.200 | TFLOPs: 30.97 | 7: iteration 7260/ 115203 | consumed samples: 1858560 | consumed tokens: 3806330880 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.602590E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.449 | TFLOPs: 32.24 | 7: iteration 7270/ 115203 | consumed samples: 1861120 | consumed tokens: 3811573760 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.642842E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.475 | TFLOPs: 31.61 | 7: iteration 7280/ 115203 | consumed samples: 1863680 | consumed tokens: 3816816640 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.647972E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.534 | TFLOPs: 31.93 | 7: iteration 7290/ 115203 | consumed samples: 1866240 | consumed tokens: 3822059520 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch 
size: 256 | lm loss: 2.634047E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.354 | TFLOPs: 31.45 | 7: iteration 7300/ 115203 | consumed samples: 1868800 | consumed tokens: 3827302400 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.642892E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.675 | TFLOPs: 31.41 | 7: iteration 7310/ 115203 | consumed samples: 1871360 | consumed tokens: 3832545280 | elapsed time per iteration (s): 0.44 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.629663E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.156 | TFLOPs: 30.65 | 7: iteration 7320/ 115203 | consumed samples: 1873920 | consumed tokens: 3837788160 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.616665E+00 | grad norm: 0.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.781 | TFLOPs: 30.89 | 7: iteration 7330/ 115203 | consumed samples: 1876480 | consumed tokens: 3843031040 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.631951E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.235 | TFLOPs: 31.81 | 7: iteration 7340/ 115203 | consumed samples: 1879040 | consumed tokens: 3848273920 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.599615E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.222 | TFLOPs: 31.39 | 7: iteration 7350/ 115203 | consumed samples: 1881600 | consumed tokens: 
3853516800 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.619855E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.039 | TFLOPs: 31.48 | 7: iteration 7360/ 115203 | consumed samples: 1884160 | consumed tokens: 3858759680 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.638825E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.137 | TFLOPs: 31.86 | 7: iteration 7370/ 115203 | consumed samples: 1886720 | consumed tokens: 3864002560 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.620299E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.209 | TFLOPs: 31.91 | 7: iteration 7380/ 115203 | consumed samples: 1889280 | consumed tokens: 3869245440 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.623810E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.132 | TFLOPs: 30.96 | 7: iteration 7390/ 115203 | consumed samples: 1891840 | consumed tokens: 3874488320 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.592787E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.413 | TFLOPs: 31.35 | 7: iteration 7400/ 115203 | consumed samples: 1894400 | consumed tokens: 3879731200 | elapsed time per iteration (s): 0.42 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.631419E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.564 | 
TFLOPs: 31.62 | 7: iteration 7410/ 115203 | consumed samples: 1896960 | consumed tokens: 3884974080 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.639350E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.571 | TFLOPs: 31.51 | 7: iteration 7420/ 115203 | consumed samples: 1899520 | consumed tokens: 3890216960 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.589646E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.496 | TFLOPs: 31.40 | 7: iteration 7430/ 115203 | consumed samples: 1902080 | consumed tokens: 3895459840 | elapsed time per iteration (s): 0.43 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.619889E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.072 | TFLOPs: 31.54 | 7: iteration 7440/ 115203 | consumed samples: 1904640 | consumed tokens: 3900702720 | elapsed time per iteration (s): 0.44 | learning rate: 1.987E-04 | global batch size: 256 | lm loss: 2.570894E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.314 | TFLOPs: 30.24 | 7: iteration 7450/ 115203 | consumed samples: 1907200 | consumed tokens: 3905945600 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.618592E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.564 | TFLOPs: 31.77 | 7: iteration 7460/ 115203 | consumed samples: 1909760 | consumed tokens: 3911188480 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.631455E+00 | grad norm: 0.265 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.019 | TFLOPs: 31.69 | 7: iteration 7470/ 115203 | consumed samples: 1912320 | consumed tokens: 3916431360 | elapsed time per iteration (s): 0.44 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.592615E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.777 | TFLOPs: 30.79 | 7: iteration 7480/ 115203 | consumed samples: 1914880 | consumed tokens: 3921674240 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.593661E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.514 | TFLOPs: 31.72 | 7: iteration 7490/ 115203 | consumed samples: 1917440 | consumed tokens: 3926917120 | elapsed time per iteration (s): 0.44 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.624624E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.060 | TFLOPs: 30.80 | 7: iteration 7500/ 115203 | consumed samples: 1920000 | consumed tokens: 3932160000 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.586448E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.483 | TFLOPs: 31.87 | 7: iteration 7510/ 115203 | consumed samples: 1922560 | consumed tokens: 3937402880 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.594445E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.689 | TFLOPs: 31.78 | 7: iteration 7520/ 115203 | consumed samples: 1925120 | consumed tokens: 3942645760 | elapsed time per iteration (s): 0.44 | learning rate: 1.986E-04 | global batch 
size: 256 | lm loss: 2.641304E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.578 | TFLOPs: 30.78 | 7: iteration 7530/ 115203 | consumed samples: 1927680 | consumed tokens: 3947888640 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.642187E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.186 | TFLOPs: 31.65 | 7: iteration 7540/ 115203 | consumed samples: 1930240 | consumed tokens: 3953131520 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.625097E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.328 | TFLOPs: 31.71 | 7: iteration 7550/ 115203 | consumed samples: 1932800 | consumed tokens: 3958374400 | elapsed time per iteration (s): 0.43 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.599188E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.230 | TFLOPs: 31.39 | 7: iteration 7560/ 115203 | consumed samples: 1935360 | consumed tokens: 3963617280 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.601456E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.064 | TFLOPs: 31.64 | 7: iteration 7570/ 115203 | consumed samples: 1937920 | consumed tokens: 3968860160 | elapsed time per iteration (s): 0.43 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.613493E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.431 | TFLOPs: 31.29 | 7: iteration 7580/ 115203 | consumed samples: 1940480 | consumed tokens: 
3974103040 | elapsed time per iteration (s): 0.43 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.711142E+00 | grad norm: 4.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.421 | TFLOPs: 31.35 | 7: iteration 7590/ 115203 | consumed samples: 1943040 | consumed tokens: 3979345920 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.774139E+00 | grad norm: 0.684 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.083 | TFLOPs: 31.64 | 7: iteration 7600/ 115203 | consumed samples: 1945600 | consumed tokens: 3984588800 | elapsed time per iteration (s): 0.43 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.675845E+00 | grad norm: 0.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.209 | TFLOPs: 31.07 | 7: iteration 7610/ 115203 | consumed samples: 1948160 | consumed tokens: 3989831680 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.693034E+00 | grad norm: 0.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.373 | TFLOPs: 31.92 | 7: iteration 7620/ 115203 | consumed samples: 1950720 | consumed tokens: 3995074560 | elapsed time per iteration (s): 0.43 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.597624E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.077 | TFLOPs: 31.43 | 7: iteration 7630/ 115203 | consumed samples: 1953280 | consumed tokens: 4000317440 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.612417E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.581 | 
TFLOPs: 31.93 | 7: iteration 7640/ 115203 | consumed samples: 1955840 | consumed tokens: 4005560320 | elapsed time per iteration (s): 0.42 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.638958E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.463 | TFLOPs: 31.77 | 7: iteration 7650/ 115203 | consumed samples: 1958400 | consumed tokens: 4010803200 | elapsed time per iteration (s): 0.44 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.630412E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.906 | TFLOPs: 30.79 | 7: iteration 7660/ 115203 | consumed samples: 1960960 | consumed tokens: 4016046080 | elapsed time per iteration (s): 0.44 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.580757E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.388 | TFLOPs: 30.71 | 7: iteration 7670/ 115203 | consumed samples: 1963520 | consumed tokens: 4021288960 | elapsed time per iteration (s): 0.43 | learning rate: 1.986E-04 | global batch size: 256 | lm loss: 2.615721E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.487 | TFLOPs: 31.51 | 7: iteration 7680/ 115203 | consumed samples: 1966080 | consumed tokens: 4026531840 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.597022E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.377 | TFLOPs: 31.71 | 7: iteration 7690/ 115203 | consumed samples: 1968640 | consumed tokens: 4031774720 | elapsed time per iteration (s): 0.43 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.630846E+00 | grad norm: 0.257 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.911 | TFLOPs: 30.90 | 7: iteration 7700/ 115203 | consumed samples: 1971200 | consumed tokens: 4037017600 | elapsed time per iteration (s): 0.44 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.608521E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.817 | TFLOPs: 30.74 | 7: iteration 7710/ 115203 | consumed samples: 1973760 | consumed tokens: 4042260480 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.605323E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.866 | TFLOPs: 31.63 | 7: iteration 7720/ 115203 | consumed samples: 1976320 | consumed tokens: 4047503360 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.611587E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.672 | TFLOPs: 31.99 | 7: iteration 7730/ 115203 | consumed samples: 1978880 | consumed tokens: 4052746240 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.637457E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.842 | TFLOPs: 31.94 | 7: iteration 7740/ 115203 | consumed samples: 1981440 | consumed tokens: 4057989120 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.615588E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.243 | TFLOPs: 31.65 | 7: iteration 7750/ 115203 | consumed samples: 1984000 | consumed tokens: 4063232000 | elapsed time per iteration (s): 0.43 | learning rate: 1.985E-04 | global batch 
size: 256 | lm loss: 2.616535E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.306 | TFLOPs: 31.34 | 7: iteration 7760/ 115203 | consumed samples: 1986560 | consumed tokens: 4068474880 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.621156E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.013 | TFLOPs: 32.01 | 7: iteration 7770/ 115203 | consumed samples: 1989120 | consumed tokens: 4073717760 | elapsed time per iteration (s): 0.43 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.607554E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.820 | TFLOPs: 31.10 | 7: iteration 7780/ 115203 | consumed samples: 1991680 | consumed tokens: 4078960640 | elapsed time per iteration (s): 0.43 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.651298E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.027 | TFLOPs: 30.96 | 7: iteration 7790/ 115203 | consumed samples: 1994240 | consumed tokens: 4084203520 | elapsed time per iteration (s): 0.44 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.602146E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.405 | TFLOPs: 30.51 | 7: iteration 7800/ 115203 | consumed samples: 1996800 | consumed tokens: 4089446400 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.600448E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.385 | TFLOPs: 31.87 | 7: iteration 7810/ 115203 | consumed samples: 1999360 | consumed tokens: 
4094689280 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.591483E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.200 | TFLOPs: 31.70 | 7: iteration 7820/ 115203 | consumed samples: 2001920 | consumed tokens: 4099932160 | elapsed time per iteration (s): 0.43 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.643465E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.142 | TFLOPs: 31.49 | 7: iteration 7830/ 115203 | consumed samples: 2004480 | consumed tokens: 4105175040 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.593859E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.909 | TFLOPs: 31.79 | 7: iteration 7840/ 115203 | consumed samples: 2007040 | consumed tokens: 4110417920 | elapsed time per iteration (s): 0.45 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.590660E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.343 | TFLOPs: 29.56 | 7: iteration 7850/ 115203 | consumed samples: 2009600 | consumed tokens: 4115660800 | elapsed time per iteration (s): 0.43 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.632194E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.131 | TFLOPs: 31.44 | 7: iteration 7860/ 115203 | consumed samples: 2012160 | consumed tokens: 4120903680 | elapsed time per iteration (s): 0.43 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.593188E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.570 | 
TFLOPs: 31.51 | 7: iteration 7870/ 115203 | consumed samples: 2014720 | consumed tokens: 4126146560 | elapsed time per iteration (s): 0.43 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.618248E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.223 | TFLOPs: 31.07 | 7: iteration 7880/ 115203 | consumed samples: 2017280 | consumed tokens: 4131389440 | elapsed time per iteration (s): 0.44 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.597255E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.903 | TFLOPs: 30.64 | 7: iteration 7890/ 115203 | consumed samples: 2019840 | consumed tokens: 4136632320 | elapsed time per iteration (s): 0.42 | learning rate: 1.985E-04 | global batch size: 256 | lm loss: 2.600867E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.472 | TFLOPs: 31.61 | 7: iteration 7900/ 115203 | consumed samples: 2022400 | consumed tokens: 4141875200 | elapsed time per iteration (s): 0.43 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.576686E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.523 | TFLOPs: 30.93 | 7: iteration 7910/ 115203 | consumed samples: 2024960 | consumed tokens: 4147118080 | elapsed time per iteration (s): 0.43 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.625420E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.355 | TFLOPs: 31.18 | 7: iteration 7920/ 115203 | consumed samples: 2027520 | consumed tokens: 4152360960 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.586659E+00 | grad norm: 0.243 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.983 | TFLOPs: 32.11 | 7: iteration 7930/ 115203 | consumed samples: 2030080 | consumed tokens: 4157603840 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.632969E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.077 | TFLOPs: 31.69 | 7: iteration 7940/ 115203 | consumed samples: 2032640 | consumed tokens: 4162846720 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.570092E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.870 | TFLOPs: 31.68 | 7: iteration 7950/ 115203 | consumed samples: 2035200 | consumed tokens: 4168089600 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.574799E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.029 | TFLOPs: 32.06 | 7: iteration 7960/ 115203 | consumed samples: 2037760 | consumed tokens: 4173332480 | elapsed time per iteration (s): 0.43 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.648321E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.630 | TFLOPs: 31.51 | 7: iteration 7970/ 115203 | consumed samples: 2040320 | consumed tokens: 4178575360 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.610467E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.661 | TFLOPs: 31.73 | 7: iteration 7980/ 115203 | consumed samples: 2042880 | consumed tokens: 4183818240 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch 
size: 256 | lm loss: 2.600978E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.199 | TFLOPs: 31.81 | 7: iteration 7990/ 115203 | consumed samples: 2045440 | consumed tokens: 4189061120 | elapsed time per iteration (s): 0.43 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.638500E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.064 | TFLOPs: 31.43 | 0: [2022-11-28 13:52:52,164] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=0, lr=[0.00019840359799331808, 0.00019840359799331808, 0.00019840359799331808], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 8000/ 115203 | consumed samples: 2048000 | consumed tokens: 4194304000 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.600596E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.902 | TFLOPs: 31.69 | 0: steps: 8000 loss: 2.5959 iter time (s): 0.425 samples/sec: 601.814 7: ------------------------------------------------------------------------------------------ 7: valid loss at iteration 8000 | lm loss value: 2.565096E+00 | lm loss PPL: 1.300191E+01 | 7: ------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 8000 to checkpoints_221m 0: [2022-11-28 13:52:52,323] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step8000 is begin to save! 0: [2022-11-28 13:52:52,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_01-model_00-model_states.pt... 0: [2022-11-28 13:52:52,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 13:52:52,425] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_03-model_00-model_states.pt... 0: [2022-11-28 13:52:52,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_03-model_00-model_states.pt. 0: [2022-11-28 13:52:52,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_04-model_00-model_states.pt... 0: [2022-11-28 13:52:52,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_04-model_00-model_states.pt. 0: [2022-11-28 13:52:52,471] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_05-model_00-model_states.pt... 0: [2022-11-28 13:52:52,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_05-model_00-model_states.pt. 0: [2022-11-28 13:52:52,494] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_06-model_00-model_states.pt... 0: [2022-11-28 13:52:52,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_06-model_00-model_states.pt. 0: [2022-11-28 13:52:52,518] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_07-model_00-model_states.pt... 0: [2022-11-28 13:52:52,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_07-model_00-model_states.pt. 0: [2022-11-28 13:52:52,541] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_08-model_00-model_states.pt... 0: [2022-11-28 13:52:52,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 13:52:52,562] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_09-model_00-model_states.pt... 0: [2022-11-28 13:52:52,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_09-model_00-model_states.pt. 0: [2022-11-28 13:52:52,585] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_10-model_00-model_states.pt... 0: [2022-11-28 13:52:52,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_10-model_00-model_states.pt. 0: [2022-11-28 13:52:52,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_11-model_00-model_states.pt... 0: [2022-11-28 13:52:52,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_11-model_00-model_states.pt. 0: [2022-11-28 13:52:52,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_12-model_00-model_states.pt... 0: [2022-11-28 13:52:52,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_12-model_00-model_states.pt. 0: [2022-11-28 13:52:52,657] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_13-model_00-model_states.pt... 0: [2022-11-28 13:52:52,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_13-model_00-model_states.pt. 0: [2022-11-28 13:52:52,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_14-model_00-model_states.pt... 0: [2022-11-28 13:52:52,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 13:52:52,704] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_15-model_00-model_states.pt... 0: [2022-11-28 13:52:52,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_15-model_00-model_states.pt. 0: [2022-11-28 13:52:52,727] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_16-model_00-model_states.pt... 0: [2022-11-28 13:52:52,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_16-model_00-model_states.pt. 0: [2022-11-28 13:52:52,751] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_17-model_00-model_states.pt... 0: [2022-11-28 13:52:52,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_17-model_00-model_states.pt. 0: [2022-11-28 13:52:52,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_18-model_00-model_states.pt... 0: [2022-11-28 13:52:52,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_18-model_00-model_states.pt. 0: [2022-11-28 13:52:52,799] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_19-model_00-model_states.pt... 0: [2022-11-28 13:52:52,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_19-model_00-model_states.pt. 0: [2022-11-28 13:52:52,823] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_20-model_00-model_states.pt... 0: [2022-11-28 13:52:52,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 13:52:52,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/layer_22-model_00-model_states.pt... 0: [2022-11-28 13:52:52,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/layer_22-model_00-model_states.pt. 0: [2022-11-28 13:52:52,851] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step8000/mp_rank_00_model_states.pt 0: [2022-11-28 13:52:52,851] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/mp_rank_00_model_states.pt... 0: [2022-11-28 13:52:52,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/mp_rank_00_model_states.pt. 0: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:52:52,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:52:52,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
4: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:52:52,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:52:52,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:52:52,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 13:52:52,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:52:52,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 13:52:52,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 4: [2022-11-28 13:52:52,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:52:52,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
2: [2022-11-28 13:52:52,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 13:52:52,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step8000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 7: [2022-11-28 13:52:52,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:52:52,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2022-11-28 13:52:52,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:52:52,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2022-11-28 13:52:52,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:52:52,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 5: [2022-11-28 13:52:52,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 2: [2022-11-28 13:52:52,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2022-11-28 13:52:52,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2022-11-28 13:52:52,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-28 13:52:52,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 13:52:52,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2022-11-28 13:52:52,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:52:52,921] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 13:52:52,921] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2022-11-28 13:52:52,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:52:52,921] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 13:52:52,921] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2022-11-28 13:52:52,922] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:52:52,922] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 13:52:52,922] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2022-11-28 13:52:52,922] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2022-11-28 13:52:52,922] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:52:52,922] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 13:52:52,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2022-11-28 13:52:52,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:52:52,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 13:52:52,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2022-11-28 13:52:52,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:52:52,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:52:52,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 13:52:52,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 13:52:52,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2022-11-28 13:52:52,923] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 
7: [2022-11-28 13:52:52,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:52:52,924] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 13:52:52,924] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2022-11-28 13:52:52,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:52:52,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:52:52,925] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 13:52:52,925] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 13:52:52,925] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2022-11-28 13:52:52,926] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2022-11-28 13:52:52,922] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:52:52,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:52:52,922] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 13:52:52,924] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 13:52:52,922] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2022-11-28 13:52:52,924] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2022-11-28 13:52:52,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:52:52,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:52:52,924] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 13:52:52,924] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 13:52:52,924] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2022-11-28 13:52:52,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:52:52,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:52:52,925] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 
5: [2022-11-28 13:52:52,926] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 13:52:52,925] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 13:52:52,926] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2022-11-28 13:52:52,925] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2022-11-28 13:52:52,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:52:52,926] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 13:52:52,926] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2022-11-28 13:52:52,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:52:52,926] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 5: [2022-11-28 13:52:52,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:52:52,926] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 13:52:52,926] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 
2: [2022-11-28 13:52:52,926] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2022-11-28 13:52:52,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:52:52,927] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 13:52:52,927] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2022-11-28 13:52:52,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:52:52,927] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 13:52:52,927] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2022-11-28 13:52:52,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:52:52,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 13:52:52,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2022-11-28 13:52:52,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-28 13:52:52,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 13:52:52,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2022-11-28 13:52:52,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:52:52,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2022-11-28 13:52:52,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:52:52,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2022-11-28 13:52:52,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 13:52:52,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2022-11-28 13:52:52,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:52:52,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 13:52:52,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2022-11-28 13:52:52,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 13:52:52,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:52:52,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 13:52:52,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 13:52:52,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2022-11-28 13:52:52,929] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2022-11-28 13:52:52,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:52:52,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:52:52,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 4: [2022-11-28 13:52:52,929] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 13:52:52,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2022-11-28 13:52:52,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2022-11-28 13:52:52,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:52:52,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 13:52:52,930] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2022-11-28 13:52:52,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:52:52,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 13:52:52,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 13:52:52,931] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 13:52:52,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 1: [2022-11-28 13:52:52,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2022-11-28 13:52:52,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:52:52,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 13:52:52,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2022-11-28 13:52:52,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 13:52:52,933] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:52:52,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 13:52:52,933] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 13:52:52,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2022-11-28 13:52:52,933] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2022-11-28 13:52:52,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:52:52,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 13:52:52,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 13:52:52,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 13:52:52,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 2: [2022-11-28 13:52:52,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2022-11-28 13:52:52,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-28 13:52:52,930] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 13:52:52,931] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2022-11-28 13:52:52,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:52:52,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:52:52,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 7: [2022-11-28 13:52:52,937] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 13:52:52,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2022-11-28 13:52:52,937] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2022-11-28 13:52:52,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 13:52:52,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 13:52:52,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2022-11-28 13:52:52,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-28 13:52:52,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 13:52:52,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2022-11-28 13:52:52,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:52:52,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:52:52,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 13:52:52,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 7: [2022-11-28 13:52:52,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 13:52:52,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2022-11-28 13:52:52,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:52:52,938] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 13:52:52,938] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2022-11-28 13:52:52,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 13:52:52,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 13:52:52,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 4: [2022-11-28 13:52:52,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 13:52:52,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 13:52:52,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2022-11-28 13:52:52,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:52:52,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 13:52:52,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 13:52:52,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 13:52:52,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 5: [2022-11-28 13:52:52,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2022-11-28 13:52:52,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 13:52:52,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 13:52:52,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 13:52:52,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 13:52:52,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2022-11-28 13:52:52,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2022-11-28 13:52:52,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:52:52,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 13:52:52,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2022-11-28 13:52:52,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 13:52:52,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 13:52:52,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 6: [2022-11-28 13:52:52,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 13:52:52,945] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 13:52:52,945] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2022-11-28 13:52:52,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:52:52,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 13:52:52,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2022-11-28 13:52:52,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:52:52,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 13:52:52,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2022-11-28 13:52:52,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 13:52:52,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 13:52:52,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 3: [2022-11-28 13:52:52,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2022-11-28 13:52:52,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 13:52:52,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: [2022-11-28 13:52:52,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step8000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 13:52:52,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step8000 is ready now! 0: successfully saved checkpoint at iteration 8000 to checkpoints_221m 7: time (ms) | save-checkpoint: 662.75 7: iteration 8010/ 115203 | consumed samples: 2050560 | consumed tokens: 4199546880 | elapsed time per iteration (s): 0.52 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.627829E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 492.930 | TFLOPs: 25.86 | 7: iteration 8020/ 115203 | consumed samples: 2053120 | consumed tokens: 4204789760 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.593133E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.209 | TFLOPs: 31.75 | 7: iteration 8030/ 115203 | consumed samples: 2055680 | consumed tokens: 4210032640 | elapsed time per iteration (s): 0.43 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.584263E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.463 | TFLOPs: 31.35 | 7: iteration 8040/ 115203 | consumed samples: 2058240 | consumed tokens: 4215275520 | elapsed time per iteration (s): 0.43 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.591650E+00 | grad norm: 0.242 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.848 | TFLOPs: 31.26 | 7: iteration 8050/ 115203 | consumed samples: 2060800 | consumed tokens: 4220518400 | elapsed time per iteration (s): 0.43 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.580688E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.833 | TFLOPs: 31.16 | 7: iteration 8060/ 115203 | consumed samples: 2063360 | consumed tokens: 4225761280 | elapsed time per iteration (s): 0.43 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.628475E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.227 | TFLOPs: 31.23 | 7: iteration 8070/ 115203 | consumed samples: 2065920 | consumed tokens: 4231004160 | elapsed time per iteration (s): 0.43 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.585201E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.359 | TFLOPs: 31.45 | 7: iteration 8080/ 115203 | consumed samples: 2068480 | consumed tokens: 4236247040 | elapsed time per iteration (s): 0.43 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.585348E+00 | grad norm: 0.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.926 | TFLOPs: 31.21 | 7: iteration 8090/ 115203 | consumed samples: 2071040 | consumed tokens: 4241489920 | elapsed time per iteration (s): 0.42 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.581990E+00 | grad norm: 0.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.138 | TFLOPs: 31.80 | 7: iteration 8100/ 115203 | consumed samples: 2073600 | consumed tokens: 4246732800 | elapsed time per iteration (s): 0.43 | learning rate: 1.984E-04 | 
global batch size: 256 | lm loss: 2.608223E+00 | grad norm: 0.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.272 | TFLOPs: 31.02 | 7: iteration 8110/ 115203 | consumed samples: 2076160 | consumed tokens: 4251975680 | elapsed time per iteration (s): 0.43 | learning rate: 1.984E-04 | global batch size: 256 | lm loss: 2.587164E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.569 | TFLOPs: 31.41 | 7: iteration 8120/ 115203 | consumed samples: 2078720 | consumed tokens: 4257218560 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.591288E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.107 | TFLOPs: 31.70 | 7: iteration 8130/ 115203 | consumed samples: 2081280 | consumed tokens: 4262461440 | elapsed time per iteration (s): 0.43 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.587854E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.291 | TFLOPs: 31.18 | 7: iteration 8140/ 115203 | consumed samples: 2083840 | consumed tokens: 4267704320 | elapsed time per iteration (s): 0.43 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.602186E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.873 | TFLOPs: 31.11 | 7: iteration 8150/ 115203 | consumed samples: 2086400 | consumed tokens: 4272947200 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.585324E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.722 | TFLOPs: 31.73 | 7: iteration 8160/ 115203 | consumed samples: 2088960 | consumed 
tokens: 4278190080 | elapsed time per iteration (s): 0.43 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.607550E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.856 | TFLOPs: 31.42 | 7: iteration 8170/ 115203 | consumed samples: 2091520 | consumed tokens: 4283432960 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.588709E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.380 | TFLOPs: 31.76 | 7: iteration 8180/ 115203 | consumed samples: 2094080 | consumed tokens: 4288675840 | elapsed time per iteration (s): 0.43 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.579838E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.419 | TFLOPs: 31.56 | 7: iteration 8190/ 115203 | consumed samples: 2096640 | consumed tokens: 4293918720 | elapsed time per iteration (s): 0.44 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.613875E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.822 | TFLOPs: 30.47 | 7: iteration 8200/ 115203 | consumed samples: 2099200 | consumed tokens: 4299161600 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.649723E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.459 | TFLOPs: 31.66 | 7: iteration 8210/ 115203 | consumed samples: 2101760 | consumed tokens: 4304404480 | elapsed time per iteration (s): 0.43 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.605800E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.275 
| TFLOPs: 31.60 | 7: iteration 8220/ 115203 | consumed samples: 2104320 | consumed tokens: 4309647360 | elapsed time per iteration (s): 0.43 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.593902E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.946 | TFLOPs: 31.43 | 7: iteration 8230/ 115203 | consumed samples: 2106880 | consumed tokens: 4314890240 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.562209E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.715 | TFLOPs: 32.20 | 7: iteration 8240/ 115203 | consumed samples: 2109440 | consumed tokens: 4320133120 | elapsed time per iteration (s): 0.43 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.591747E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.586 | TFLOPs: 31.56 | 7: iteration 8250/ 115203 | consumed samples: 2112000 | consumed tokens: 4325376000 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.621142E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.171 | TFLOPs: 31.91 | 7: iteration 8260/ 115203 | consumed samples: 2114560 | consumed tokens: 4330618880 | elapsed time per iteration (s): 0.43 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.600246E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.130 | TFLOPs: 31.44 | 7: iteration 8270/ 115203 | consumed samples: 2117120 | consumed tokens: 4335861760 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.594542E+00 | grad norm: 0.255 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.073 | TFLOPs: 31.96 | 7: iteration 8280/ 115203 | consumed samples: 2119680 | consumed tokens: 4341104640 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.588002E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.577 | TFLOPs: 31.62 | 7: iteration 8290/ 115203 | consumed samples: 2122240 | consumed tokens: 4346347520 | elapsed time per iteration (s): 0.42 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.638130E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.262 | TFLOPs: 31.70 | 7: iteration 8300/ 115203 | consumed samples: 2124800 | consumed tokens: 4351590400 | elapsed time per iteration (s): 0.43 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.609499E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.558 | TFLOPs: 31.56 | 7: iteration 8310/ 115203 | consumed samples: 2127360 | consumed tokens: 4356833280 | elapsed time per iteration (s): 0.43 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.584983E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.125 | TFLOPs: 31.23 | 7: iteration 8320/ 115203 | consumed samples: 2129920 | consumed tokens: 4362076160 | elapsed time per iteration (s): 0.43 | learning rate: 1.983E-04 | global batch size: 256 | lm loss: 2.628091E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.183 | TFLOPs: 31.02 | 7: iteration 8330/ 115203 | consumed samples: 2132480 | consumed tokens: 4367319040 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch 
size: 256 | lm loss: 2.621895E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.403 | TFLOPs: 31.92 | 7: iteration 8340/ 115203 | consumed samples: 2135040 | consumed tokens: 4372561920 | elapsed time per iteration (s): 0.44 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.623758E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.757 | TFLOPs: 30.79 | 7: iteration 8350/ 115203 | consumed samples: 2137600 | consumed tokens: 4377804800 | elapsed time per iteration (s): 0.44 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.598742E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.963 | TFLOPs: 30.38 | 7: iteration 8360/ 115203 | consumed samples: 2140160 | consumed tokens: 4383047680 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.604394E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.909 | TFLOPs: 31.74 | 7: iteration 8370/ 115203 | consumed samples: 2142720 | consumed tokens: 4388290560 | elapsed time per iteration (s): 0.44 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.559103E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.050 | TFLOPs: 30.80 | 7: iteration 8380/ 115203 | consumed samples: 2145280 | consumed tokens: 4393533440 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.590132E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.750 | TFLOPs: 31.94 | 7: iteration 8390/ 115203 | consumed samples: 2147840 | consumed tokens: 
4398776320 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.582435E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.365 | TFLOPs: 31.61 | 7: iteration 8400/ 115203 | consumed samples: 2150400 | consumed tokens: 4404019200 | elapsed time per iteration (s): 0.43 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.561443E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.785 | TFLOPs: 31.00 | 7: iteration 8410/ 115203 | consumed samples: 2152960 | consumed tokens: 4409262080 | elapsed time per iteration (s): 0.43 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.543296E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.123 | TFLOPs: 31.02 | 7: iteration 8420/ 115203 | consumed samples: 2155520 | consumed tokens: 4414504960 | elapsed time per iteration (s): 0.43 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.569620E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.337 | TFLOPs: 31.39 | 7: iteration 8430/ 115203 | consumed samples: 2158080 | consumed tokens: 4419747840 | elapsed time per iteration (s): 0.43 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.601488E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.860 | TFLOPs: 31.58 | 7: iteration 8440/ 115203 | consumed samples: 2160640 | consumed tokens: 4424990720 | elapsed time per iteration (s): 0.43 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.580395E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.115 | 
TFLOPs: 31.12 | 7: iteration 8450/ 115203 | consumed samples: 2163200 | consumed tokens: 4430233600 | elapsed time per iteration (s): 0.43 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.591906E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.842 | TFLOPs: 31.37 | 7: iteration 8460/ 115203 | consumed samples: 2165760 | consumed tokens: 4435476480 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.614796E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.693 | TFLOPs: 31.73 | 7: iteration 8470/ 115203 | consumed samples: 2168320 | consumed tokens: 4440719360 | elapsed time per iteration (s): 0.43 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.574872E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.055 | TFLOPs: 31.12 | 7: iteration 8480/ 115203 | consumed samples: 2170880 | consumed tokens: 4445962240 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.572326E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.284 | TFLOPs: 32.13 | 7: iteration 8490/ 115203 | consumed samples: 2173440 | consumed tokens: 4451205120 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.602388E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.883 | TFLOPs: 31.68 | 7: iteration 8500/ 115203 | consumed samples: 2176000 | consumed tokens: 4456448000 | elapsed time per iteration (s): 0.45 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.604701E+00 | grad norm: 0.257 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.511 | TFLOPs: 29.99 | 7: iteration 8510/ 115203 | consumed samples: 2178560 | consumed tokens: 4461690880 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.540196E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.567 | TFLOPs: 31.72 | 7: iteration 8520/ 115203 | consumed samples: 2181120 | consumed tokens: 4466933760 | elapsed time per iteration (s): 0.42 | learning rate: 1.982E-04 | global batch size: 256 | lm loss: 2.563383E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.942 | TFLOPs: 31.79 | 7: iteration 8530/ 115203 | consumed samples: 2183680 | consumed tokens: 4472176640 | elapsed time per iteration (s): 0.43 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.568614E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.687 | TFLOPs: 31.04 | 7: iteration 8540/ 115203 | consumed samples: 2186240 | consumed tokens: 4477419520 | elapsed time per iteration (s): 0.43 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.579284E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.333 | TFLOPs: 31.34 | 7: iteration 8550/ 115203 | consumed samples: 2188800 | consumed tokens: 4482662400 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.576623E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.342 | TFLOPs: 31.76 | 7: iteration 8560/ 115203 | consumed samples: 2191360 | consumed tokens: 4487905280 | elapsed time per iteration (s): 0.43 | learning rate: 1.981E-04 | global batch 
size: 256 | lm loss: 2.586117E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.705 | TFLOPs: 31.10 | 7: iteration 8570/ 115203 | consumed samples: 2193920 | consumed tokens: 4493148160 | elapsed time per iteration (s): 0.43 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.582148E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.388 | TFLOPs: 31.45 | 7: iteration 8580/ 115203 | consumed samples: 2196480 | consumed tokens: 4498391040 | elapsed time per iteration (s): 0.45 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.585339E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.011 | TFLOPs: 29.59 | 7: iteration 8590/ 115203 | consumed samples: 2199040 | consumed tokens: 4503633920 | elapsed time per iteration (s): 0.43 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.613641E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.238 | TFLOPs: 31.44 | 7: iteration 8600/ 115203 | consumed samples: 2201600 | consumed tokens: 4508876800 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.601910E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.085 | TFLOPs: 31.64 | 7: iteration 8610/ 115203 | consumed samples: 2204160 | consumed tokens: 4514119680 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.573774E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.347 | TFLOPs: 31.97 | 7: iteration 8620/ 115203 | consumed samples: 2206720 | consumed tokens: 
4519362560 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.612856E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.945 | TFLOPs: 31.79 | 7: iteration 8630/ 115203 | consumed samples: 2209280 | consumed tokens: 4524605440 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.565764E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.297 | TFLOPs: 31.76 | 7: iteration 8640/ 115203 | consumed samples: 2211840 | consumed tokens: 4529848320 | elapsed time per iteration (s): 0.43 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.560257E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.136 | TFLOPs: 31.44 | 7: iteration 8650/ 115203 | consumed samples: 2214400 | consumed tokens: 4535091200 | elapsed time per iteration (s): 0.43 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.550850E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.316 | TFLOPs: 31.34 | 7: iteration 8660/ 115203 | consumed samples: 2216960 | consumed tokens: 4540334080 | elapsed time per iteration (s): 0.43 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.585141E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.059 | TFLOPs: 31.59 | 7: iteration 8670/ 115203 | consumed samples: 2219520 | consumed tokens: 4545576960 | elapsed time per iteration (s): 0.43 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.575876E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.039 | 
TFLOPs: 30.91 | 7: iteration 8680/ 115203 | consumed samples: 2222080 | consumed tokens: 4550819840 | elapsed time per iteration (s): 0.43 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.575314E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.245 | TFLOPs: 31.55 | 7: iteration 8690/ 115203 | consumed samples: 2224640 | consumed tokens: 4556062720 | elapsed time per iteration (s): 0.42 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.562571E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.346 | TFLOPs: 32.18 | 7: iteration 8700/ 115203 | consumed samples: 2227200 | consumed tokens: 4561305600 | elapsed time per iteration (s): 0.44 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.572719E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.133 | TFLOPs: 30.44 | 7: iteration 8710/ 115203 | consumed samples: 2229760 | consumed tokens: 4566548480 | elapsed time per iteration (s): 0.45 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.555964E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.280 | TFLOPs: 29.82 | 7: iteration 8720/ 115203 | consumed samples: 2232320 | consumed tokens: 4571791360 | elapsed time per iteration (s): 0.46 | learning rate: 1.981E-04 | global batch size: 256 | lm loss: 2.555496E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 553.729 | TFLOPs: 29.05 | 7: iteration 8730/ 115203 | consumed samples: 2234880 | consumed tokens: 4577034240 | elapsed time per iteration (s): 0.44 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.601226E+00 | grad norm: 0.237 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.684 | TFLOPs: 30.78 | 7: iteration 8740/ 115203 | consumed samples: 2237440 | consumed tokens: 4582277120 | elapsed time per iteration (s): 0.43 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.545036E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.232 | TFLOPs: 31.44 | 7: iteration 8750/ 115203 | consumed samples: 2240000 | consumed tokens: 4587520000 | elapsed time per iteration (s): 0.43 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.589722E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.438 | TFLOPs: 31.24 | 7: iteration 8760/ 115203 | consumed samples: 2242560 | consumed tokens: 4592762880 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.580256E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.394 | TFLOPs: 31.71 | 7: iteration 8770/ 115203 | consumed samples: 2245120 | consumed tokens: 4598005760 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.548474E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.986 | TFLOPs: 31.64 | 7: iteration 8780/ 115203 | consumed samples: 2247680 | consumed tokens: 4603248640 | elapsed time per iteration (s): 0.43 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.566089E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.447 | TFLOPs: 31.50 | 7: iteration 8790/ 115203 | consumed samples: 2250240 | consumed tokens: 4608491520 | elapsed time per iteration (s): 0.44 | learning rate: 1.980E-04 | global batch 
size: 256 | lm loss: 2.557570E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.675 | TFLOPs: 30.52 | 7: iteration 8800/ 115203 | consumed samples: 2252800 | consumed tokens: 4613734400 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.581624E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.915 | TFLOPs: 32.11 | 7: iteration 8810/ 115203 | consumed samples: 2255360 | consumed tokens: 4618977280 | elapsed time per iteration (s): 0.42 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.559826E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.016 | TFLOPs: 31.90 | 7: iteration 8820/ 115203 | consumed samples: 2257920 | consumed tokens: 4624220160 | elapsed time per iteration (s): 0.44 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.581110E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.389 | TFLOPs: 30.66 | 7: iteration 8830/ 115203 | consumed samples: 2260480 | consumed tokens: 4629463040 | elapsed time per iteration (s): 0.43 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.580588E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.131 | TFLOPs: 31.59 | 7: iteration 8840/ 115203 | consumed samples: 2263040 | consumed tokens: 4634705920 | elapsed time per iteration (s): 0.44 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.587865E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.522 | TFLOPs: 30.67 | 7: iteration 8850/ 115203 | consumed samples: 2265600 | consumed tokens: 
4639948800 | elapsed time per iteration (s): 0.44 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.567590E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.773 | TFLOPs: 30.84 | 7: iteration 8860/ 115203 | consumed samples: 2268160 | consumed tokens: 4645191680 | elapsed time per iteration (s): 0.45 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.579373E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.405 | TFLOPs: 30.03 | 7: iteration 8870/ 115203 | consumed samples: 2270720 | consumed tokens: 4650434560 | elapsed time per iteration (s): 0.44 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.555746E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.018 | TFLOPs: 30.85 | 7: iteration 8880/ 115203 | consumed samples: 2273280 | consumed tokens: 4655677440 | elapsed time per iteration (s): 0.46 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.572238E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 550.696 | TFLOPs: 28.89 | 7: iteration 8890/ 115203 | consumed samples: 2275840 | consumed tokens: 4660920320 | elapsed time per iteration (s): 0.44 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.546478E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.621 | TFLOPs: 30.20 | 7: iteration 8900/ 115203 | consumed samples: 2278400 | consumed tokens: 4666163200 | elapsed time per iteration (s): 0.58 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.579620E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 441.491 | 
TFLOPs: 23.16 | 7: iteration 8910/ 115203 | consumed samples: 2280960 | consumed tokens: 4671406080 | elapsed time per iteration (s): 0.43 | learning rate: 1.980E-04 | global batch size: 256 | lm loss: 2.571844E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.057 | TFLOPs: 31.48 | 7: iteration 8920/ 115203 | consumed samples: 2283520 | consumed tokens: 4676648960 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.549047E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.564 | TFLOPs: 31.67 | 7: iteration 8930/ 115203 | consumed samples: 2286080 | consumed tokens: 4681891840 | elapsed time per iteration (s): 0.43 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.558686E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.714 | TFLOPs: 31.05 | 7: iteration 8940/ 115203 | consumed samples: 2288640 | consumed tokens: 4687134720 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.559026E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.053 | TFLOPs: 31.69 | 7: iteration 8950/ 115203 | consumed samples: 2291200 | consumed tokens: 4692377600 | elapsed time per iteration (s): 0.43 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.569513E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.544 | TFLOPs: 31.09 | 7: iteration 8960/ 115203 | consumed samples: 2293760 | consumed tokens: 4697620480 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.560859E+00 | grad norm: 0.236 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.888 | TFLOPs: 31.84 | 7: iteration 8970/ 115203 | consumed samples: 2296320 | consumed tokens: 4702863360 | elapsed time per iteration (s): 0.43 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.567938E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.574 | TFLOPs: 31.30 | 7: iteration 8980/ 115203 | consumed samples: 2298880 | consumed tokens: 4708106240 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.548491E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.562 | TFLOPs: 31.88 | 7: iteration 8990/ 115203 | consumed samples: 2301440 | consumed tokens: 4713349120 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.553283E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.003 | TFLOPs: 31.64 | 7: iteration 9000/ 115203 | consumed samples: 2304000 | consumed tokens: 4718592000 | elapsed time per iteration (s): 0.45 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.560385E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.693 | TFLOPs: 30.05 | 7: ------------------------------------------------------------------------------------------ 7: valid loss at iteration 9000 | lm loss value: 2.430614E+00 | lm loss PPL: 1.136585E+01 | 7: ------------------------------------------------------------------------------------------ 0: saving checkpoint at iteration 9000 to checkpoints_221m 0: [2022-11-28 14:00:04,318] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step9000 is begin to save! 
0: [2022-11-28 14:00:04,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_01-model_00-model_states.pt... 0: [2022-11-28 14:00:04,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_01-model_00-model_states.pt. 0: [2022-11-28 14:00:04,515] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_03-model_00-model_states.pt... 0: [2022-11-28 14:00:04,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_03-model_00-model_states.pt. 0: [2022-11-28 14:00:04,546] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_04-model_00-model_states.pt... 0: [2022-11-28 14:00:04,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_04-model_00-model_states.pt. 0: [2022-11-28 14:00:04,579] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_05-model_00-model_states.pt... 0: [2022-11-28 14:00:04,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_05-model_00-model_states.pt. 0: [2022-11-28 14:00:04,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_06-model_00-model_states.pt... 0: [2022-11-28 14:00:04,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_06-model_00-model_states.pt. 0: [2022-11-28 14:00:04,644] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_07-model_00-model_states.pt... 0: [2022-11-28 14:00:04,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 14:00:04,681] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_08-model_00-model_states.pt... 0: [2022-11-28 14:00:04,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_08-model_00-model_states.pt. 0: [2022-11-28 14:00:04,711] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_09-model_00-model_states.pt... 0: [2022-11-28 14:00:04,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_09-model_00-model_states.pt. 0: [2022-11-28 14:00:04,744] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_10-model_00-model_states.pt... 0: [2022-11-28 14:00:04,777] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_10-model_00-model_states.pt. 0: [2022-11-28 14:00:04,777] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_11-model_00-model_states.pt... 0: [2022-11-28 14:00:04,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_11-model_00-model_states.pt. 0: [2022-11-28 14:00:04,809] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_12-model_00-model_states.pt... 0: [2022-11-28 14:00:04,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_12-model_00-model_states.pt. 0: [2022-11-28 14:00:04,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_13-model_00-model_states.pt... 0: [2022-11-28 14:00:04,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 14:00:04,874] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_14-model_00-model_states.pt... 0: [2022-11-28 14:00:04,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_14-model_00-model_states.pt. 0: [2022-11-28 14:00:04,907] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_15-model_00-model_states.pt... 0: [2022-11-28 14:00:04,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_15-model_00-model_states.pt. 0: [2022-11-28 14:00:04,940] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_16-model_00-model_states.pt... 0: [2022-11-28 14:00:04,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_16-model_00-model_states.pt. 0: [2022-11-28 14:00:04,973] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_17-model_00-model_states.pt... 0: [2022-11-28 14:00:05,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_17-model_00-model_states.pt. 0: [2022-11-28 14:00:05,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_18-model_00-model_states.pt... 0: [2022-11-28 14:00:05,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_18-model_00-model_states.pt. 0: [2022-11-28 14:00:05,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_19-model_00-model_states.pt... 0: [2022-11-28 14:00:05,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 14:00:05,074] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_20-model_00-model_states.pt... 0: [2022-11-28 14:00:05,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_20-model_00-model_states.pt. 0: [2022-11-28 14:00:05,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/layer_22-model_00-model_states.pt... 0: [2022-11-28 14:00:05,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/layer_22-model_00-model_states.pt. 0: [2022-11-28 14:00:05,112] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step9000/mp_rank_00_model_states.pt 0: [2022-11-28 14:00:05,112] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/mp_rank_00_model_states.pt... 0: [2022-11-28 14:00:05,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/mp_rank_00_model_states.pt. 0: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
7: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
4: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
6: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:00:05,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step9000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:00:05,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:00:05,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:00:05,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 14:00:05,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:00:05,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 14:00:05,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2022-11-28 14:00:05,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 14:00:05,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
2: [2022-11-28 14:00:05,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2022-11-28 14:00:05,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:00:05,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2022-11-28 14:00:05,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:00:05,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2022-11-28 14:00:05,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 14:00:05,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2022-11-28 14:00:05,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:00:05,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 14:00:05,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2022-11-28 14:00:05,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 14:00:05,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 14:00:05,192] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2022-11-28 14:00:05,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:00:05,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 14:00:05,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2022-11-28 14:00:05,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:00:05,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 14:00:05,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2022-11-28 14:00:05,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:00:05,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 14:00:05,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2022-11-28 14:00:05,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:00:05,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:00:05,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:00:05,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 14:00:05,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 14:00:05,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2022-11-28 14:00:05,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 14:00:05,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2022-11-28 14:00:05,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2022-11-28 14:00:05,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:00:05,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 14:00:05,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:00:05,202] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 14:00:05,202] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2022-11-28 14:00:05,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:00:05,202] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 14:00:05,202] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 2: [2022-11-28 14:00:05,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:00:05,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 14:00:05,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2022-11-28 14:00:05,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:00:05,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:00:05,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:00:05,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 14:00:05,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 14:00:05,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 14:00:05,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 14:00:05,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 14:00:05,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2022-11-28 14:00:05,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2022-11-28 14:00:05,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 6: [2022-11-28 14:00:05,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2022-11-28 14:00:05,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:00:05,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:00:05,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:00:05,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:00:05,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 14:00:05,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 14:00:05,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2022-11-28 14:00:05,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2022-11-28 14:00:05,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 14:00:05,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2022-11-28 14:00:05,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:00:05,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 14:00:05,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2022-11-28 14:00:05,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:00:05,220] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 14:00:05,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:00:05,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2022-11-28 14:00:05,220] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 14:00:05,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 1: [2022-11-28 14:00:05,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:00:05,220] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 14:00:05,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2022-11-28 14:00:05,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:00:05,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:00:05,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 14:00:05,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 14:00:05,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2022-11-28 14:00:05,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
5: [2022-11-28 14:00:05,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:00:05,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:00:05,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 14:00:05,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2022-11-28 14:00:05,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:00:05,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
3: [2022-11-28 14:00:05,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 14:00:05,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 14:00:05,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2022-11-28 14:00:05,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:00:05,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 14:00:05,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 14:00:05,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 14:00:05,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 14:00:05,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2022-11-28 14:00:05,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2022-11-28 14:00:05,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 14:00:05,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 14:00:05,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2022-11-28 14:00:05,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:00:05,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:00:05,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
3: [2022-11-28 14:00:05,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 14:00:05,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 14:00:05,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 14:00:05,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 14:00:05,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2022-11-28 14:00:05,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 3: [2022-11-28 14:00:05,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2022-11-28 14:00:05,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:00:05,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2022-11-28 14:00:05,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 14:00:05,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2022-11-28 14:00:05,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:00:05,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 14:00:05,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2022-11-28 14:00:05,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:00:05,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 14:00:05,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2022-11-28 14:00:05,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:00:05,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:00:05,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:00:05,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:00:05,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:00:05,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 14:00:05,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 14:00:05,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 14:00:05,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 14:00:05,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 14:00:05,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2022-11-28 14:00:05,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2022-11-28 14:00:05,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2022-11-28 14:00:05,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2022-11-28 14:00:05,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2022-11-28 14:00:05,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:00:05,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:00:05,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:00:05,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 14:00:05,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 14:00:05,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 14:00:05,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2022-11-28 14:00:05,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 7: [2022-11-28 14:00:05,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 4: [2022-11-28 14:00:05,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:00:05,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:00:05,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:00:05,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:00:05,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:00:05,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 14:00:05,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:00:05,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:00:05,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:00:05,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 14:00:05,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 14:00:05,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 14:00:05,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 4: [2022-11-28 14:00:05,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 14:00:05,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 4: [2022-11-28 14:00:05,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 4: [2022-11-28 14:00:05,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
4: [2022-11-28 14:00:05,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 14:00:05,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 14:00:05,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 14:00:05,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 4: [2022-11-28 14:00:05,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 4: [2022-11-28 14:00:05,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 4: [2022-11-28 14:00:05,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 0: [2022-11-28 14:00:05,270] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step9000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 14:00:05,270] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step9000 is ready now! 
0: successfully saved checkpoint at iteration 9000 to checkpoints_221m 7: time (ms) | save-checkpoint: 1010.85 7: iteration 9010/ 115203 | consumed samples: 2306560 | consumed tokens: 4723834880 | elapsed time per iteration (s): 0.54 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.580613E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 469.942 | TFLOPs: 24.66 | 7: iteration 9020/ 115203 | consumed samples: 2309120 | consumed tokens: 4729077760 | elapsed time per iteration (s): 0.44 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.554435E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.354 | TFLOPs: 30.87 | 7: iteration 9030/ 115203 | consumed samples: 2311680 | consumed tokens: 4734320640 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.587484E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.929 | TFLOPs: 32.11 | 7: iteration 9040/ 115203 | consumed samples: 2314240 | consumed tokens: 4739563520 | elapsed time per iteration (s): 0.44 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.576418E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.180 | TFLOPs: 30.49 | 7: iteration 9050/ 115203 | consumed samples: 2316800 | consumed tokens: 4744806400 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.593025E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.940 | TFLOPs: 31.85 | 7: iteration 9060/ 115203 | consumed samples: 2319360 | consumed tokens: 4750049280 | elapsed time per iteration (s): 0.45 | learning rate: 1.979E-04 | 
global batch size: 256 | lm loss: 2.586411E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.744 | TFLOPs: 29.95 | 7: iteration 9070/ 115203 | consumed samples: 2321920 | consumed tokens: 4755292160 | elapsed time per iteration (s): 0.43 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.569723E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.718 | TFLOPs: 31.36 | 7: iteration 9080/ 115203 | consumed samples: 2324480 | consumed tokens: 4760535040 | elapsed time per iteration (s): 0.43 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.597976E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.449 | TFLOPs: 31.50 | 7: iteration 9090/ 115203 | consumed samples: 2327040 | consumed tokens: 4765777920 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.553673E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.386 | TFLOPs: 31.76 | 7: iteration 9100/ 115203 | consumed samples: 2329600 | consumed tokens: 4771020800 | elapsed time per iteration (s): 0.42 | learning rate: 1.979E-04 | global batch size: 256 | lm loss: 2.583516E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.313 | TFLOPs: 31.92 | 7: iteration 9110/ 115203 | consumed samples: 2332160 | consumed tokens: 4776263680 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.547017E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.284 | TFLOPs: 31.81 | 7: iteration 9120/ 115203 | consumed samples: 2334720 | consumed 
tokens: 4781506560 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.576744E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.978 | TFLOPs: 31.48 | 7: iteration 9130/ 115203 | consumed samples: 2337280 | consumed tokens: 4786749440 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.569745E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.328 | TFLOPs: 31.29 | 7: iteration 9140/ 115203 | consumed samples: 2339840 | consumed tokens: 4791992320 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.556358E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.308 | TFLOPs: 31.34 | 7: iteration 9150/ 115203 | consumed samples: 2342400 | consumed tokens: 4797235200 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.573703E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.373 | TFLOPs: 31.29 | 7: iteration 9160/ 115203 | consumed samples: 2344960 | consumed tokens: 4802478080 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.580726E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.644 | TFLOPs: 31.41 | 7: iteration 9170/ 115203 | consumed samples: 2347520 | consumed tokens: 4807720960 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.620682E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.873 
| TFLOPs: 32.26 | 7: iteration 9180/ 115203 | consumed samples: 2350080 | consumed tokens: 4812963840 | elapsed time per iteration (s): 0.44 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.574541E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.273 | TFLOPs: 30.60 | 7: iteration 9190/ 115203 | consumed samples: 2352640 | consumed tokens: 4818206720 | elapsed time per iteration (s): 0.44 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.577007E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.101 | TFLOPs: 30.70 | 7: iteration 9200/ 115203 | consumed samples: 2355200 | consumed tokens: 4823449600 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.560983E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.818 | TFLOPs: 31.84 | 7: iteration 9210/ 115203 | consumed samples: 2357760 | consumed tokens: 4828692480 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.557169E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.960 | TFLOPs: 31.74 | 7: iteration 9220/ 115203 | consumed samples: 2360320 | consumed tokens: 4833935360 | elapsed time per iteration (s): 0.44 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.515407E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.207 | TFLOPs: 30.65 | 7: iteration 9230/ 115203 | consumed samples: 2362880 | consumed tokens: 4839178240 | elapsed time per iteration (s): 0.43 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.578344E+00 | grad norm: 0.243 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.293 | TFLOPs: 31.13 | 7: iteration 9240/ 115203 | consumed samples: 2365440 | consumed tokens: 4844421120 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.561917E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.073 | TFLOPs: 31.96 | 7: iteration 9250/ 115203 | consumed samples: 2368000 | consumed tokens: 4849664000 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.559540E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.104 | TFLOPs: 32.01 | 7: iteration 9260/ 115203 | consumed samples: 2370560 | consumed tokens: 4854906880 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.592586E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.100 | TFLOPs: 32.22 | 7: iteration 9270/ 115203 | consumed samples: 2373120 | consumed tokens: 4860149760 | elapsed time per iteration (s): 0.42 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.579008E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.178 | TFLOPs: 31.91 | 7: iteration 9280/ 115203 | consumed samples: 2375680 | consumed tokens: 4865392640 | elapsed time per iteration (s): 0.44 | learning rate: 1.978E-04 | global batch size: 256 | lm loss: 2.579620E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.038 | TFLOPs: 30.70 | 7: iteration 9290/ 115203 | consumed samples: 2378240 | consumed tokens: 4870635520 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch 
size: 256 | lm loss: 2.564101E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.195 | TFLOPs: 31.96 | 7: iteration 9300/ 115203 | consumed samples: 2380800 | consumed tokens: 4875878400 | elapsed time per iteration (s): 0.43 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.546251E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.652 | TFLOPs: 31.25 | 7: iteration 9310/ 115203 | consumed samples: 2383360 | consumed tokens: 4881121280 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.567491E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.747 | TFLOPs: 31.73 | 7: iteration 9320/ 115203 | consumed samples: 2385920 | consumed tokens: 4886364160 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.565690E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.763 | TFLOPs: 31.89 | 7: iteration 9330/ 115203 | consumed samples: 2388480 | consumed tokens: 4891607040 | elapsed time per iteration (s): 0.43 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.594282E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.323 | TFLOPs: 31.08 | 7: iteration 9340/ 115203 | consumed samples: 2391040 | consumed tokens: 4896849920 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.574619E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.454 | TFLOPs: 31.71 | 7: iteration 9350/ 115203 | consumed samples: 2393600 | consumed tokens: 
4902092800 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.577289E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.515 | TFLOPs: 31.61 | 7: iteration 9360/ 115203 | consumed samples: 2396160 | consumed tokens: 4907335680 | elapsed time per iteration (s): 0.43 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.531676E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.692 | TFLOPs: 31.57 | 7: iteration 9370/ 115203 | consumed samples: 2398720 | consumed tokens: 4912578560 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.559075E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.357 | TFLOPs: 31.76 | 7: iteration 9380/ 115203 | consumed samples: 2401280 | consumed tokens: 4917821440 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.574300E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.344 | TFLOPs: 31.76 | 7: iteration 9390/ 115203 | consumed samples: 2403840 | consumed tokens: 4923064320 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.542531E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.034 | TFLOPs: 31.64 | 7: iteration 9400/ 115203 | consumed samples: 2406400 | consumed tokens: 4928307200 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.571681E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.951 | 
TFLOPs: 31.95 | 7: iteration 9410/ 115203 | consumed samples: 2408960 | consumed tokens: 4933550080 | elapsed time per iteration (s): 0.45 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.519189E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.384 | TFLOPs: 29.98 | 7: iteration 9420/ 115203 | consumed samples: 2411520 | consumed tokens: 4938792960 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.581834E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.113 | TFLOPs: 32.01 | 7: iteration 9430/ 115203 | consumed samples: 2414080 | consumed tokens: 4944035840 | elapsed time per iteration (s): 0.43 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.547447E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.284 | TFLOPs: 31.50 | 7: iteration 9440/ 115203 | consumed samples: 2416640 | consumed tokens: 4949278720 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.572669E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.813 | TFLOPs: 31.68 | 7: iteration 9450/ 115203 | consumed samples: 2419200 | consumed tokens: 4954521600 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.608457E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.976 | TFLOPs: 31.69 | 7: iteration 9460/ 115203 | consumed samples: 2421760 | consumed tokens: 4959764480 | elapsed time per iteration (s): 0.42 | learning rate: 1.977E-04 | global batch size: 256 | lm loss: 2.580007E+00 | grad norm: 0.231 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.176 | TFLOPs: 31.81 | 7: iteration 9470/ 115203 | consumed samples: 2424320 | consumed tokens: 4965007360 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.582866E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.402 | TFLOPs: 32.13 | 7: iteration 9480/ 115203 | consumed samples: 2426880 | consumed tokens: 4970250240 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.531208E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.369 | TFLOPs: 31.66 | 7: iteration 9490/ 115203 | consumed samples: 2429440 | consumed tokens: 4975493120 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.534360E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.025 | TFLOPs: 32.16 | 7: iteration 9500/ 115203 | consumed samples: 2432000 | consumed tokens: 4980736000 | elapsed time per iteration (s): 0.43 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.566751E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.188 | TFLOPs: 31.39 | 7: iteration 9510/ 115203 | consumed samples: 2434560 | consumed tokens: 4985978880 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.564361E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.913 | TFLOPs: 31.79 | 7: iteration 9520/ 115203 | consumed samples: 2437120 | consumed tokens: 4991221760 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch 
size: 256 | lm loss: 2.543437E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.857 | TFLOPs: 32.10 | 7: iteration 9530/ 115203 | consumed samples: 2439680 | consumed tokens: 4996464640 | elapsed time per iteration (s): 0.43 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.583511E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.976 | TFLOPs: 31.37 | 7: iteration 9540/ 115203 | consumed samples: 2442240 | consumed tokens: 5001707520 | elapsed time per iteration (s): 0.44 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.581760E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.222 | TFLOPs: 30.39 | 7: iteration 9550/ 115203 | consumed samples: 2444800 | consumed tokens: 5006950400 | elapsed time per iteration (s): 0.43 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.573633E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.538 | TFLOPs: 31.51 | 7: iteration 9560/ 115203 | consumed samples: 2447360 | consumed tokens: 5012193280 | elapsed time per iteration (s): 0.43 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.568332E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.160 | TFLOPs: 31.44 | 7: iteration 9570/ 115203 | consumed samples: 2449920 | consumed tokens: 5017436160 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.500181E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.853 | TFLOPs: 31.79 | 7: iteration 9580/ 115203 | consumed samples: 2452480 | consumed tokens: 
5022679040 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.564264E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.448 | TFLOPs: 31.61 | 7: iteration 9590/ 115203 | consumed samples: 2455040 | consumed tokens: 5027921920 | elapsed time per iteration (s): 0.44 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.532949E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.677 | TFLOPs: 30.78 | 7: iteration 9600/ 115203 | consumed samples: 2457600 | consumed tokens: 5033164800 | elapsed time per iteration (s): 0.43 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.540740E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.038 | TFLOPs: 31.54 | 7: iteration 9610/ 115203 | consumed samples: 2460160 | consumed tokens: 5038407680 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.537673E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.763 | TFLOPs: 31.63 | 7: iteration 9620/ 115203 | consumed samples: 2462720 | consumed tokens: 5043650560 | elapsed time per iteration (s): 0.44 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.600327E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.075 | TFLOPs: 30.86 | 7: iteration 9630/ 115203 | consumed samples: 2465280 | consumed tokens: 5048893440 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.572302E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.423 | 
TFLOPs: 31.98 | 7: iteration 9640/ 115203 | consumed samples: 2467840 | consumed tokens: 5054136320 | elapsed time per iteration (s): 0.42 | learning rate: 1.976E-04 | global batch size: 256 | lm loss: 2.527544E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.987 | TFLOPs: 31.85 | 7: iteration 9650/ 115203 | consumed samples: 2470400 | consumed tokens: 5059379200 | elapsed time per iteration (s): 0.43 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.567594E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.253 | TFLOPs: 31.60 | 7: iteration 9660/ 115203 | consumed samples: 2472960 | consumed tokens: 5064622080 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.571152E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.374 | TFLOPs: 31.87 | 7: iteration 9670/ 115203 | consumed samples: 2475520 | consumed tokens: 5069864960 | elapsed time per iteration (s): 0.43 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.540587E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.590 | TFLOPs: 31.41 | 7: iteration 9680/ 115203 | consumed samples: 2478080 | consumed tokens: 5075107840 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.543616E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.307 | TFLOPs: 31.71 | 7: iteration 9690/ 115203 | consumed samples: 2480640 | consumed tokens: 5080350720 | elapsed time per iteration (s): 0.43 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.544290E+00 | grad norm: 0.258 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.310 | TFLOPs: 31.29 | 7: iteration 9700/ 115203 | consumed samples: 2483200 | consumed tokens: 5085593600 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.554142E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.821 | TFLOPs: 31.84 | 7: iteration 9710/ 115203 | consumed samples: 2485760 | consumed tokens: 5090836480 | elapsed time per iteration (s): 0.43 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.546855E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.470 | TFLOPs: 31.35 | 7: iteration 9720/ 115203 | consumed samples: 2488320 | consumed tokens: 5096079360 | elapsed time per iteration (s): 0.43 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.551566E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.942 | TFLOPs: 31.43 | 7: iteration 9730/ 115203 | consumed samples: 2490880 | consumed tokens: 5101322240 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.602100E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.181 | TFLOPs: 31.75 | 7: iteration 9740/ 115203 | consumed samples: 2493440 | consumed tokens: 5106565120 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.569875E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.223 | TFLOPs: 31.96 | 7: iteration 9750/ 115203 | consumed samples: 2496000 | consumed tokens: 5111808000 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch 
size: 256 | lm loss: 2.545795E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.150 | TFLOPs: 31.86 | 7: iteration 9760/ 115203 | consumed samples: 2498560 | consumed tokens: 5117050880 | elapsed time per iteration (s): 0.43 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.541306E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.775 | TFLOPs: 31.15 | 7: iteration 9770/ 115203 | consumed samples: 2501120 | consumed tokens: 5122293760 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.555944E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.453 | TFLOPs: 31.87 | 7: iteration 9780/ 115203 | consumed samples: 2503680 | consumed tokens: 5127536640 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.581886E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.737 | TFLOPs: 31.78 | 7: iteration 9790/ 115203 | consumed samples: 2506240 | consumed tokens: 5132779520 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.538123E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.452 | TFLOPs: 32.03 | 7: iteration 9800/ 115203 | consumed samples: 2508800 | consumed tokens: 5138022400 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.566957E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.121 | TFLOPs: 31.75 | 7: iteration 9810/ 115203 | consumed samples: 2511360 | consumed tokens: 
5143265280 | elapsed time per iteration (s): 0.42 | learning rate: 1.975E-04 | global batch size: 256 | lm loss: 2.562114E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.160 | TFLOPs: 31.91 | 7: iteration 9820/ 115203 | consumed samples: 2513920 | consumed tokens: 5148508160 | elapsed time per iteration (s): 0.43 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.534163E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.473 | TFLOPs: 31.51 | 7: iteration 9830/ 115203 | consumed samples: 2516480 | consumed tokens: 5153751040 | elapsed time per iteration (s): 0.43 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.561876E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.039 | TFLOPs: 31.12 | 7: iteration 9840/ 115203 | consumed samples: 2519040 | consumed tokens: 5158993920 | elapsed time per iteration (s): 0.43 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.515751E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.454 | TFLOPs: 31.09 | 7: iteration 9850/ 115203 | consumed samples: 2521600 | consumed tokens: 5164236800 | elapsed time per iteration (s): 0.43 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.560490E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.563 | TFLOPs: 31.25 | 7: iteration 9860/ 115203 | consumed samples: 2524160 | consumed tokens: 5169479680 | elapsed time per iteration (s): 0.43 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.576631E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.012 | 
TFLOPs: 31.22 | 7: iteration 9870/ 115203 | consumed samples: 2526720 | consumed tokens: 5174722560 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.518717E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.912 | TFLOPs: 31.69 | 7: iteration 9880/ 115203 | consumed samples: 2529280 | consumed tokens: 5179965440 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.548287E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.309 | TFLOPs: 31.81 | 7: iteration 9890/ 115203 | consumed samples: 2531840 | consumed tokens: 5185208320 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.582479E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.129 | TFLOPs: 31.86 | 7: iteration 9900/ 115203 | consumed samples: 2534400 | consumed tokens: 5190451200 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.579995E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.964 | TFLOPs: 31.95 | 7: iteration 9910/ 115203 | consumed samples: 2536960 | consumed tokens: 5195694080 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.561670E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.034 | TFLOPs: 31.96 | 7: iteration 9920/ 115203 | consumed samples: 2539520 | consumed tokens: 5200936960 | elapsed time per iteration (s): 0.43 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.587742E+00 | grad norm: 0.249 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.987 | TFLOPs: 31.43 | 7: iteration 9930/ 115203 | consumed samples: 2542080 | consumed tokens: 5206179840 | elapsed time per iteration (s): 0.43 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.560300E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.613 | TFLOPs: 31.30 | 7: iteration 9940/ 115203 | consumed samples: 2544640 | consumed tokens: 5211422720 | elapsed time per iteration (s): 0.44 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.534745E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.127 | TFLOPs: 30.60 | 7: iteration 9950/ 115203 | consumed samples: 2547200 | consumed tokens: 5216665600 | elapsed time per iteration (s): 0.43 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.530109E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.002 | TFLOPs: 31.22 | 7: iteration 9960/ 115203 | consumed samples: 2549760 | consumed tokens: 5221908480 | elapsed time per iteration (s): 0.42 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.563416E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.379 | TFLOPs: 31.71 | 7: iteration 9970/ 115203 | consumed samples: 2552320 | consumed tokens: 5227151360 | elapsed time per iteration (s): 0.43 | learning rate: 1.974E-04 | global batch size: 256 | lm loss: 2.521412E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.297 | TFLOPs: 31.55 | 7: iteration 9980/ 115203 | consumed samples: 2554880 | consumed tokens: 5232394240 | elapsed time per iteration (s): 0.43 | learning rate: 1.974E-04 | global batch 
size: 256 | lm loss: 2.571495E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.259 | TFLOPs: 31.55 | 7: iteration 9990/ 115203 | consumed samples: 2557440 | consumed tokens: 5237637120 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.570873E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.066 | TFLOPs: 31.75 | 0: [2022-11-28 14:07:11,430] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=0, lr=[0.00019734023411853413, 0.00019734023411853413, 0.00019734023411853413], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 10000/ 115203 | consumed samples: 2560000 | consumed tokens: 5242880000 | elapsed time per iteration (s): 0.43 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.574615E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.485 | TFLOPs: 31.45 | 0: steps: 10000 loss: 2.5440 iter time (s): 0.425 samples/sec: 601.704 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 10000 | lm loss value: 2.461508E+00 | lm loss PPL: 1.172247E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 10000 to checkpoints_221m 0: [2022-11-28 14:07:11,598] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step10000 is begin to save! 0: [2022-11-28 14:07:11,601] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_01-model_00-model_states.pt... 0: [2022-11-28 14:07:11,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 14:07:11,737] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_03-model_00-model_states.pt... 0: [2022-11-28 14:07:11,759] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_03-model_00-model_states.pt. 0: [2022-11-28 14:07:11,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_04-model_00-model_states.pt... 0: [2022-11-28 14:07:11,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_04-model_00-model_states.pt. 0: [2022-11-28 14:07:11,783] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_05-model_00-model_states.pt... 0: [2022-11-28 14:07:11,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_05-model_00-model_states.pt. 0: [2022-11-28 14:07:11,806] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_06-model_00-model_states.pt... 0: [2022-11-28 14:07:11,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_06-model_00-model_states.pt. 0: [2022-11-28 14:07:11,829] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_07-model_00-model_states.pt... 0: [2022-11-28 14:07:11,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_07-model_00-model_states.pt. 0: [2022-11-28 14:07:11,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_08-model_00-model_states.pt... 0: [2022-11-28 14:07:11,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 14:07:11,878] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_09-model_00-model_states.pt... 0: [2022-11-28 14:07:11,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_09-model_00-model_states.pt. 0: [2022-11-28 14:07:11,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_10-model_00-model_states.pt... 0: [2022-11-28 14:07:11,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_10-model_00-model_states.pt. 0: [2022-11-28 14:07:11,926] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_11-model_00-model_states.pt... 0: [2022-11-28 14:07:11,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_11-model_00-model_states.pt. 0: [2022-11-28 14:07:11,950] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_12-model_00-model_states.pt... 0: [2022-11-28 14:07:11,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_12-model_00-model_states.pt. 0: [2022-11-28 14:07:11,974] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_13-model_00-model_states.pt... 0: [2022-11-28 14:07:11,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_13-model_00-model_states.pt. 0: [2022-11-28 14:07:11,998] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_14-model_00-model_states.pt... 0: [2022-11-28 14:07:12,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 14:07:12,022] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_15-model_00-model_states.pt... 0: [2022-11-28 14:07:12,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_15-model_00-model_states.pt. 0: [2022-11-28 14:07:12,047] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_16-model_00-model_states.pt... 0: [2022-11-28 14:07:12,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_16-model_00-model_states.pt. 0: [2022-11-28 14:07:12,071] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_17-model_00-model_states.pt... 0: [2022-11-28 14:07:12,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_17-model_00-model_states.pt. 0: [2022-11-28 14:07:12,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_18-model_00-model_states.pt... 0: [2022-11-28 14:07:12,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_18-model_00-model_states.pt. 0: [2022-11-28 14:07:12,119] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_19-model_00-model_states.pt... 0: [2022-11-28 14:07:12,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_19-model_00-model_states.pt. 0: [2022-11-28 14:07:12,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_20-model_00-model_states.pt... 0: [2022-11-28 14:07:12,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 14:07:12,168] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/layer_22-model_00-model_states.pt... 0: [2022-11-28 14:07:12,171] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/layer_22-model_00-model_states.pt. 0: [2022-11-28 14:07:12,172] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step10000/mp_rank_00_model_states.pt 0: [2022-11-28 14:07:12,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/mp_rank_00_model_states.pt... 0: [2022-11-28 14:07:12,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/mp_rank_00_model_states.pt. 0: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
7: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
7: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
4: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:07:12,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:07:12,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:07:12,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 14:07:12,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2022-11-28 14:07:12,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:07:12,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 14:07:12,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2022-11-28 14:07:12,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:07:12,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 14:07:12,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2022-11-28 14:07:12,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:07:12,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:07:12,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 14:07:12,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 14:07:12,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2022-11-28 14:07:12,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2022-11-28 14:07:12,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:07:12,246] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 14:07:12,246] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2022-11-28 14:07:12,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:07:12,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:07:12,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:07:12,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 14:07:12,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 14:07:12,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 14:07:12,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2022-11-28 14:07:12,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2022-11-28 14:07:12,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2022-11-28 14:07:12,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:07:12,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:07:12,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 14:07:12,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2022-11-28 14:07:12,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:07:12,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 14:07:12,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2022-11-28 14:07:12,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:07:12,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 14:07:12,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2022-11-28 14:07:12,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:07:12,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 14:07:12,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:07:12,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:07:12,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
2: [2022-11-28 14:07:12,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:07:12,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:07:12,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:07:12,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 14:07:12,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:07:12,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2022-11-28 14:07:12,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 7: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 14:07:12,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 14:07:12,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 14:07:12,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2022-11-28 14:07:12,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:07:12,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:07:12,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 14:07:12,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 14:07:12,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2022-11-28 14:07:12,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
5: [2022-11-28 14:07:12,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:07:12,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:07:12,244] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 14:07:12,243] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 14:07:12,244] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2022-11-28 14:07:12,243] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2022-11-28 14:07:12,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:07:12,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 14:07:12,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 14:07:12,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
5: [2022-11-28 14:07:12,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:07:12,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 14:07:12,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 14:07:12,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:07:12,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 14:07:12,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2022-11-28 14:07:12,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
5: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:07:12,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:07:12,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 14:07:12,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 5: [2022-11-28 14:07:12,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:07:12,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 14:07:12,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2022-11-28 14:07:12,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:07:12,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:07:12,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 14:07:12,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 14:07:12,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 7: [2022-11-28 14:07:12,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 2: [2022-11-28 14:07:12,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:07:12,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 14:07:12,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2022-11-28 14:07:12,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:07:12,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 14:07:12,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2022-11-28 14:07:12,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:07:12,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:07:12,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 14:07:12,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 14:07:12,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:07:12,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 14:07:12,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 14:07:12,256] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 6: [2022-11-28 14:07:12,256] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2022-11-28 14:07:12,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:07:12,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:07:12,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2022-11-28 14:07:12,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 14:07:12,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 14:07:12,258] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 14:07:12,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2022-11-28 14:07:12,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 3: [2022-11-28 14:07:12,258] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2022-11-28 14:07:12,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:07:12,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:07:12,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:07:12,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:07:12,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:07:12,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 14:07:12,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 14:07:12,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 14:07:12,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 14:07:12,260] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 14:07:12,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2022-11-28 14:07:12,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2022-11-28 14:07:12,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2022-11-28 14:07:12,260] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 4: [2022-11-28 14:07:12,261] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2022-11-28 14:07:12,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:07:12,265] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 14:07:12,265] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2022-11-28 14:07:12,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:07:12,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 14:07:12,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:07:12,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2022-11-28 14:07:12,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 14:07:12,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2022-11-28 14:07:12,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:07:12,266] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 14:07:12,266] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2022-11-28 14:07:12,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:07:12,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:07:12,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:07:12,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 14:07:12,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 14:07:12,269] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 14:07:12,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2022-11-28 14:07:12,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 1: [2022-11-28 14:07:12,269] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 0: [2022-11-28 14:07:12,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step10000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 14:07:12,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step10000 is ready now! 
0: successfully saved checkpoint at iteration 10000 to checkpoints_221m 7: time (ms) | save-checkpoint: 726.51 7: iteration 10010/ 115203 | consumed samples: 2562560 | consumed tokens: 5248122880 | elapsed time per iteration (s): 0.51 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.544514E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 499.347 | TFLOPs: 26.20 | 7: iteration 10020/ 115203 | consumed samples: 2565120 | consumed tokens: 5253365760 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.570995E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.511 | TFLOPs: 31.77 | 7: iteration 10030/ 115203 | consumed samples: 2567680 | consumed tokens: 5258608640 | elapsed time per iteration (s): 0.43 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.571991E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.100 | TFLOPs: 31.38 | 7: iteration 10040/ 115203 | consumed samples: 2570240 | consumed tokens: 5263851520 | elapsed time per iteration (s): 0.42 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.571577E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.124 | TFLOPs: 31.80 | 7: iteration 10050/ 115203 | consumed samples: 2572800 | consumed tokens: 5269094400 | elapsed time per iteration (s): 0.43 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.549247E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.996 | TFLOPs: 31.53 | 7: iteration 10060/ 115203 | consumed samples: 2575360 | consumed tokens: 5274337280 | elapsed time per iteration (s): 0.60 | learning rate: 
1.973E-04 | global batch size: 256 | lm loss: 2.554412E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 424.999 | TFLOPs: 22.30 | 7: iteration 10070/ 115203 | consumed samples: 2577920 | consumed tokens: 5279580160 | elapsed time per iteration (s): 0.50 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.555666E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 513.012 | TFLOPs: 26.92 | 7: iteration 10080/ 115203 | consumed samples: 2580480 | consumed tokens: 5284823040 | elapsed time per iteration (s): 0.43 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.517504E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.535 | TFLOPs: 31.30 | 7: iteration 10090/ 115203 | consumed samples: 2583040 | consumed tokens: 5290065920 | elapsed time per iteration (s): 0.44 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.530790E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.004 | TFLOPs: 30.33 | 7: iteration 10100/ 115203 | consumed samples: 2585600 | consumed tokens: 5295308800 | elapsed time per iteration (s): 0.44 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.561987E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.291 | TFLOPs: 30.66 | 7: iteration 10110/ 115203 | consumed samples: 2588160 | consumed tokens: 5300551680 | elapsed time per iteration (s): 0.43 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.535959E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.690 | TFLOPs: 30.89 | 7: iteration 10120/ 115203 | consumed samples: 
2590720 | consumed tokens: 5305794560 | elapsed time per iteration (s): 0.43 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.520988E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.945 | TFLOPs: 31.16 | 7: iteration 10130/ 115203 | consumed samples: 2593280 | consumed tokens: 5311037440 | elapsed time per iteration (s): 0.43 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.546319E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.857 | TFLOPs: 31.37 | 7: iteration 10140/ 115203 | consumed samples: 2595840 | consumed tokens: 5316280320 | elapsed time per iteration (s): 0.44 | learning rate: 1.973E-04 | global batch size: 256 | lm loss: 2.558502E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.143 | TFLOPs: 30.75 | 7: iteration 10150/ 115203 | consumed samples: 2598400 | consumed tokens: 5321523200 | elapsed time per iteration (s): 0.44 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.527556E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.416 | TFLOPs: 30.87 | 7: iteration 10160/ 115203 | consumed samples: 2600960 | consumed tokens: 5326766080 | elapsed time per iteration (s): 0.43 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.545945E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.241 | TFLOPs: 31.23 | 7: iteration 10170/ 115203 | consumed samples: 2603520 | consumed tokens: 5332008960 | elapsed time per iteration (s): 0.43 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.569411E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 598.457 | TFLOPs: 31.40 | 7: iteration 10180/ 115203 | consumed samples: 2606080 | consumed tokens: 5337251840 | elapsed time per iteration (s): 0.43 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.542225E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.276 | TFLOPs: 31.34 | 7: iteration 10190/ 115203 | consumed samples: 2608640 | consumed tokens: 5342494720 | elapsed time per iteration (s): 0.43 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.555724E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.561 | TFLOPs: 31.41 | 7: iteration 10200/ 115203 | consumed samples: 2611200 | consumed tokens: 5347737600 | elapsed time per iteration (s): 0.43 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.541698E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.668 | TFLOPs: 31.31 | 7: iteration 10210/ 115203 | consumed samples: 2613760 | consumed tokens: 5352980480 | elapsed time per iteration (s): 0.43 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.544709E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.597 | TFLOPs: 31.41 | 7: iteration 10220/ 115203 | consumed samples: 2616320 | consumed tokens: 5358223360 | elapsed time per iteration (s): 0.43 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.524815E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.900 | TFLOPs: 31.48 | 7: iteration 10230/ 115203 | consumed samples: 2618880 | consumed tokens: 5363466240 | elapsed time per iteration (s): 0.44 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.542518E+00 | grad norm: 
0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.766 | TFLOPs: 30.21 | 7: iteration 10240/ 115203 | consumed samples: 2621440 | consumed tokens: 5368709120 | elapsed time per iteration (s): 0.43 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.559492E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.876 | TFLOPs: 31.16 | 7: iteration 10250/ 115203 | consumed samples: 2624000 | consumed tokens: 5373952000 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.547790E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.451 | TFLOPs: 32.08 | 7: iteration 10260/ 115203 | consumed samples: 2626560 | consumed tokens: 5379194880 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.519167E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.890 | TFLOPs: 31.79 | 7: iteration 10270/ 115203 | consumed samples: 2629120 | consumed tokens: 5384437760 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.546176E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.451 | TFLOPs: 32.08 | 7: iteration 10280/ 115203 | consumed samples: 2631680 | consumed tokens: 5389680640 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.553724E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.063 | TFLOPs: 32.01 | 7: iteration 10290/ 115203 | consumed samples: 2634240 | consumed tokens: 5394923520 | elapsed time per iteration (s): 0.42 
| learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.567303E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.451 | TFLOPs: 31.66 | 7: iteration 10300/ 115203 | consumed samples: 2636800 | consumed tokens: 5400166400 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.520327E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.694 | TFLOPs: 31.94 | 7: iteration 10310/ 115203 | consumed samples: 2639360 | consumed tokens: 5405409280 | elapsed time per iteration (s): 0.42 | learning rate: 1.972E-04 | global batch size: 256 | lm loss: 2.552629E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.535 | TFLOPs: 31.61 | 7: iteration 10320/ 115203 | consumed samples: 2641920 | consumed tokens: 5410652160 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.574174E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.382 | TFLOPs: 31.92 | 7: iteration 10330/ 115203 | consumed samples: 2644480 | consumed tokens: 5415895040 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.530440E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.876 | TFLOPs: 32.16 | 7: iteration 10340/ 115203 | consumed samples: 2647040 | consumed tokens: 5421137920 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.574258E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.021 | TFLOPs: 31.85 | 7: iteration 10350/ 115203 | 
consumed samples: 2649600 | consumed tokens: 5426380800 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.545794E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.586 | TFLOPs: 31.83 | 7: iteration 10360/ 115203 | consumed samples: 2652160 | consumed tokens: 5431623680 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.552789E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.335 | TFLOPs: 32.23 | 7: iteration 10370/ 115203 | consumed samples: 2654720 | consumed tokens: 5436866560 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.544365E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.660 | TFLOPs: 31.67 | 7: iteration 10380/ 115203 | consumed samples: 2657280 | consumed tokens: 5442109440 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.539680E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.560 | TFLOPs: 32.09 | 7: iteration 10390/ 115203 | consumed samples: 2659840 | consumed tokens: 5447352320 | elapsed time per iteration (s): 0.43 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.528512E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.152 | TFLOPs: 31.38 | 7: iteration 10400/ 115203 | consumed samples: 2662400 | consumed tokens: 5452595200 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.543068E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 611.189 | TFLOPs: 32.07 | 7: iteration 10410/ 115203 | consumed samples: 2664960 | consumed tokens: 5457838080 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.532051E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.660 | TFLOPs: 31.94 | 7: iteration 10420/ 115203 | consumed samples: 2667520 | consumed tokens: 5463080960 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.556231E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.590 | TFLOPs: 32.09 | 7: iteration 10430/ 115203 | consumed samples: 2670080 | consumed tokens: 5468323840 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.551320E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.825 | TFLOPs: 32.00 | 7: iteration 10440/ 115203 | consumed samples: 2672640 | consumed tokens: 5473566720 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.540727E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.207 | TFLOPs: 32.17 | 7: iteration 10450/ 115203 | consumed samples: 2675200 | consumed tokens: 5478809600 | elapsed time per iteration (s): 0.43 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.528665E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.100 | TFLOPs: 31.01 | 7: iteration 10460/ 115203 | consumed samples: 2677760 | consumed tokens: 5484052480 | elapsed time per iteration (s): 0.43 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 
2.537535E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.317 | TFLOPs: 31.50 | 7: iteration 10470/ 115203 | consumed samples: 2680320 | consumed tokens: 5489295360 | elapsed time per iteration (s): 0.42 | learning rate: 1.971E-04 | global batch size: 256 | lm loss: 2.541523E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.874 | TFLOPs: 31.74 | 7: iteration 10480/ 115203 | consumed samples: 2682880 | consumed tokens: 5494538240 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.536130E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.536 | TFLOPs: 32.03 | 7: iteration 10490/ 115203 | consumed samples: 2685440 | consumed tokens: 5499781120 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.530780E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.969 | TFLOPs: 32.00 | 7: iteration 10500/ 115203 | consumed samples: 2688000 | consumed tokens: 5505024000 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.534304E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.136 | TFLOPs: 32.01 | 7: iteration 10510/ 115203 | consumed samples: 2690560 | consumed tokens: 5510266880 | elapsed time per iteration (s): 0.43 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.552687E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.531 | TFLOPs: 31.04 | 7: iteration 10520/ 115203 | consumed samples: 2693120 | consumed tokens: 5515509760 | elapsed 
time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.584354E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.846 | TFLOPs: 31.95 | 7: iteration 10530/ 115203 | consumed samples: 2695680 | consumed tokens: 5520752640 | elapsed time per iteration (s): 0.43 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.537767E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.396 | TFLOPs: 31.55 | 7: iteration 10540/ 115203 | consumed samples: 2698240 | consumed tokens: 5525995520 | elapsed time per iteration (s): 0.43 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.531474E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.038 | TFLOPs: 31.33 | 7: iteration 10550/ 115203 | consumed samples: 2700800 | consumed tokens: 5531238400 | elapsed time per iteration (s): 0.43 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.520233E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.553 | TFLOPs: 30.99 | 7: iteration 10560/ 115203 | consumed samples: 2703360 | consumed tokens: 5536481280 | elapsed time per iteration (s): 0.43 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.528188E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.564 | TFLOPs: 31.56 | 7: iteration 10570/ 115203 | consumed samples: 2705920 | consumed tokens: 5541724160 | elapsed time per iteration (s): 0.43 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.526030E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.902 | TFLOPs: 31.16 | 7: 
iteration 10580/ 115203 | consumed samples: 2708480 | consumed tokens: 5546967040 | elapsed time per iteration (s): 0.43 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.531853E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.488 | TFLOPs: 31.14 | 7: iteration 10590/ 115203 | consumed samples: 2711040 | consumed tokens: 5552209920 | elapsed time per iteration (s): 0.43 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.566370E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.108 | TFLOPs: 30.91 | 7: iteration 10600/ 115203 | consumed samples: 2713600 | consumed tokens: 5557452800 | elapsed time per iteration (s): 0.45 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.527602E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.483 | TFLOPs: 30.09 | 7: iteration 10610/ 115203 | consumed samples: 2716160 | consumed tokens: 5562695680 | elapsed time per iteration (s): 0.42 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.507667E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.527 | TFLOPs: 31.82 | 7: iteration 10620/ 115203 | consumed samples: 2718720 | consumed tokens: 5567938560 | elapsed time per iteration (s): 0.43 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.525406E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.848 | TFLOPs: 31.11 | 7: iteration 10630/ 115203 | consumed samples: 2721280 | consumed tokens: 5573181440 | elapsed time per iteration (s): 0.43 | learning rate: 1.970E-04 | global batch size: 256 | lm loss: 2.507767E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 590.774 | TFLOPs: 31.00 | 7: iteration 10640/ 115203 | consumed samples: 2723840 | consumed tokens: 5578424320 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.538770E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.112 | TFLOPs: 31.96 | 7: iteration 10650/ 115203 | consumed samples: 2726400 | consumed tokens: 5583667200 | elapsed time per iteration (s): 0.44 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.544890E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.823 | TFLOPs: 30.26 | 7: iteration 10660/ 115203 | consumed samples: 2728960 | consumed tokens: 5588910080 | elapsed time per iteration (s): 0.44 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.539450E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.945 | TFLOPs: 30.43 | 7: iteration 10670/ 115203 | consumed samples: 2731520 | consumed tokens: 5594152960 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.557058E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.805 | TFLOPs: 31.63 | 7: iteration 10680/ 115203 | consumed samples: 2734080 | consumed tokens: 5599395840 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.531973E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.280 | TFLOPs: 31.23 | 7: iteration 10690/ 115203 | consumed samples: 2736640 | consumed tokens: 5604638720 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch 
size: 256 | lm loss: 2.547198E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.836 | TFLOPs: 31.37 | 7: iteration 10700/ 115203 | consumed samples: 2739200 | consumed tokens: 5609881600 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.530284E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.675 | TFLOPs: 31.67 | 7: iteration 10710/ 115203 | consumed samples: 2741760 | consumed tokens: 5615124480 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.552819E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.090 | TFLOPs: 31.33 | 7: iteration 10720/ 115203 | consumed samples: 2744320 | consumed tokens: 5620367360 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.563158E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.749 | TFLOPs: 31.42 | 7: iteration 10730/ 115203 | consumed samples: 2746880 | consumed tokens: 5625610240 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.529893E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.894 | TFLOPs: 31.21 | 7: iteration 10740/ 115203 | consumed samples: 2749440 | consumed tokens: 5630853120 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.508963E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.424 | TFLOPs: 32.03 | 7: iteration 10750/ 115203 | consumed samples: 2752000 | consumed tokens: 
5636096000 | elapsed time per iteration (s): 0.42 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.535914E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.149 | TFLOPs: 31.65 | 7: iteration 10760/ 115203 | consumed samples: 2754560 | consumed tokens: 5641338880 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.539906E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.031 | TFLOPs: 31.54 | 7: iteration 10770/ 115203 | consumed samples: 2757120 | consumed tokens: 5646581760 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.503194E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.497 | TFLOPs: 31.19 | 7: iteration 10780/ 115203 | consumed samples: 2759680 | consumed tokens: 5651824640 | elapsed time per iteration (s): 0.43 | learning rate: 1.969E-04 | global batch size: 256 | lm loss: 2.496816E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.671 | TFLOPs: 31.15 | 7: iteration 10790/ 115203 | consumed samples: 2762240 | consumed tokens: 5657067520 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.544755E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.585 | TFLOPs: 31.98 | 7: iteration 10800/ 115203 | consumed samples: 2764800 | consumed tokens: 5662310400 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.507197E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.808 | 
TFLOPs: 31.79 | 7: iteration 10810/ 115203 | consumed samples: 2767360 | consumed tokens: 5667553280 | elapsed time per iteration (s): 0.46 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.517053E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.499 | TFLOPs: 29.46 | 7: iteration 10820/ 115203 | consumed samples: 2769920 | consumed tokens: 5672796160 | elapsed time per iteration (s): 0.44 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.524501E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.297 | TFLOPs: 30.71 | 7: iteration 10830/ 115203 | consumed samples: 2772480 | consumed tokens: 5678039040 | elapsed time per iteration (s): 0.43 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.532605E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.812 | TFLOPs: 31.26 | 7: iteration 10840/ 115203 | consumed samples: 2775040 | consumed tokens: 5683281920 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.525090E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.694 | TFLOPs: 31.62 | 7: iteration 10850/ 115203 | consumed samples: 2777600 | consumed tokens: 5688524800 | elapsed time per iteration (s): 0.43 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.491174E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.448 | TFLOPs: 31.56 | 7: iteration 10860/ 115203 | consumed samples: 2780160 | consumed tokens: 5693767680 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.556945E+00 | grad norm: 0.233 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.233 | TFLOPs: 31.91 | 7: iteration 10870/ 115203 | consumed samples: 2782720 | consumed tokens: 5699010560 | elapsed time per iteration (s): 0.43 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.535082E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.055 | TFLOPs: 31.43 | 7: iteration 10880/ 115203 | consumed samples: 2785280 | consumed tokens: 5704253440 | elapsed time per iteration (s): 0.43 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.547875E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.849 | TFLOPs: 31.21 | 7: iteration 10890/ 115203 | consumed samples: 2787840 | consumed tokens: 5709496320 | elapsed time per iteration (s): 0.43 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.528886E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.132 | TFLOPs: 30.96 | 7: iteration 10900/ 115203 | consumed samples: 2790400 | consumed tokens: 5714739200 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.521158E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.440 | TFLOPs: 31.77 | 7: iteration 10910/ 115203 | consumed samples: 2792960 | consumed tokens: 5719982080 | elapsed time per iteration (s): 0.43 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.556086E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.802 | TFLOPs: 31.21 | 7: iteration 10920/ 115203 | consumed samples: 2795520 | consumed tokens: 5725224960 | elapsed time per iteration (s): 0.42 | learning rate: 
1.968E-04 | global batch size: 256 | lm loss: 2.504467E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.151 | TFLOPs: 31.75 | 7: iteration 10930/ 115203 | consumed samples: 2798080 | consumed tokens: 5730467840 | elapsed time per iteration (s): 0.42 | learning rate: 1.968E-04 | global batch size: 256 | lm loss: 2.513899E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.798 | TFLOPs: 31.84 | 7: iteration 10940/ 115203 | consumed samples: 2800640 | consumed tokens: 5735710720 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.485833E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.154 | TFLOPs: 31.80 | 7: iteration 10950/ 115203 | consumed samples: 2803200 | consumed tokens: 5740953600 | elapsed time per iteration (s): 0.44 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.548318E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.787 | TFLOPs: 30.42 | 7: iteration 10960/ 115203 | consumed samples: 2805760 | consumed tokens: 5746196480 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.519528E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.367 | TFLOPs: 31.97 | 7: iteration 10970/ 115203 | consumed samples: 2808320 | consumed tokens: 5751439360 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.525169E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.639 | TFLOPs: 31.93 | 7: iteration 10980/ 115203 | consumed samples: 
2810880 | consumed tokens: 5756682240 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.522319E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.875 | TFLOPs: 31.63 |
7: iteration 10990/ 115203 | consumed samples: 2813440 | consumed tokens: 5761925120 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.532754E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.904 | TFLOPs: 31.74 |
7: iteration 11000/ 115203 | consumed samples: 2816000 | consumed tokens: 5767168000 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.531461E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.638 | TFLOPs: 31.99 |
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 11000 | lm loss value: 2.442046E+00 | lm loss PPL: 1.149654E+01 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 11000 to checkpoints_221m
0: [2022-11-28 14:14:22,069] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step11000 is begin to save!
0: [2022-11-28 14:14:22,099] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_01-model_00-model_states.pt...
0: [2022-11-28 14:14:22,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_01-model_00-model_states.pt.
0: [2022-11-28 14:14:22,212] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_03-model_00-model_states.pt...
0: [2022-11-28 14:14:22,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_03-model_00-model_states.pt.
0: [2022-11-28 14:14:22,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_04-model_00-model_states.pt...
0: [2022-11-28 14:14:22,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_04-model_00-model_states.pt.
0: [2022-11-28 14:14:22,258] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_05-model_00-model_states.pt...
0: [2022-11-28 14:14:22,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_05-model_00-model_states.pt.
0: [2022-11-28 14:14:22,281] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_06-model_00-model_states.pt...
0: [2022-11-28 14:14:22,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_06-model_00-model_states.pt.
0: [2022-11-28 14:14:22,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_07-model_00-model_states.pt...
0: [2022-11-28 14:14:22,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_07-model_00-model_states.pt.
0: [2022-11-28 14:14:22,328] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_08-model_00-model_states.pt...
0: [2022-11-28 14:14:22,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_08-model_00-model_states.pt.
0: [2022-11-28 14:14:22,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_09-model_00-model_states.pt...
0: [2022-11-28 14:14:22,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_09-model_00-model_states.pt.
0: [2022-11-28 14:14:22,376] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_10-model_00-model_states.pt...
0: [2022-11-28 14:14:22,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_10-model_00-model_states.pt.
0: [2022-11-28 14:14:22,400] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_11-model_00-model_states.pt...
0: [2022-11-28 14:14:22,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_11-model_00-model_states.pt.
0: [2022-11-28 14:14:22,424] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_12-model_00-model_states.pt...
0: [2022-11-28 14:14:22,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_12-model_00-model_states.pt.
0: [2022-11-28 14:14:22,447] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_13-model_00-model_states.pt...
0: [2022-11-28 14:14:22,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_13-model_00-model_states.pt.
0: [2022-11-28 14:14:22,473] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_14-model_00-model_states.pt...
0: [2022-11-28 14:14:22,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_14-model_00-model_states.pt.
0: [2022-11-28 14:14:22,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_15-model_00-model_states.pt...
0: [2022-11-28 14:14:22,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_15-model_00-model_states.pt.
0: [2022-11-28 14:14:22,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_16-model_00-model_states.pt...
0: [2022-11-28 14:14:22,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_16-model_00-model_states.pt.
0: [2022-11-28 14:14:22,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_17-model_00-model_states.pt...
0: [2022-11-28 14:14:22,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_17-model_00-model_states.pt.
0: [2022-11-28 14:14:22,569] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_18-model_00-model_states.pt...
0: [2022-11-28 14:14:22,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_18-model_00-model_states.pt.
0: [2022-11-28 14:14:22,592] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_19-model_00-model_states.pt...
0: [2022-11-28 14:14:22,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_19-model_00-model_states.pt.
0: [2022-11-28 14:14:22,615] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_20-model_00-model_states.pt...
0: [2022-11-28 14:14:22,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_20-model_00-model_states.pt.
0: [2022-11-28 14:14:22,640] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/layer_22-model_00-model_states.pt...
0: [2022-11-28 14:14:22,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/layer_22-model_00-model_states.pt.
0: [2022-11-28 14:14:22,644] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step11000/mp_rank_00_model_states.pt
0: [2022-11-28 14:14:22,644] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/mp_rank_00_model_states.pt...
0: [2022-11-28 14:14:22,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/mp_rank_00_model_states.pt.
0: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
0: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
7: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt...
7: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt...
7: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt...
7: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt...
7: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
7: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt...
0: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
7: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
7: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt...
1: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt...
1: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt...
1: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt...
1: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt...
1: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt...
5: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt...
5: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt...
5: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt...
5: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt...
5: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt...
5: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt...
4: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt...
4: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt...
4: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt...
4: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt...
4: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt...
4: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt...
2: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
0: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2022-11-28 14:14:22,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step11000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:14:22,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:14:22,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 14:14:22,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2022-11-28 14:14:22,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:14:22,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 14:14:22,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2022-11-28 14:14:22,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:14:22,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:14:22,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:14:22,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:14:22,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 14:14:22,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 14:14:22,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 14:14:22,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 14:14:22,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2022-11-28 14:14:22,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2022-11-28 14:14:22,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2022-11-28 14:14:22,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2022-11-28 14:14:22,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:14:22,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 14:14:22,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2022-11-28 14:14:22,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:14:22,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 14:14:22,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2022-11-28 14:14:22,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:14:22,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 14:14:22,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 7: [2022-11-28 14:14:22,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:14:22,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 14:14:22,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2022-11-28 14:14:22,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:14:22,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:14:22,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:14:22,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 14:14:22,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 14:14:22,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2022-11-28 14:14:22,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2022-11-28 14:14:22,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:14:22,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 14:14:22,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2022-11-28 14:14:22,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:14:22,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 14:14:22,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2022-11-28 14:14:22,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:14:22,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 14:14:22,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 0: [2022-11-28 14:14:22,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 14:14:22,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2022-11-28 14:14:22,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:14:22,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:14:22,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 2: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 14:14:22,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 5: [2022-11-28 14:14:22,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:14:22,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:14:22,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2022-11-28 14:14:22,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2022-11-28 14:14:22,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 14:14:22,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 14:14:22,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 14:14:22,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2022-11-28 14:14:22,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2022-11-28 14:14:22,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 14:14:22,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 14:14:22,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 14:14:22,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 14:14:22,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 14:14:22,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 14:14:22,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 14:14:22,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 3: [2022-11-28 14:14:22,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2022-11-28 14:14:22,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:14:22,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:14:22,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:14:22,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:14:22,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:14:22,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:14:22,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:14:22,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 14:14:22,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 14:14:22,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 14:14:22,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 14:14:22,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 14:14:22,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 14:14:22,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 14:14:22,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 14:14:22,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2022-11-28 14:14:22,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 14:14:22,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2022-11-28 14:14:22,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2022-11-28 14:14:22,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 
6: [2022-11-28 14:14:22,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2022-11-28 14:14:22,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2022-11-28 14:14:22,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 6: [2022-11-28 14:14:22,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 4: [2022-11-28 14:14:22,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:14:22,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 14:14:22,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now! 1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:14:22,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt
1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt.
1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt.
1: [2022-11-28 14:14:22,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt
1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now!
1: [2022-11-28 14:14:22,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt
1: [2022-11-28 14:14:22,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt
1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt.
1: [2022-11-28 14:14:22,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt
1: [2022-11-28 14:14:22,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt
1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now!
1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now!
1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now!
1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt.
1: [2022-11-28 14:14:22,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt
1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now!
1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now!
1: [2022-11-28 14:14:22,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now!
1: [2022-11-28 14:14:22,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step11000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt
1: [2022-11-28 14:14:22,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step11000 is ready now!
0: successfully saved checkpoint at iteration 11000 to checkpoints_221m
7: time (ms) | save-checkpoint: 826.77
7: iteration 11010/ 115203 | consumed samples: 2818560 | consumed tokens: 5772410880 | elapsed time per iteration (s): 0.53 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.509135E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 487.471 | TFLOPs: 25.58 |
7: iteration 11020/ 115203 | consumed samples: 2821120 | consumed tokens: 5777653760 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.561645E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.245 | TFLOPs: 31.81 |
7: iteration 11030/ 115203 | consumed samples: 2823680 | consumed tokens: 5782896640 | elapsed time per iteration (s): 0.44 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.496984E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.599 | TFLOPs: 30.52 |
7: iteration 11040/ 115203 | consumed samples: 2826240 | consumed tokens: 5788139520 | elapsed time per iteration (s): 0.43 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.526261E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.079 | TFLOPs: 31.17 |
7: iteration 11050/ 115203 | consumed samples: 2828800 | consumed tokens: 5793382400 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.542800E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.270 | TFLOPs: 31.91 |
7: iteration 11060/ 115203 | consumed samples: 2831360 | consumed tokens: 5798625280 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.503328E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.153 | TFLOPs: 31.96 |
7: iteration 11070/ 115203 | consumed samples: 2833920 | consumed tokens: 5803868160 | elapsed time per iteration (s): 0.42 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.514951E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.951 | TFLOPs: 31.90 |
7: iteration 11080/ 115203 | consumed samples: 2836480 | consumed tokens: 5809111040 | elapsed time per iteration (s): 0.44 | learning rate: 1.967E-04 | global batch size: 256 | lm loss: 2.534932E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.431 | TFLOPs: 30.72 |
7: iteration 11090/ 115203 | consumed samples: 2839040 | consumed tokens: 5814353920 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.516644E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.006 | TFLOPs: 31.85 |
7: iteration 11100/ 115203 | consumed samples: 2841600 | consumed tokens: 5819596800 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.542251E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.707 | TFLOPs: 31.73 |
7: iteration 11110/ 115203 | consumed samples: 2844160 | consumed tokens: 5824839680 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.551008E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.391 | TFLOPs: 32.03 |
7: iteration 11120/ 115203 | consumed samples: 2846720 | consumed tokens: 5830082560 | elapsed time per iteration (s): 0.44 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.535774E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.256 | TFLOPs: 30.50 |
7: iteration 11130/ 115203 | consumed samples: 2849280 | consumed tokens: 5835325440 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.499303E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.754 | TFLOPs: 32.10 |
7: iteration 11140/ 115203 | consumed samples: 2851840 | consumed tokens: 5840568320 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.526890E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.035 | TFLOPs: 32.11 |
7: iteration 11150/ 115203 | consumed samples: 2854400 | consumed tokens: 5845811200 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.513616E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.259 | TFLOPs: 31.65 |
7: iteration 11160/ 115203 | consumed samples: 2856960 | consumed tokens: 5851054080 | elapsed time per iteration (s): 0.43 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.532945E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.516 | TFLOPs: 31.56 |
7: iteration 11170/ 115203 | consumed samples: 2859520 | consumed tokens: 5856296960 | elapsed time per iteration (s): 0.44 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.522021E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.425 | TFLOPs: 30.66 |
7: iteration 11180/ 115203 | consumed samples: 2862080 | consumed tokens: 5861539840 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.527548E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.558 | TFLOPs: 31.77 |
7: iteration 11190/ 115203 | consumed samples: 2864640 | consumed tokens: 5866782720 | elapsed time per iteration (s): 0.43 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.474181E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.155 | TFLOPs: 31.02 |
7: iteration 11200/ 115203 | consumed samples: 2867200 | consumed tokens: 5872025600 | elapsed time per iteration (s): 0.43 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.567578E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.260 | TFLOPs: 31.55 |
7: iteration 11210/ 115203 | consumed samples: 2869760 | consumed tokens: 5877268480 | elapsed time per iteration (s): 0.43 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.528416E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.219 | TFLOPs: 31.28 |
7: iteration 11220/ 115203 | consumed samples: 2872320 | consumed tokens: 5882511360 | elapsed time per iteration (s): 0.42 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.552342E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.653 | TFLOPs: 31.94 |
7: iteration 11230/ 115203 | consumed samples: 2874880 | consumed tokens: 5887754240 | elapsed time per iteration (s): 0.43 | learning rate: 1.966E-04 | global batch size: 256 | lm loss: 2.535854E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.177 | TFLOPs: 31.39 |
7: iteration 11240/ 115203 | consumed samples: 2877440 | consumed tokens: 5892997120 | elapsed time per iteration (s): 0.43 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.544519E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.344 | TFLOPs: 31.13 |
7: iteration 11250/ 115203 | consumed samples: 2880000 | consumed tokens: 5898240000 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.535256E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.084 | TFLOPs: 31.91 |
7: iteration 11260/ 115203 | consumed samples: 2882560 | consumed tokens: 5903482880 | elapsed time per iteration (s): 0.43 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.520229E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.008 | TFLOPs: 31.53 |
7: iteration 11270/ 115203 | consumed samples: 2885120 | consumed tokens: 5908725760 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.524482E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.248 | TFLOPs: 31.91 |
7: iteration 11280/ 115203 | consumed samples: 2887680 | consumed tokens: 5913968640 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.557061E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.733 | TFLOPs: 31.99 |
7: iteration 11290/ 115203 | consumed samples: 2890240 | consumed tokens: 5919211520 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.514130E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.621 | TFLOPs: 32.25 |
7: iteration 11300/ 115203 | consumed samples: 2892800 | consumed tokens: 5924454400 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.501381E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.432 | TFLOPs: 31.92 |
7: iteration 11310/ 115203 | consumed samples: 2895360 | consumed tokens: 5929697280 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.512905E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.609 | TFLOPs: 31.78 |
7: iteration 11320/ 115203 | consumed samples: 2897920 | consumed tokens: 5934940160 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.507721E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.002 | TFLOPs: 31.95 |
7: iteration 11330/ 115203 | consumed samples: 2900480 | consumed tokens: 5940183040 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.505285E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.630 | TFLOPs: 31.93 |
7: iteration 11340/ 115203 | consumed samples: 2903040 | consumed tokens: 5945425920 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.526884E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.956 | TFLOPs: 31.74 |
7: iteration 11350/ 115203 | consumed samples: 2905600 | consumed tokens: 5950668800 | elapsed time per iteration (s): 0.43 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.531692E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.992 | TFLOPs: 31.48 |
7: iteration 11360/ 115203 | consumed samples: 2908160 | consumed tokens: 5955911680 | elapsed time per iteration (s): 0.43 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.544757E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.564 | TFLOPs: 31.51 |
7: iteration 11370/ 115203 | consumed samples: 2910720 | consumed tokens: 5961154560 | elapsed time per iteration (s): 0.44 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.523201E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.128 | TFLOPs: 30.60 |
7: iteration 11380/ 115203 | consumed samples: 2913280 | consumed tokens: 5966397440 | elapsed time per iteration (s): 0.42 | learning rate: 1.965E-04 | global batch size: 256 | lm loss: 2.518294E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.491 | TFLOPs: 31.72 |
7: iteration 11390/ 115203 | consumed samples: 2915840 | consumed tokens: 5971640320 | elapsed time per iteration (s): 0.44 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.511003E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.212 | TFLOPs: 30.60 |
7: iteration 11400/ 115203 | consumed samples: 2918400 | consumed tokens: 5976883200 | elapsed time per iteration (s): 0.43 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.551123E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.893 | TFLOPs: 31.58 |
7: iteration 11410/ 115203 | consumed samples: 2920960 | consumed tokens: 5982126080 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.490532E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.517 | TFLOPs: 32.19 |
7: iteration 11420/ 115203 | consumed samples: 2923520 | consumed tokens: 5987368960 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.489320E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.136 | TFLOPs: 32.01 |
7: iteration 11430/ 115203 | consumed samples: 2926080 | consumed tokens: 5992611840 | elapsed time per iteration (s): 0.43 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.509563E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.026 | TFLOPs: 31.33 |
7: iteration 11440/ 115203 | consumed samples: 2928640 | consumed tokens: 5997854720 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.539008E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.478 | TFLOPs: 31.98 |
7: iteration 11450/ 115203 | consumed samples: 2931200 | consumed tokens: 6003097600 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.498342E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.549 | TFLOPs: 31.67 |
7: iteration 11460/ 115203 | consumed samples: 2933760 | consumed tokens: 6008340480 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.516168E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.412 | TFLOPs: 32.03 |
7: iteration 11470/ 115203 | consumed samples: 2936320 | consumed tokens: 6013583360 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.536506E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.731 | TFLOPs: 32.10 |
7: iteration 11480/ 115203 | consumed samples: 2938880 | consumed tokens: 6018826240 | elapsed time per iteration (s): 0.43 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.544921E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.292 | TFLOPs: 31.50 |
7: iteration 11490/ 115203 | consumed samples: 2941440 | consumed tokens: 6024069120 | elapsed time per iteration (s): 0.45 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.529488E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.857 | TFLOPs: 30.00 |
7: iteration 11500/ 115203 | consumed samples: 2944000 | consumed tokens: 6029312000 | elapsed time per iteration (s): 0.43 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.544060E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.354 | TFLOPs: 31.50 |
7: iteration 11510/ 115203 | consumed samples: 2946560 | consumed tokens: 6034554880 | elapsed time per iteration (s): 0.42 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.541323E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.435 | TFLOPs: 31.71 |
7: iteration 11520/ 115203 | consumed samples: 2949120 | consumed tokens: 6039797760 | elapsed time per iteration (s): 0.45 | learning rate: 1.964E-04 | global batch size: 256 | lm loss: 2.518289E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.395 | TFLOPs: 29.82 |
7: iteration 11530/ 115203 | consumed samples: 2951680 | consumed tokens: 6045040640 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.488433E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.444 | TFLOPs: 32.03 |
7: iteration 11540/ 115203 | consumed samples: 2954240 | consumed tokens: 6050283520 | elapsed time per iteration (s): 0.43 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.536554E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.387 | TFLOPs: 31.55 |
7: iteration 11550/ 115203 | consumed samples: 2956800 | consumed tokens: 6055526400 | elapsed time per iteration (s): 0.43 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.532797E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.609 | TFLOPs: 31.46 |
7: iteration 11560/ 115203 | consumed samples: 2959360 | consumed tokens: 6060769280 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.531793E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.496 | TFLOPs: 31.66 |
7: iteration 11570/ 115203 | consumed samples: 2961920 | consumed tokens: 6066012160 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.510835E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.735 | TFLOPs: 31.99 |
7: iteration 11580/ 115203 | consumed samples: 2964480 | consumed tokens: 6071255040 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.491717E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.964 | TFLOPs: 31.69 |
7: iteration 11590/ 115203 | consumed samples: 2967040 | consumed tokens: 6076497920 | elapsed time per iteration (s): 0.43 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.525495E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.182 | TFLOPs: 31.44 |
7: iteration 11600/ 115203 | consumed samples: 2969600 | consumed tokens: 6081740800 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.527308E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.463 | TFLOPs: 31.93 |
7: iteration 11610/ 115203 | consumed samples: 2972160 | consumed tokens: 6086983680 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.527412E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.362 | TFLOPs: 31.97 |
7: iteration 11620/ 115203 | consumed samples: 2974720 | consumed tokens: 6092226560 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.556321E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.926 | TFLOPs: 31.63 |
7: iteration 11630/ 115203 | consumed samples: 2977280 | consumed tokens: 6097469440 | elapsed time per iteration (s): 0.43 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.506977E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.504 | TFLOPs: 31.56 |
7: iteration 11640/ 115203 | consumed samples: 2979840 | consumed tokens: 6102712320 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.539710E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.185 | TFLOPs: 31.75 |
7: iteration 11650/ 115203 | consumed samples: 2982400 | consumed tokens: 6107955200 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.490635E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.899 | TFLOPs: 31.69 |
7: iteration 11660/ 115203 | consumed samples: 2984960 | consumed tokens: 6113198080 | elapsed time per iteration (s): 0.42 | learning rate: 1.963E-04 | global batch size: 256 | lm loss: 2.523567E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.490 | TFLOPs: 31.72 |
7: iteration 11670/ 115203 | consumed samples: 2987520 | consumed tokens: 6118440960 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.498564E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.681 | TFLOPs: 31.73 |
7: iteration 11680/ 115203 | consumed samples: 2990080 | consumed tokens: 6123683840 | elapsed time per iteration (s): 0.43 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.541699E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.344 | TFLOPs: 31.39 |
7: iteration 11690/ 115203 | consumed samples: 2992640 | consumed tokens: 6128926720 | elapsed time per iteration (s): 0.43 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.535341E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.917 | TFLOPs: 31.53 |
7: iteration 11700/ 115203 | consumed samples: 2995200 | consumed tokens: 6134169600 | elapsed time per iteration (s): 0.44 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.497687E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.993 | TFLOPs: 30.43 |
7: iteration 11710/ 115203 | consumed samples: 2997760 | consumed tokens: 6139412480 | elapsed time per iteration (s): 0.43 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.521322E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.673 | TFLOPs: 31.46 |
7: iteration 11720/ 115203 | consumed samples: 3000320 | consumed tokens: 6144655360 | elapsed time per iteration (s): 0.43 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.508525E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.120 | TFLOPs: 31.28 |
7: iteration 11730/ 115203 | consumed samples: 3002880 | consumed tokens: 6149898240 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.509444E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.169 | TFLOPs: 31.80 |
7: iteration 11740/ 115203 | consumed samples: 3005440 | consumed tokens: 6155141120 | elapsed time per iteration (s): 0.44 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.517382E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.104 | TFLOPs: 30.70 |
7: iteration 11750/ 115203 | consumed samples: 3008000 | consumed tokens: 6160384000 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.488455E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.172 | TFLOPs: 31.96 |
7: iteration 11760/ 115203 | consumed samples: 3010560 | consumed tokens: 6165626880 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.523913E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.127 | TFLOPs: 31.85 |
7: iteration 11770/ 115203 | consumed samples: 3013120 | consumed tokens: 6170869760 | elapsed time per iteration (s): 0.43 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.533562E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.630 | TFLOPs: 31.51 |
7: iteration 11780/ 115203 | consumed samples: 3015680 | consumed tokens: 6176112640 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.497777E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.260 | TFLOPs: 31.86 |
7: iteration 11790/ 115203 | consumed samples: 3018240 | consumed tokens: 6181355520 | elapsed time per iteration (s): 0.43 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.510469E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.142 | TFLOPs: 31.33 |
7: iteration 11800/ 115203 | consumed samples: 3020800 | consumed tokens: 6186598400 | elapsed time per iteration (s): 0.42 | learning rate: 1.962E-04 | global batch size: 256 | lm loss: 2.512285E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.957 | TFLOPs: 32.21 |
7: iteration 11810/ 115203 | consumed samples: 3023360 | consumed tokens: 6191841280 | elapsed time per iteration (s): 0.43 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.499212E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.185 | TFLOPs: 31.54 |
7: iteration 11820/ 115203 | consumed samples: 3025920 | consumed tokens: 6197084160 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.514984E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.739 | TFLOPs: 32.04 |
7: iteration 11830/ 115203 | consumed samples: 3028480 | consumed tokens: 6202327040 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.455089E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.922 | TFLOPs: 32.21 |
7: iteration 11840/ 115203 | consumed samples: 3031040 | consumed tokens: 6207569920 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.498891E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.996 | TFLOPs: 31.69 |
7: iteration 11850/ 115203 | consumed samples: 3033600 | consumed tokens: 6212812800 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.531839E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.365 | TFLOPs: 31.92 |
7: iteration 11860/ 115203 | consumed samples: 3036160 | consumed tokens: 6218055680 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.462650E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.472 | TFLOPs: 32.03 |
7: iteration 11870/ 115203 | consumed samples: 3038720 | consumed tokens: 6223298560 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.513653E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.129 | TFLOPs: 31.65 |
7: iteration 11880/ 115203 | consumed samples: 3041280 | consumed tokens: 6228541440 | elapsed time per iteration (s): 0.43 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.509690E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.659 | TFLOPs: 31.15 |
7: iteration 11890/ 115203 | consumed samples: 3043840 | consumed tokens: 6233784320 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.525327E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.272 | TFLOPs: 31.81 |
7: iteration 11900/ 115203 | consumed samples: 3046400 | consumed tokens: 6239027200 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.484495E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.771 | TFLOPs: 31.99 |
7: iteration 11910/ 115203 | consumed samples: 3048960 | consumed tokens: 6244270080 | elapsed time per iteration (s): 0.42 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.540677E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.412 | TFLOPs: 31.82 |
7: iteration 11920/ 115203 | consumed samples: 3051520 | consumed tokens: 6249512960 | elapsed time per iteration (s): 0.43 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.537609E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.124 | TFLOPs: 31.38 |
7: iteration 11930/ 115203 | consumed samples: 3054080 | consumed tokens: 6254755840 | elapsed time per iteration (s): 0.44 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.533662E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.690 | TFLOPs: 30.78 |
7: iteration 11940/ 115203 | consumed samples: 3056640 | consumed tokens: 6259998720 | elapsed time per iteration (s): 0.44 | learning rate: 1.961E-04 | global batch size: 256 | lm loss: 2.531905E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.600 | TFLOPs: 30.36 |
7: iteration 11950/ 115203 | consumed samples: 3059200 | consumed tokens: 6265241600 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.470502E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.716 | TFLOPs: 31.78 |
7: iteration 11960/ 115203 | consumed samples: 3061760 | consumed tokens: 6270484480 | elapsed time per iteration (s): 0.43 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.511590E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.863 | TFLOPs: 31.47 |
7: iteration 11970/ 115203 | consumed samples: 3064320 | consumed tokens: 6275727360 | elapsed time per iteration (s): 0.44 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.508453E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.833 | TFLOPs: 30.79 |
7: iteration 11980/ 115203 | consumed samples: 
3066880 | consumed tokens: 6280970240 | elapsed time per iteration (s): 0.44 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.539863E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.453 | TFLOPs: 30.30 | 7: iteration 11990/ 115203 | consumed samples: 3069440 | consumed tokens: 6286213120 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.510652E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.670 | TFLOPs: 31.73 | 0: [2022-11-28 14:21:28,620] [INFO] [logging.py:68:log_dist] [Rank 0] step=12000, skipped=0, lr=[0.0001960118617437879, 0.0001960118617437879, 0.0001960118617437879], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 12000/ 115203 | consumed samples: 3072000 | consumed tokens: 6291456000 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.537352E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.657 | TFLOPs: 32.15 | 0: steps: 12000 loss: 2.5934 iter time (s): 0.426 samples/sec: 601.116 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 12000 | lm loss value: 2.476266E+00 | lm loss PPL: 1.189676E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 12000 to checkpoints_221m 0: [2022-11-28 14:21:28,779] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step12000 is begin to save! 0: [2022-11-28 14:21:28,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_01-model_00-model_states.pt... 
0: [2022-11-28 14:21:28,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_01-model_00-model_states.pt. 0: [2022-11-28 14:21:28,885] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_03-model_00-model_states.pt... 0: [2022-11-28 14:21:28,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_03-model_00-model_states.pt. 0: [2022-11-28 14:21:28,906] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_04-model_00-model_states.pt... 0: [2022-11-28 14:21:28,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_04-model_00-model_states.pt. 0: [2022-11-28 14:21:28,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_05-model_00-model_states.pt... 0: [2022-11-28 14:21:28,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_05-model_00-model_states.pt. 0: [2022-11-28 14:21:28,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_06-model_00-model_states.pt... 0: [2022-11-28 14:21:28,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_06-model_00-model_states.pt. 0: [2022-11-28 14:21:28,975] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_07-model_00-model_states.pt... 0: [2022-11-28 14:21:28,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_07-model_00-model_states.pt. 0: [2022-11-28 14:21:28,999] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_08-model_00-model_states.pt... 
0: [2022-11-28 14:21:29,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_08-model_00-model_states.pt. 0: [2022-11-28 14:21:29,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_09-model_00-model_states.pt... 0: [2022-11-28 14:21:29,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_09-model_00-model_states.pt. 0: [2022-11-28 14:21:29,045] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_10-model_00-model_states.pt... 0: [2022-11-28 14:21:29,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_10-model_00-model_states.pt. 0: [2022-11-28 14:21:29,068] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_11-model_00-model_states.pt... 0: [2022-11-28 14:21:29,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_11-model_00-model_states.pt. 0: [2022-11-28 14:21:29,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_12-model_00-model_states.pt... 0: [2022-11-28 14:21:29,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_12-model_00-model_states.pt. 0: [2022-11-28 14:21:29,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_13-model_00-model_states.pt... 0: [2022-11-28 14:21:29,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_13-model_00-model_states.pt. 0: [2022-11-28 14:21:29,139] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_14-model_00-model_states.pt... 
0: [2022-11-28 14:21:29,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_14-model_00-model_states.pt. 0: [2022-11-28 14:21:29,164] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_15-model_00-model_states.pt... 0: [2022-11-28 14:21:29,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_15-model_00-model_states.pt. 0: [2022-11-28 14:21:29,187] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_16-model_00-model_states.pt... 0: [2022-11-28 14:21:29,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_16-model_00-model_states.pt. 0: [2022-11-28 14:21:29,213] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_17-model_00-model_states.pt... 0: [2022-11-28 14:21:29,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_17-model_00-model_states.pt. 0: [2022-11-28 14:21:29,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_18-model_00-model_states.pt... 0: [2022-11-28 14:21:29,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_18-model_00-model_states.pt. 0: [2022-11-28 14:21:29,259] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_19-model_00-model_states.pt... 0: [2022-11-28 14:21:29,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_19-model_00-model_states.pt. 0: [2022-11-28 14:21:29,283] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_20-model_00-model_states.pt... 
0: [2022-11-28 14:21:29,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_20-model_00-model_states.pt. 0: [2022-11-28 14:21:29,305] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/layer_22-model_00-model_states.pt... 0: [2022-11-28 14:21:29,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/layer_22-model_00-model_states.pt. 0: [2022-11-28 14:21:29,311] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step12000/mp_rank_00_model_states.pt 0: [2022-11-28 14:21:29,311] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/mp_rank_00_model_states.pt... 0: [2022-11-28 14:21:29,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/mp_rank_00_model_states.pt. 0: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
0: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
2: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:21:29,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step12000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:21:29,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:21:29,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 14:21:29,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2022-11-28 14:21:29,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:21:29,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 14:21:29,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2022-11-28 14:21:29,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:21:29,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 14:21:29,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
6: [2022-11-28 14:21:29,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:21:29,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 5: [2022-11-28 14:21:29,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:21:29,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2022-11-28 14:21:29,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 14:21:29,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 2: [2022-11-28 14:21:29,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:21:29,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 14:21:29,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2022-11-28 14:21:29,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:21:29,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 14:21:29,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
2: [2022-11-28 14:21:29,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:21:29,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 14:21:29,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2022-11-28 14:21:29,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:21:29,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 14:21:29,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2022-11-28 14:21:29,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:21:29,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 14:21:29,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 7: [2022-11-28 14:21:29,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:21:29,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 14:21:29,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
4: [2022-11-28 14:21:29,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:21:29,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 14:21:29,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 2: [2022-11-28 14:21:29,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:21:29,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:21:29,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 14:21:29,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 14:21:29,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 2: [2022-11-28 14:21:29,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2022-11-28 14:21:29,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:21:29,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 14:21:29,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
7: [2022-11-28 14:21:29,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:21:29,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 14:21:29,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:21:29,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 7: [2022-11-28 14:21:29,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 14:21:29,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 7: [2022-11-28 14:21:29,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:21:29,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 0: [2022-11-28 14:21:29,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:21:29,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2022-11-28 14:21:29,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 14:21:29,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
0: [2022-11-28 14:21:29,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:21:29,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:21:29,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 14:21:29,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2022-11-28 14:21:29,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:21:29,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 14:21:29,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 7: [2022-11-28 14:21:29,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:21:29,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 14:21:29,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2022-11-28 14:21:29,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:21:29,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 14:21:29,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2022-11-28 14:21:29,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:21:29,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 14:21:29,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2022-11-28 14:21:29,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:21:29,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 14:21:29,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2022-11-28 14:21:29,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:21:29,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 14:21:29,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2022-11-28 14:21:29,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:21:29,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:21:29,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:21:29,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 14:21:29,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 14:21:29,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 14:21:29,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2022-11-28 14:21:29,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2022-11-28 14:21:29,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2022-11-28 14:21:29,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:21:29,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 14:21:29,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2022-11-28 14:21:29,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:21:29,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 14:21:29,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2022-11-28 14:21:29,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:21:29,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 14:21:29,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 7: [2022-11-28 14:21:29,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:21:29,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 14:21:29,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 7: [2022-11-28 14:21:29,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:21:29,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 14:21:29,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2022-11-28 14:21:29,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:21:29,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 14:21:29,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2022-11-28 14:21:29,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:21:29,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 14:21:29,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:21:29,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2022-11-28 14:21:29,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 14:21:29,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 1: [2022-11-28 14:21:29,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:21:29,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 14:21:29,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2022-11-28 14:21:29,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:21:29,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 14:21:29,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2022-11-28 14:21:29,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:21:29,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 14:21:29,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 2: [2022-11-28 14:21:29,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:21:29,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 14:21:29,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 4: [2022-11-28 14:21:29,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:21:29,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 14:21:29,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 2: [2022-11-28 14:21:29,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 14:21:29,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:21:29,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 14:21:29,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 14:21:29,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 2: [2022-11-28 14:21:29,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 6: [2022-11-28 14:21:29,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:21:29,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 14:21:29,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2022-11-28 14:21:29,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:21:29,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
5: [2022-11-28 14:21:29,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 14:21:29,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 14:21:29,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2022-11-28 14:21:29,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2022-11-28 14:21:29,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:21:29,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:21:29,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:21:29,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 14:21:29,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 14:21:29,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2022-11-28 14:21:29,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 14:21:29,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
5: [2022-11-28 14:21:29,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2022-11-28 14:21:29,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2022-11-28 14:21:29,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 14:21:29,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:21:29,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2022-11-28 14:21:29,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 14:21:29,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:21:29,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2022-11-28 14:21:29,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 14:21:29,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:21:29,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
5: [2022-11-28 14:21:29,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 14:21:29,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:21:29,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2022-11-28 14:21:29,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 14:21:29,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:21:29,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:21:29,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 5: [2022-11-28 14:21:29,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 14:21:29,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 14:21:29,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2022-11-28 14:21:29,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:21:29,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
3: [2022-11-28 14:21:29,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 14:21:29,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 3: [2022-11-28 14:21:29,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:21:29,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 14:21:29,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2022-11-28 14:21:29,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:21:29,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:21:29,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:21:29,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:21:29,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 14:21:29,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 14:21:29,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 14:21:29,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 14:21:29,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2022-11-28 14:21:29,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2022-11-28 14:21:29,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2022-11-28 14:21:29,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 0: [2022-11-28 14:21:29,424] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step12000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 14:21:29,424] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step12000 is ready now! 
0: successfully saved checkpoint at iteration 12000 to checkpoints_221m 7: time (ms) | save-checkpoint: 655.11 7: iteration 12010/ 115203 | consumed samples: 3074560 | consumed tokens: 6296698880 | elapsed time per iteration (s): 0.51 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.509735E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 506.650 | TFLOPs: 26.58 | 7: iteration 12020/ 115203 | consumed samples: 3077120 | consumed tokens: 6301941760 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.470940E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.495 | TFLOPs: 31.66 | 7: iteration 12030/ 115203 | consumed samples: 3079680 | consumed tokens: 6307184640 | elapsed time per iteration (s): 0.43 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.485248E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.683 | TFLOPs: 31.15 | 7: iteration 12040/ 115203 | consumed samples: 3082240 | consumed tokens: 6312427520 | elapsed time per iteration (s): 0.43 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.503877E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.597 | TFLOPs: 31.15 | 7: iteration 12050/ 115203 | consumed samples: 3084800 | consumed tokens: 6317670400 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.485293E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.536 | TFLOPs: 31.77 | 7: iteration 12060/ 115203 | consumed samples: 3087360 | consumed tokens: 6322913280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.960E-04 | global batch size: 256 | lm loss: 2.492072E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.210 | TFLOPs: 32.07 | 7: iteration 12070/ 115203 | consumed samples: 3089920 | consumed tokens: 6328156160 | elapsed time per iteration (s): 0.43 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.518037E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.781 | TFLOPs: 31.42 | 7: iteration 12080/ 115203 | consumed samples: 3092480 | consumed tokens: 6333399040 | elapsed time per iteration (s): 0.42 | learning rate: 1.960E-04 | global batch size: 256 | lm loss: 2.498790E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.497 | TFLOPs: 31.82 | 7: iteration 12090/ 115203 | consumed samples: 3095040 | consumed tokens: 6338641920 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.493958E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.688 | TFLOPs: 31.41 | 7: iteration 12100/ 115203 | consumed samples: 3097600 | consumed tokens: 6343884800 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.490677E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.720 | TFLOPs: 31.47 | 7: iteration 12110/ 115203 | consumed samples: 3100160 | consumed tokens: 6349127680 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.464795E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.973 | TFLOPs: 31.48 | 7: iteration 12120/ 115203 | consumed samples: 
3102720 | consumed tokens: 6354370560 | elapsed time per iteration (s): 0.42 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.520789E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.239 | TFLOPs: 31.70 | 7: iteration 12130/ 115203 | consumed samples: 3105280 | consumed tokens: 6359613440 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.524710E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.832 | TFLOPs: 31.31 | 7: iteration 12140/ 115203 | consumed samples: 3107840 | consumed tokens: 6364856320 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.521059E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.317 | TFLOPs: 31.55 | 7: iteration 12150/ 115203 | consumed samples: 3110400 | consumed tokens: 6370099200 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.530190E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.596 | TFLOPs: 31.15 | 7: iteration 12160/ 115203 | consumed samples: 3112960 | consumed tokens: 6375342080 | elapsed time per iteration (s): 0.44 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.491312E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.048 | TFLOPs: 30.28 | 7: iteration 12170/ 115203 | consumed samples: 3115520 | consumed tokens: 6380584960 | elapsed time per iteration (s): 0.42 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.450976E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 609.470 | TFLOPs: 31.98 | 7: iteration 12180/ 115203 | consumed samples: 3118080 | consumed tokens: 6385827840 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.491786E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.448 | TFLOPs: 31.50 | 7: iteration 12190/ 115203 | consumed samples: 3120640 | consumed tokens: 6391070720 | elapsed time per iteration (s): 0.43 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.490320E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.784 | TFLOPs: 31.57 | 7: iteration 12200/ 115203 | consumed samples: 3123200 | consumed tokens: 6396313600 | elapsed time per iteration (s): 0.42 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.524762E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.386 | TFLOPs: 31.66 | 7: iteration 12210/ 115203 | consumed samples: 3125760 | consumed tokens: 6401556480 | elapsed time per iteration (s): 0.42 | learning rate: 1.959E-04 | global batch size: 256 | lm loss: 2.521249E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.144 | TFLOPs: 32.07 | 7: iteration 12220/ 115203 | consumed samples: 3128320 | consumed tokens: 6406799360 | elapsed time per iteration (s): 0.42 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.509863E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.123 | TFLOPs: 31.96 | 7: iteration 12230/ 115203 | consumed samples: 3130880 | consumed tokens: 6412042240 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.508884E+00 | grad norm: 
0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.623 | TFLOPs: 31.57 | 7: iteration 12240/ 115203 | consumed samples: 3133440 | consumed tokens: 6417285120 | elapsed time per iteration (s): 0.42 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.475596E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.084 | TFLOPs: 32.01 | 7: iteration 12250/ 115203 | consumed samples: 3136000 | consumed tokens: 6422528000 | elapsed time per iteration (s): 0.42 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.507481E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.812 | TFLOPs: 31.73 | 7: iteration 12260/ 115203 | consumed samples: 3138560 | consumed tokens: 6427770880 | elapsed time per iteration (s): 0.42 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.510471E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.055 | TFLOPs: 31.69 | 7: iteration 12270/ 115203 | consumed samples: 3141120 | consumed tokens: 6433013760 | elapsed time per iteration (s): 0.42 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.508984E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.710 | TFLOPs: 31.73 | 7: iteration 12280/ 115203 | consumed samples: 3143680 | consumed tokens: 6438256640 | elapsed time per iteration (s): 0.42 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.505983E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.940 | TFLOPs: 31.95 | 7: iteration 12290/ 115203 | consumed samples: 3146240 | consumed tokens: 6443499520 | elapsed time per iteration (s): 0.42 
| learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.517975E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.829 | TFLOPs: 31.94 | 7: iteration 12300/ 115203 | consumed samples: 3148800 | consumed tokens: 6448742400 | elapsed time per iteration (s): 0.42 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.503163E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.249 | TFLOPs: 31.76 | 7: iteration 12310/ 115203 | consumed samples: 3151360 | consumed tokens: 6453985280 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.524389E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.184 | TFLOPs: 31.39 | 7: iteration 12320/ 115203 | consumed samples: 3153920 | consumed tokens: 6459228160 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.490623E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.839 | TFLOPs: 31.42 | 7: iteration 12330/ 115203 | consumed samples: 3156480 | consumed tokens: 6464471040 | elapsed time per iteration (s): 0.43 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.531362E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.912 | TFLOPs: 31.37 | 7: iteration 12340/ 115203 | consumed samples: 3159040 | consumed tokens: 6469713920 | elapsed time per iteration (s): 0.42 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.518591E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.630 | TFLOPs: 32.04 | 7: iteration 12350/ 115203 | 
consumed samples: 3161600 | consumed tokens: 6474956800 | elapsed time per iteration (s): 0.42 | learning rate: 1.958E-04 | global batch size: 256 | lm loss: 2.507996E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.043 | TFLOPs: 31.90 | 7: iteration 12360/ 115203 | consumed samples: 3164160 | consumed tokens: 6480199680 | elapsed time per iteration (s): 0.44 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 2.510067E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.424 | TFLOPs: 30.72 | 7: iteration 12370/ 115203 | consumed samples: 3166720 | consumed tokens: 6485442560 | elapsed time per iteration (s): 0.42 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 2.499609E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.008 | TFLOPs: 31.95 | 7: iteration 12380/ 115203 | consumed samples: 3169280 | consumed tokens: 6490685440 | elapsed time per iteration (s): 0.42 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 2.512779E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.141 | TFLOPs: 31.86 | 7: iteration 12390/ 115203 | consumed samples: 3171840 | consumed tokens: 6495928320 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 2.477368E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.040 | TFLOPs: 31.48 | 7: iteration 12400/ 115203 | consumed samples: 3174400 | consumed tokens: 6501171200 | elapsed time per iteration (s): 0.44 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 2.530376E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 582.828 | TFLOPs: 30.58 | 7: iteration 12410/ 115203 | consumed samples: 3176960 | consumed tokens: 6506414080 | elapsed time per iteration (s): 0.42 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 2.534748E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.004 | TFLOPs: 31.74 | 7: iteration 12420/ 115203 | consumed samples: 3179520 | consumed tokens: 6511656960 | elapsed time per iteration (s): 0.42 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 2.501184E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.583 | TFLOPs: 31.83 | 7: iteration 12430/ 115203 | consumed samples: 3182080 | consumed tokens: 6516899840 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 2.510718E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.921 | TFLOPs: 31.53 | 7: iteration 12440/ 115203 | consumed samples: 3184640 | consumed tokens: 6522142720 | elapsed time per iteration (s): 0.42 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 2.484361E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.492 | TFLOPs: 31.72 | 7: iteration 12450/ 115203 | consumed samples: 3187200 | consumed tokens: 6527385600 | elapsed time per iteration (s): 0.43 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 2.506879E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.621 | TFLOPs: 31.15 | 7: iteration 12460/ 115203 | consumed samples: 3189760 | consumed tokens: 6532628480 | elapsed time per iteration (s): 0.44 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 
2.534886E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.148 | TFLOPs: 30.44 | 7: iteration 12470/ 115203 | consumed samples: 3192320 | consumed tokens: 6537871360 | elapsed time per iteration (s): 0.44 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 2.505269E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.194 | TFLOPs: 30.86 | 7: iteration 12480/ 115203 | consumed samples: 3194880 | consumed tokens: 6543114240 | elapsed time per iteration (s): 0.42 | learning rate: 1.957E-04 | global batch size: 256 | lm loss: 2.479144E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.819 | TFLOPs: 32.10 | 7: iteration 12490/ 115203 | consumed samples: 3197440 | consumed tokens: 6548357120 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.477672E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.675 | TFLOPs: 31.46 | 7: iteration 12500/ 115203 | consumed samples: 3200000 | consumed tokens: 6553600000 | elapsed time per iteration (s): 0.42 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.481791E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.343 | TFLOPs: 31.66 | 7: iteration 12510/ 115203 | consumed samples: 3202560 | consumed tokens: 6558842880 | elapsed time per iteration (s): 0.42 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.483508E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.442 | TFLOPs: 31.71 | 7: iteration 12520/ 115203 | consumed samples: 3205120 | consumed tokens: 6564085760 | elapsed 
time per iteration (s): 0.42 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.511949E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.305 | TFLOPs: 31.71 | 7: iteration 12530/ 115203 | consumed samples: 3207680 | consumed tokens: 6569328640 | elapsed time per iteration (s): 0.42 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.488420E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.395 | TFLOPs: 32.13 | 7: iteration 12540/ 115203 | consumed samples: 3210240 | consumed tokens: 6574571520 | elapsed time per iteration (s): 0.42 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.470204E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.714 | TFLOPs: 31.68 | 7: iteration 12550/ 115203 | consumed samples: 3212800 | consumed tokens: 6579814400 | elapsed time per iteration (s): 0.42 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.503265E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.957 | TFLOPs: 31.79 | 7: iteration 12560/ 115203 | consumed samples: 3215360 | consumed tokens: 6585057280 | elapsed time per iteration (s): 0.42 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.538321E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.580 | TFLOPs: 31.93 | 7: iteration 12570/ 115203 | consumed samples: 3217920 | consumed tokens: 6590300160 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.501729E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.932 | TFLOPs: 31.58 | 7: 
iteration 12580/ 115203 | consumed samples: 3220480 | consumed tokens: 6595543040 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.502901E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.130 | TFLOPs: 30.96 | 7: iteration 12590/ 115203 | consumed samples: 3223040 | consumed tokens: 6600785920 | elapsed time per iteration (s): 0.42 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.523590E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.782 | TFLOPs: 31.78 | 7: iteration 12600/ 115203 | consumed samples: 3225600 | consumed tokens: 6606028800 | elapsed time per iteration (s): 0.42 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.497174E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.160 | TFLOPs: 32.12 | 7: iteration 12610/ 115203 | consumed samples: 3228160 | consumed tokens: 6611271680 | elapsed time per iteration (s): 0.43 | learning rate: 1.956E-04 | global batch size: 256 | lm loss: 2.491853E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.766 | TFLOPs: 31.05 | 7: iteration 12620/ 115203 | consumed samples: 3230720 | consumed tokens: 6616514560 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 2.471472E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.742 | TFLOPs: 31.78 | 7: iteration 12630/ 115203 | consumed samples: 3233280 | consumed tokens: 6621757440 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 2.530544E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 604.230 | TFLOPs: 31.70 | 7: iteration 12640/ 115203 | consumed samples: 3235840 | consumed tokens: 6627000320 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 2.504359E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.742 | TFLOPs: 31.62 | 7: iteration 12650/ 115203 | consumed samples: 3238400 | consumed tokens: 6632243200 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 2.482552E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.528 | TFLOPs: 31.61 | 7: iteration 12660/ 115203 | consumed samples: 3240960 | consumed tokens: 6637486080 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 2.493134E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.977 | TFLOPs: 32.00 | 7: iteration 12670/ 115203 | consumed samples: 3243520 | consumed tokens: 6642728960 | elapsed time per iteration (s): 0.43 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 2.490678E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.490 | TFLOPs: 31.56 | 7: iteration 12680/ 115203 | consumed samples: 3246080 | consumed tokens: 6647971840 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 2.524468E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.610 | TFLOPs: 31.67 | 7: iteration 12690/ 115203 | consumed samples: 3248640 | consumed tokens: 6653214720 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch 
size: 256 | lm loss: 2.484309E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.858 | TFLOPs: 31.63 | 7: iteration 12700/ 115203 | consumed samples: 3251200 | consumed tokens: 6658457600 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 2.499090E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.060 | TFLOPs: 32.06 | 7: iteration 12710/ 115203 | consumed samples: 3253760 | consumed tokens: 6663700480 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 2.485258E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.608 | TFLOPs: 31.62 | 7: iteration 12720/ 115203 | consumed samples: 3256320 | consumed tokens: 6668943360 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 2.480393E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.363 | TFLOPs: 31.97 | 7: iteration 12730/ 115203 | consumed samples: 3258880 | consumed tokens: 6674186240 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 2.473331E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.798 | TFLOPs: 31.68 | 7: iteration 12740/ 115203 | consumed samples: 3261440 | consumed tokens: 6679429120 | elapsed time per iteration (s): 0.42 | learning rate: 1.955E-04 | global batch size: 256 | lm loss: 2.485814E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.478 | TFLOPs: 32.08 | 7: iteration 12750/ 115203 | consumed samples: 3264000 | consumed tokens: 
6684672000 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.523808E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.248 | TFLOPs: 31.70 | 7: iteration 12760/ 115203 | consumed samples: 3266560 | consumed tokens: 6689914880 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.495527E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.624 | TFLOPs: 31.51 | 7: iteration 12770/ 115203 | consumed samples: 3269120 | consumed tokens: 6695157760 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.483310E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.635 | TFLOPs: 31.78 | 7: iteration 12780/ 115203 | consumed samples: 3271680 | consumed tokens: 6700400640 | elapsed time per iteration (s): 0.45 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.499441E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.102 | TFLOPs: 30.17 | 7: iteration 12790/ 115203 | consumed samples: 3274240 | consumed tokens: 6705643520 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.484526E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.892 | TFLOPs: 31.69 | 7: iteration 12800/ 115203 | consumed samples: 3276800 | consumed tokens: 6710886400 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.506125E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.057 | 
TFLOPs: 31.38 | 7: iteration 12810/ 115203 | consumed samples: 3279360 | consumed tokens: 6716129280 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.496418E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.259 | TFLOPs: 31.97 | 7: iteration 12820/ 115203 | consumed samples: 3281920 | consumed tokens: 6721372160 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.489818E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.119 | TFLOPs: 31.12 | 7: iteration 12830/ 115203 | consumed samples: 3284480 | consumed tokens: 6726615040 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.495272E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.900 | TFLOPs: 31.48 | 7: iteration 12840/ 115203 | consumed samples: 3287040 | consumed tokens: 6731857920 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.490678E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.043 | TFLOPs: 31.54 | 7: iteration 12850/ 115203 | consumed samples: 3289600 | consumed tokens: 6737100800 | elapsed time per iteration (s): 0.43 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.451204E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.242 | TFLOPs: 31.55 | 7: iteration 12860/ 115203 | consumed samples: 3292160 | consumed tokens: 6742343680 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.483792E+00 | grad norm: 0.216 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.199 | TFLOPs: 32.12 | 7: iteration 12870/ 115203 | consumed samples: 3294720 | consumed tokens: 6747586560 | elapsed time per iteration (s): 0.42 | learning rate: 1.954E-04 | global batch size: 256 | lm loss: 2.480860E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.835 | TFLOPs: 32.26 | 7: iteration 12880/ 115203 | consumed samples: 3297280 | consumed tokens: 6752829440 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 2.503846E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.253 | TFLOPs: 31.60 | 7: iteration 12890/ 115203 | consumed samples: 3299840 | consumed tokens: 6758072320 | elapsed time per iteration (s): 0.42 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 2.502591E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.160 | TFLOPs: 32.22 | 7: iteration 12900/ 115203 | consumed samples: 3302400 | consumed tokens: 6763315200 | elapsed time per iteration (s): 0.42 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 2.497695E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.005 | TFLOPs: 31.64 | 7: iteration 12910/ 115203 | consumed samples: 3304960 | consumed tokens: 6768558080 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 2.494574E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.070 | TFLOPs: 31.54 | 7: iteration 12920/ 115203 | consumed samples: 3307520 | consumed tokens: 6773800960 | elapsed time per iteration (s): 0.42 | learning rate: 
1.953E-04 | global batch size: 256 | lm loss: 2.534981E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.505 | TFLOPs: 31.87 | 7: iteration 12930/ 115203 | consumed samples: 3310080 | consumed tokens: 6779043840 | elapsed time per iteration (s): 0.42 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 2.505372E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.947 | TFLOPs: 31.85 | 7: iteration 12940/ 115203 | consumed samples: 3312640 | consumed tokens: 6784286720 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 2.508040E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.155 | TFLOPs: 31.28 | 7: iteration 12950/ 115203 | consumed samples: 3315200 | consumed tokens: 6789529600 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 2.485510E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.983 | TFLOPs: 31.48 | 7: iteration 12960/ 115203 | consumed samples: 3317760 | consumed tokens: 6794772480 | elapsed time per iteration (s): 0.42 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 2.463234E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.021 | TFLOPs: 31.95 | 7: iteration 12970/ 115203 | consumed samples: 3320320 | consumed tokens: 6800015360 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 2.477712E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.638 | TFLOPs: 31.30 | 7: iteration 12980/ 115203 | consumed samples: 
3322880 | consumed tokens: 6805258240 | elapsed time per iteration (s): 0.42 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 2.509407E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.944 | TFLOPs: 32.11 |
7: iteration 12990/ 115203 | consumed samples: 3325440 | consumed tokens: 6810501120 | elapsed time per iteration (s): 0.43 | learning rate: 1.953E-04 | global batch size: 256 | lm loss: 2.459558E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.172 | TFLOPs: 30.97 |
7: iteration 13000/ 115203 | consumed samples: 3328000 | consumed tokens: 6815744000 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 2.502664E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.604 | TFLOPs: 31.41 |
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 13000 | lm loss value: 2.422436E+00 | lm loss PPL: 1.127328E+01 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 13000 to checkpoints_221m
0: [2022-11-28 14:28:34,676] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step13000 is begin to save!
0: [2022-11-28 14:28:34,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_01-model_00-model_states.pt...
0: [2022-11-28 14:28:34,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_01-model_00-model_states.pt.
0: [2022-11-28 14:28:34,781] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_03-model_00-model_states.pt...
0: [2022-11-28 14:28:34,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_03-model_00-model_states.pt. 0: [2022-11-28 14:28:34,802] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_04-model_00-model_states.pt... 0: [2022-11-28 14:28:34,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_04-model_00-model_states.pt. 0: [2022-11-28 14:28:34,826] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_05-model_00-model_states.pt... 0: [2022-11-28 14:28:34,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_05-model_00-model_states.pt. 0: [2022-11-28 14:28:34,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_06-model_00-model_states.pt... 0: [2022-11-28 14:28:34,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_06-model_00-model_states.pt. 0: [2022-11-28 14:28:34,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_07-model_00-model_states.pt... 0: [2022-11-28 14:28:34,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_07-model_00-model_states.pt. 0: [2022-11-28 14:28:34,895] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_08-model_00-model_states.pt... 0: [2022-11-28 14:28:34,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_08-model_00-model_states.pt. 0: [2022-11-28 14:28:34,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_09-model_00-model_states.pt... 
0: [2022-11-28 14:28:34,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_09-model_00-model_states.pt. 0: [2022-11-28 14:28:34,943] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_10-model_00-model_states.pt... 0: [2022-11-28 14:28:34,965] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_10-model_00-model_states.pt. 0: [2022-11-28 14:28:34,966] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_11-model_00-model_states.pt... 0: [2022-11-28 14:28:34,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_11-model_00-model_states.pt. 0: [2022-11-28 14:28:34,989] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_12-model_00-model_states.pt... 0: [2022-11-28 14:28:35,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_12-model_00-model_states.pt. 0: [2022-11-28 14:28:35,012] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_13-model_00-model_states.pt... 0: [2022-11-28 14:28:35,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_13-model_00-model_states.pt. 0: [2022-11-28 14:28:35,036] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_14-model_00-model_states.pt... 0: [2022-11-28 14:28:35,059] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_14-model_00-model_states.pt. 0: [2022-11-28 14:28:35,059] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_15-model_00-model_states.pt... 
0: [2022-11-28 14:28:35,083] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_15-model_00-model_states.pt. 0: [2022-11-28 14:28:35,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_16-model_00-model_states.pt... 0: [2022-11-28 14:28:35,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_16-model_00-model_states.pt. 0: [2022-11-28 14:28:35,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_17-model_00-model_states.pt... 0: [2022-11-28 14:28:35,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_17-model_00-model_states.pt. 0: [2022-11-28 14:28:35,129] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_18-model_00-model_states.pt... 0: [2022-11-28 14:28:35,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_18-model_00-model_states.pt. 0: [2022-11-28 14:28:35,153] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_19-model_00-model_states.pt... 0: [2022-11-28 14:28:35,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_19-model_00-model_states.pt. 0: [2022-11-28 14:28:35,176] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_20-model_00-model_states.pt... 0: [2022-11-28 14:28:35,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_20-model_00-model_states.pt. 0: [2022-11-28 14:28:35,200] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/layer_22-model_00-model_states.pt... 
0: [2022-11-28 14:28:35,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/layer_22-model_00-model_states.pt. 0: [2022-11-28 14:28:35,205] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step13000/mp_rank_00_model_states.pt 0: [2022-11-28 14:28:35,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/mp_rank_00_model_states.pt... 0: [2022-11-28 14:28:35,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/mp_rank_00_model_states.pt. 0: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
4: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2022-11-28 14:28:35,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step13000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:28:35,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:28:35,271] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 14:28:35,271] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2022-11-28 14:28:35,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:28:35,272] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 14:28:35,272] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2022-11-28 14:28:35,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:28:35,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 14:28:35,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2022-11-28 14:28:35,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:28:35,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:28:35,276] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 14:28:35,276] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2022-11-28 14:28:35,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:28:35,276] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 14:28:35,276] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2022-11-28 14:28:35,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:28:35,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 14:28:35,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2022-11-28 14:28:35,276] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 14:28:35,276] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2022-11-28 14:28:35,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-28 14:28:35,277] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 14:28:35,277] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2022-11-28 14:28:35,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:28:35,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:28:35,280] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2022-11-28 14:28:35,280] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 7: [2022-11-28 14:28:35,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2022-11-28 14:28:35,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2022-11-28 14:28:35,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:28:35,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 14:28:35,281] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2022-11-28 14:28:35,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 14:28:35,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:28:35,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:28:35,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 14:28:35,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 14:28:35,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 14:28:35,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2022-11-28 14:28:35,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2022-11-28 14:28:35,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2022-11-28 14:28:35,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:28:35,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 14:28:35,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2022-11-28 14:28:35,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:28:35,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 14:28:35,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2022-11-28 14:28:35,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:28:35,284] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:28:35,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 14:28:35,284] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 14:28:35,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2022-11-28 14:28:35,284] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2022-11-28 14:28:35,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:28:35,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 14:28:35,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2022-11-28 14:28:35,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:28:35,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:28:35,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 1: [2022-11-28 14:28:35,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2022-11-28 14:28:35,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2022-11-28 14:28:35,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2022-11-28 14:28:35,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:28:35,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 14:28:35,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2022-11-28 14:28:35,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:28:35,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:28:35,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2022-11-28 14:28:35,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2022-11-28 14:28:35,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2022-11-28 14:28:35,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:28:35,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:28:35,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2022-11-28 14:28:35,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:28:35,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 2: [2022-11-28 14:28:35,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:28:35,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 14:28:35,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2022-11-28 14:28:35,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 
2: [2022-11-28 14:28:35,286] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 14:28:35,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2022-11-28 14:28:35,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 2: [2022-11-28 14:28:35,286] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2022-11-28 14:28:35,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:28:35,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 14:28:35,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2022-11-28 14:28:35,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:28:35,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 14:28:35,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2022-11-28 14:28:35,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:28:35,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
5: [2022-11-28 14:28:35,276] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 14:28:35,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 14:28:35,276] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 3: [2022-11-28 14:28:35,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2022-11-28 14:28:35,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:28:35,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:28:35,281] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 14:28:35,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 14:28:35,281] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 3: [2022-11-28 14:28:35,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:28:35,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2022-11-28 14:28:35,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
3: [2022-11-28 14:28:35,287] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 14:28:35,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 14:28:35,287] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2022-11-28 14:28:35,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2022-11-28 14:28:35,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:28:35,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:28:35,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:28:35,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:28:35,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 14:28:35,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 14:28:35,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 14:28:35,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 14:28:35,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2022-11-28 14:28:35,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2022-11-28 14:28:35,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 4: [2022-11-28 14:28:35,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:28:35,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 14:28:35,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 14:28:35,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 7: [2022-11-28 14:28:35,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2022-11-28 14:28:35,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:28:35,292] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 14:28:35,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:28:35,292] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2022-11-28 14:28:35,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:28:35,293] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 14:28:35,293] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 1: [2022-11-28 14:28:35,293] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:28:35,293] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 14:28:35,293] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2022-11-28 14:28:35,293] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:28:35,293] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 14:28:35,293] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 6: [2022-11-28 14:28:35,293] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-28 14:28:35,293] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 14:28:35,293] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2022-11-28 14:28:35,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:28:35,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:28:35,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:28:35,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 14:28:35,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 14:28:35,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2022-11-28 14:28:35,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 14:28:35,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2022-11-28 14:28:35,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2022-11-28 14:28:35,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:28:35,296] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 14:28:35,296] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2022-11-28 14:28:35,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:28:35,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:28:35,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 14:28:35,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:28:35,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:28:35,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:28:35,299] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 14:28:35,299] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:28:35,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 3: [2022-11-28 14:28:35,299] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 
3: [2022-11-28 14:28:35,299] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 14:28:35,299] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 14:28:35,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:28:35,299] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 14:28:35,299] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 14:28:35,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 14:28:35,299] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 3: [2022-11-28 14:28:35,299] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2022-11-28 14:28:35,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 3: [2022-11-28 14:28:35,299] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 3: [2022-11-28 14:28:35,299] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2022-11-28 14:28:35,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 14:28:35,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:28:35,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 5: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:28:35,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 14:28:35,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 0: [2022-11-28 14:28:35,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step13000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 14:28:35,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step13000 is ready now! 
0: successfully saved checkpoint at iteration 13000 to checkpoints_221m 7: time (ms) | save-checkpoint: 665.81 7: iteration 13010/ 115203 | consumed samples: 3330560 | consumed tokens: 6820986880 | elapsed time per iteration (s): 0.51 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 2.507631E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 504.814 | TFLOPs: 26.49 | 7: iteration 13020/ 115203 | consumed samples: 3333120 | consumed tokens: 6826229760 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 2.485671E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.982 | TFLOPs: 31.22 | 7: iteration 13030/ 115203 | consumed samples: 3335680 | consumed tokens: 6831472640 | elapsed time per iteration (s): 0.42 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 2.473294E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.109 | TFLOPs: 31.91 | 7: iteration 13040/ 115203 | consumed samples: 3338240 | consumed tokens: 6836715520 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 2.496759E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.516 | TFLOPs: 31.40 | 7: iteration 13050/ 115203 | consumed samples: 3340800 | consumed tokens: 6841958400 | elapsed time per iteration (s): 0.42 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 2.454211E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.695 | TFLOPs: 31.73 | 7: iteration 13060/ 115203 | consumed samples: 3343360 | consumed tokens: 6847201280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.952E-04 | global batch size: 256 | lm loss: 2.483360E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.749 | TFLOPs: 32.10 | 7: iteration 13070/ 115203 | consumed samples: 3345920 | consumed tokens: 6852444160 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 2.503123E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.781 | TFLOPs: 31.57 | 7: iteration 13080/ 115203 | consumed samples: 3348480 | consumed tokens: 6857687040 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 2.515659E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.543 | TFLOPs: 31.56 | 7: iteration 13090/ 115203 | consumed samples: 3351040 | consumed tokens: 6862929920 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 2.485755E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.116 | TFLOPs: 31.43 | 7: iteration 13100/ 115203 | consumed samples: 3353600 | consumed tokens: 6868172800 | elapsed time per iteration (s): 0.42 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 2.519498E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.704 | TFLOPs: 32.20 | 7: iteration 13110/ 115203 | consumed samples: 3356160 | consumed tokens: 6873415680 | elapsed time per iteration (s): 0.45 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 2.485946E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.549 | TFLOPs: 30.15 | 7: iteration 13120/ 115203 | consumed samples: 
3358720 | consumed tokens: 6878658560 | elapsed time per iteration (s): 0.43 | learning rate: 1.952E-04 | global batch size: 256 | lm loss: 2.494342E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.960 | TFLOPs: 31.58 | 7: iteration 13130/ 115203 | consumed samples: 3361280 | consumed tokens: 6883901440 | elapsed time per iteration (s): 0.44 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 2.469922E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.012 | TFLOPs: 30.64 | 7: iteration 13140/ 115203 | consumed samples: 3363840 | consumed tokens: 6889144320 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 2.447086E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.841 | TFLOPs: 32.05 | 7: iteration 13150/ 115203 | consumed samples: 3366400 | consumed tokens: 6894387200 | elapsed time per iteration (s): 0.44 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 2.497266E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.962 | TFLOPs: 30.43 | 7: iteration 13160/ 115203 | consumed samples: 3368960 | consumed tokens: 6899630080 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 2.500568E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.511 | TFLOPs: 31.82 | 7: iteration 13170/ 115203 | consumed samples: 3371520 | consumed tokens: 6904872960 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 2.490755E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 605.959 | TFLOPs: 31.79 | 7: iteration 13180/ 115203 | consumed samples: 3374080 | consumed tokens: 6910115840 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 2.457293E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.102 | TFLOPs: 31.64 | 7: iteration 13190/ 115203 | consumed samples: 3376640 | consumed tokens: 6915358720 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 2.500452E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.688 | TFLOPs: 31.10 | 7: iteration 13200/ 115203 | consumed samples: 3379200 | consumed tokens: 6920601600 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 2.487312E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.978 | TFLOPs: 31.79 | 7: iteration 13210/ 115203 | consumed samples: 3381760 | consumed tokens: 6925844480 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 2.455998E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.249 | TFLOPs: 31.81 | 7: iteration 13220/ 115203 | consumed samples: 3384320 | consumed tokens: 6931087360 | elapsed time per iteration (s): 0.43 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 2.509258E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.249 | TFLOPs: 31.49 | 7: iteration 13230/ 115203 | consumed samples: 3386880 | consumed tokens: 6936330240 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 2.498637E+00 | grad norm: 
0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.562 | TFLOPs: 32.14 | 7: iteration 13240/ 115203 | consumed samples: 3389440 | consumed tokens: 6941573120 | elapsed time per iteration (s): 0.42 | learning rate: 1.951E-04 | global batch size: 256 | lm loss: 2.491892E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.463 | TFLOPs: 31.77 | 7: iteration 13250/ 115203 | consumed samples: 3392000 | consumed tokens: 6946816000 | elapsed time per iteration (s): 0.42 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.497168E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.655 | TFLOPs: 31.67 | 7: iteration 13260/ 115203 | consumed samples: 3394560 | consumed tokens: 6952058880 | elapsed time per iteration (s): 0.42 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.506961E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.365 | TFLOPs: 31.76 | 7: iteration 13270/ 115203 | consumed samples: 3397120 | consumed tokens: 6957301760 | elapsed time per iteration (s): 0.44 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.506372E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.709 | TFLOPs: 30.73 | 7: iteration 13280/ 115203 | consumed samples: 3399680 | consumed tokens: 6962544640 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.460888E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.289 | TFLOPs: 31.29 | 7: iteration 13290/ 115203 | consumed samples: 3402240 | consumed tokens: 6967787520 | elapsed time per iteration (s): 0.45 
| learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.510072E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.819 | TFLOPs: 30.11 | 7: iteration 13300/ 115203 | consumed samples: 3404800 | consumed tokens: 6973030400 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.498639E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.668 | TFLOPs: 31.57 | 7: iteration 13310/ 115203 | consumed samples: 3407360 | consumed tokens: 6978273280 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.501107E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.105 | TFLOPs: 31.54 | 7: iteration 13320/ 115203 | consumed samples: 3409920 | consumed tokens: 6983516160 | elapsed time per iteration (s): 0.42 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.484320E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.299 | TFLOPs: 31.81 | 7: iteration 13330/ 115203 | consumed samples: 3412480 | consumed tokens: 6988759040 | elapsed time per iteration (s): 0.42 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.480959E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.266 | TFLOPs: 32.02 | 7: iteration 13340/ 115203 | consumed samples: 3415040 | consumed tokens: 6994001920 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.486462E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.067 | TFLOPs: 31.59 | 7: iteration 13350/ 115203 | 
consumed samples: 3417600 | consumed tokens: 6999244800 | elapsed time per iteration (s): 0.42 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.480022E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.266 | TFLOPs: 31.70 | 7: iteration 13360/ 115203 | consumed samples: 3420160 | consumed tokens: 7004487680 | elapsed time per iteration (s): 0.42 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.490342E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.444 | TFLOPs: 31.82 | 7: iteration 13370/ 115203 | consumed samples: 3422720 | consumed tokens: 7009730560 | elapsed time per iteration (s): 0.43 | learning rate: 1.950E-04 | global batch size: 256 | lm loss: 2.488741E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.937 | TFLOPs: 31.37 | 7: iteration 13380/ 115203 | consumed samples: 3425280 | consumed tokens: 7014973440 | elapsed time per iteration (s): 0.42 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 2.482081E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.448 | TFLOPs: 31.98 | 7: iteration 13390/ 115203 | consumed samples: 3427840 | consumed tokens: 7020216320 | elapsed time per iteration (s): 0.42 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 2.492157E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.023 | TFLOPs: 32.01 | 7: iteration 13400/ 115203 | consumed samples: 3430400 | consumed tokens: 7025459200 | elapsed time per iteration (s): 0.42 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 2.492308E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 605.468 | TFLOPs: 31.77 | 7: iteration 13410/ 115203 | consumed samples: 3432960 | consumed tokens: 7030702080 | elapsed time per iteration (s): 0.42 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 2.517827E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.047 | TFLOPs: 31.69 | 7: iteration 13420/ 115203 | consumed samples: 3435520 | consumed tokens: 7035944960 | elapsed time per iteration (s): 0.44 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 2.528492E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.147 | TFLOPs: 30.65 | 7: iteration 13430/ 115203 | consumed samples: 3438080 | consumed tokens: 7041187840 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 2.472379E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.678 | TFLOPs: 31.57 | 7: iteration 13440/ 115203 | consumed samples: 3440640 | consumed tokens: 7046430720 | elapsed time per iteration (s): 0.42 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 2.483734E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.859 | TFLOPs: 32.00 | 7: iteration 13450/ 115203 | consumed samples: 3443200 | consumed tokens: 7051673600 | elapsed time per iteration (s): 0.42 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 2.499605E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.685 | TFLOPs: 31.78 | 7: iteration 13460/ 115203 | consumed samples: 3445760 | consumed tokens: 7056916480 | elapsed time per iteration (s): 0.42 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 
2.511012E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.926 | TFLOPs: 32.11 | 7: iteration 13470/ 115203 | consumed samples: 3448320 | consumed tokens: 7062159360 | elapsed time per iteration (s): 0.42 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 2.458334E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.882 | TFLOPs: 32.16 | 7: iteration 13480/ 115203 | consumed samples: 3450880 | consumed tokens: 7067402240 | elapsed time per iteration (s): 0.44 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 2.480614E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.316 | TFLOPs: 30.24 | 7: iteration 13490/ 115203 | consumed samples: 3453440 | consumed tokens: 7072645120 | elapsed time per iteration (s): 0.43 | learning rate: 1.949E-04 | global batch size: 256 | lm loss: 2.469556E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.825 | TFLOPs: 31.31 | 7: iteration 13500/ 115203 | consumed samples: 3456000 | consumed tokens: 7077888000 | elapsed time per iteration (s): 0.42 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 2.463533E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.128 | TFLOPs: 31.91 | 7: iteration 13510/ 115203 | consumed samples: 3458560 | consumed tokens: 7083130880 | elapsed time per iteration (s): 0.42 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 2.503858E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.730 | TFLOPs: 31.99 | 7: iteration 13520/ 115203 | consumed samples: 3461120 | consumed tokens: 7088373760 | elapsed 
time per iteration (s): 0.42 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 2.506854E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.545 | TFLOPs: 31.72 | 7: iteration 13530/ 115203 | consumed samples: 3463680 | consumed tokens: 7093616640 | elapsed time per iteration (s): 0.42 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 2.472483E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.359 | TFLOPs: 31.81 | 7: iteration 13540/ 115203 | consumed samples: 3466240 | consumed tokens: 7098859520 | elapsed time per iteration (s): 0.43 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 2.513064E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.559 | TFLOPs: 31.46 | 7: iteration 13550/ 115203 | consumed samples: 3468800 | consumed tokens: 7104102400 | elapsed time per iteration (s): 0.42 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 2.492521E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.186 | TFLOPs: 31.81 | 7: iteration 13560/ 115203 | consumed samples: 3471360 | consumed tokens: 7109345280 | elapsed time per iteration (s): 0.42 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 2.479555E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.183 | TFLOPs: 31.70 | 7: iteration 13570/ 115203 | consumed samples: 3473920 | consumed tokens: 7114588160 | elapsed time per iteration (s): 0.43 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 2.499363E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.922 | TFLOPs: 31.32 | 7: 
iteration 13580/ 115203 | consumed samples: 3476480 | consumed tokens: 7119831040 | elapsed time per iteration (s): 0.43 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 2.487106E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.845 | TFLOPs: 31.58 | 7: iteration 13590/ 115203 | consumed samples: 3479040 | consumed tokens: 7125073920 | elapsed time per iteration (s): 0.43 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 2.473867E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.675 | TFLOPs: 31.41 | 7: iteration 13600/ 115203 | consumed samples: 3481600 | consumed tokens: 7130316800 | elapsed time per iteration (s): 0.43 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 2.529295E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.740 | TFLOPs: 31.00 | 7: iteration 13610/ 115203 | consumed samples: 3484160 | consumed tokens: 7135559680 | elapsed time per iteration (s): 0.42 | learning rate: 1.948E-04 | global batch size: 256 | lm loss: 2.491650E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.701 | TFLOPs: 31.94 | 7: iteration 13620/ 115203 | consumed samples: 3486720 | consumed tokens: 7140802560 | elapsed time per iteration (s): 0.42 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 2.470647E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.147 | TFLOPs: 31.70 | 7: iteration 13630/ 115203 | consumed samples: 3489280 | consumed tokens: 7146045440 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 2.492347E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 596.510 | TFLOPs: 31.30 | 7: iteration 13640/ 115203 | consumed samples: 3491840 | consumed tokens: 7151288320 | elapsed time per iteration (s): 0.44 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 2.467219E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.448 | TFLOPs: 30.35 | 7: iteration 13650/ 115203 | consumed samples: 3494400 | consumed tokens: 7156531200 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 2.503642E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.910 | TFLOPs: 31.58 | 7: iteration 13660/ 115203 | consumed samples: 3496960 | consumed tokens: 7161774080 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 2.482258E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.899 | TFLOPs: 31.58 | 7: iteration 13670/ 115203 | consumed samples: 3499520 | consumed tokens: 7167016960 | elapsed time per iteration (s): 15.46 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 2.510090E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 16.563 | TFLOPs: 0.87 | 7: iteration 13680/ 115203 | consumed samples: 3502080 | consumed tokens: 7172259840 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 2.472819E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.079 | TFLOPs: 31.38 | 7: iteration 13690/ 115203 | consumed samples: 3504640 | consumed tokens: 7177502720 | elapsed time per iteration (s): 0.42 | learning rate: 1.947E-04 | global batch 
size: 256 | lm loss: 2.499232E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.236 | TFLOPs: 32.07 | 7: iteration 13700/ 115203 | consumed samples: 3507200 | consumed tokens: 7182745600 | elapsed time per iteration (s): 0.43 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 2.517488E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.910 | TFLOPs: 31.00 | 7: iteration 13710/ 115203 | consumed samples: 3509760 | consumed tokens: 7187988480 | elapsed time per iteration (s): 0.42 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 2.483758E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.409 | TFLOPs: 31.61 | 7: iteration 13720/ 115203 | consumed samples: 3512320 | consumed tokens: 7193231360 | elapsed time per iteration (s): 0.42 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 2.460810E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.151 | TFLOPs: 32.28 | 7: iteration 13730/ 115203 | consumed samples: 3514880 | consumed tokens: 7198474240 | elapsed time per iteration (s): 0.42 | learning rate: 1.947E-04 | global batch size: 256 | lm loss: 2.493438E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.252 | TFLOPs: 32.07 | 7: iteration 13740/ 115203 | consumed samples: 3517440 | consumed tokens: 7203717120 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 2.498642E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.008 | TFLOPs: 31.38 | 7: iteration 13750/ 115203 | consumed samples: 3520000 | consumed tokens: 
7208960000 | elapsed time per iteration (s): 0.42 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 2.494814E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.694 | TFLOPs: 31.88 | 7: iteration 13760/ 115203 | consumed samples: 3522560 | consumed tokens: 7214202880 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 2.462155E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.416 | TFLOPs: 30.98 | 7: iteration 13770/ 115203 | consumed samples: 3525120 | consumed tokens: 7219445760 | elapsed time per iteration (s): 0.42 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 2.525970E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.184 | TFLOPs: 31.86 | 7: iteration 13780/ 115203 | consumed samples: 3527680 | consumed tokens: 7224688640 | elapsed time per iteration (s): 0.42 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 2.466109E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.694 | TFLOPs: 32.04 | 7: iteration 13790/ 115203 | consumed samples: 3530240 | consumed tokens: 7229931520 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 2.497277E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.602 | TFLOPs: 31.25 | 7: iteration 13800/ 115203 | consumed samples: 3532800 | consumed tokens: 7235174400 | elapsed time per iteration (s): 0.42 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 2.502978E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.816 | 
TFLOPs: 31.73 | 7: iteration 13810/ 115203 | consumed samples: 3535360 | consumed tokens: 7240417280 | elapsed time per iteration (s): 0.42 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 2.465064E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.142 | TFLOPs: 31.86 | 7: iteration 13820/ 115203 | consumed samples: 3537920 | consumed tokens: 7245660160 | elapsed time per iteration (s): 0.42 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 2.480670E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.135 | TFLOPs: 32.01 | 7: iteration 13830/ 115203 | consumed samples: 3540480 | consumed tokens: 7250903040 | elapsed time per iteration (s): 0.43 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 2.503981E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.145 | TFLOPs: 31.59 | 7: iteration 13840/ 115203 | consumed samples: 3543040 | consumed tokens: 7256145920 | elapsed time per iteration (s): 0.42 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 2.471642E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.046 | TFLOPs: 31.64 | 7: iteration 13850/ 115203 | consumed samples: 3545600 | consumed tokens: 7261388800 | elapsed time per iteration (s): 0.42 | learning rate: 1.946E-04 | global batch size: 256 | lm loss: 2.494525E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.132 | TFLOPs: 32.17 | 7: iteration 13860/ 115203 | consumed samples: 3548160 | consumed tokens: 7266631680 | elapsed time per iteration (s): 0.42 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 2.530229E+00 | grad norm: 0.247 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.885 | TFLOPs: 31.68 | 7: iteration 13870/ 115203 | consumed samples: 3550720 | consumed tokens: 7271874560 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 2.495693E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.924 | TFLOPs: 31.48 | 7: iteration 13880/ 115203 | consumed samples: 3553280 | consumed tokens: 7277117440 | elapsed time per iteration (s): 0.42 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 2.451337E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.250 | TFLOPs: 31.91 | 7: iteration 13890/ 115203 | consumed samples: 3555840 | consumed tokens: 7282360320 | elapsed time per iteration (s): 0.42 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 2.478893E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.526 | TFLOPs: 31.88 | 7: iteration 13900/ 115203 | consumed samples: 3558400 | consumed tokens: 7287603200 | elapsed time per iteration (s): 0.42 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 2.461228E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.980 | TFLOPs: 31.64 | 7: iteration 13910/ 115203 | consumed samples: 3560960 | consumed tokens: 7292846080 | elapsed time per iteration (s): 0.42 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 2.499118E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.895 | TFLOPs: 31.63 | 7: iteration 13920/ 115203 | consumed samples: 3563520 | consumed tokens: 7298088960 | elapsed time per iteration (s): 0.42 | learning rate: 
1.945E-04 | global batch size: 256 | lm loss: 2.469115E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.047 | TFLOPs: 31.85 | 7: iteration 13930/ 115203 | consumed samples: 3566080 | consumed tokens: 7303331840 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 2.481952E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.575 | TFLOPs: 31.56 | 7: iteration 13940/ 115203 | consumed samples: 3568640 | consumed tokens: 7308574720 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 2.508361E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.384 | TFLOPs: 30.98 | 7: iteration 13950/ 115203 | consumed samples: 3571200 | consumed tokens: 7313817600 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 2.501128E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.731 | TFLOPs: 31.57 | 7: iteration 13960/ 115203 | consumed samples: 3573760 | consumed tokens: 7319060480 | elapsed time per iteration (s): 0.43 | learning rate: 1.945E-04 | global batch size: 256 | lm loss: 2.486373E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.178 | TFLOPs: 31.54 | 7: iteration 13970/ 115203 | consumed samples: 3576320 | consumed tokens: 7324303360 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 2.470771E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.084 | TFLOPs: 31.07 | 7: iteration 13980/ 115203 | consumed samples: 
3578880 | consumed tokens: 7329546240 | elapsed time per iteration (s): 0.42 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 2.487789E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.577 | TFLOPs: 31.83 | 7: iteration 13990/ 115203 | consumed samples: 3581440 | consumed tokens: 7334789120 | elapsed time per iteration (s): 0.42 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 2.452654E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.090 | TFLOPs: 31.80 | 0: [2022-11-28 14:38:11,068] [INFO] [logging.py:68:log_dist] [Rank 0] step=14000, skipped=0, lr=[0.00019442251142812213, 0.00019442251142812213, 0.00019442251142812213], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 14000/ 115203 | consumed samples: 3584000 | consumed tokens: 7340032000 | elapsed time per iteration (s): 0.42 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 2.459767E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.955 | TFLOPs: 31.90 | 0: steps: 14000 loss: 2.4798 iter time (s): 0.499 samples/sec: 513.369 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 14000 | lm loss value: 2.397298E+00 | lm loss PPL: 1.099343E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 14000 to checkpoints_221m 0: [2022-11-28 14:38:11,233] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step14000 is begin to save! 0: [2022-11-28 14:38:11,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_01-model_00-model_states.pt... 
0: [2022-11-28 14:38:11,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_01-model_00-model_states.pt. 0: [2022-11-28 14:38:11,354] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_03-model_00-model_states.pt... 0: [2022-11-28 14:38:11,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_03-model_00-model_states.pt. 0: [2022-11-28 14:38:11,376] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_04-model_00-model_states.pt... 0: [2022-11-28 14:38:11,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_04-model_00-model_states.pt. 0: [2022-11-28 14:38:11,399] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_05-model_00-model_states.pt... 0: [2022-11-28 14:38:11,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_05-model_00-model_states.pt. 0: [2022-11-28 14:38:11,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_06-model_00-model_states.pt... 0: [2022-11-28 14:38:11,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_06-model_00-model_states.pt. 0: [2022-11-28 14:38:11,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_07-model_00-model_states.pt... 0: [2022-11-28 14:38:11,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_07-model_00-model_states.pt. 0: [2022-11-28 14:38:11,470] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_08-model_00-model_states.pt... 
0: [2022-11-28 14:38:11,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_08-model_00-model_states.pt. 0: [2022-11-28 14:38:11,493] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_09-model_00-model_states.pt... 0: [2022-11-28 14:38:11,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_09-model_00-model_states.pt. 0: [2022-11-28 14:38:11,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_10-model_00-model_states.pt... 0: [2022-11-28 14:38:11,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_10-model_00-model_states.pt. 0: [2022-11-28 14:38:11,540] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_11-model_00-model_states.pt... 0: [2022-11-28 14:38:11,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_11-model_00-model_states.pt. 0: [2022-11-28 14:38:11,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_12-model_00-model_states.pt... 0: [2022-11-28 14:38:11,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_12-model_00-model_states.pt. 0: [2022-11-28 14:38:11,584] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_13-model_00-model_states.pt... 0: [2022-11-28 14:38:11,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_13-model_00-model_states.pt. 0: [2022-11-28 14:38:11,608] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_14-model_00-model_states.pt... 
0: [2022-11-28 14:38:11,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_14-model_00-model_states.pt. 0: [2022-11-28 14:38:11,631] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_15-model_00-model_states.pt... 0: [2022-11-28 14:38:11,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_15-model_00-model_states.pt. 0: [2022-11-28 14:38:11,655] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_16-model_00-model_states.pt... 0: [2022-11-28 14:38:11,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_16-model_00-model_states.pt. 0: [2022-11-28 14:38:11,678] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_17-model_00-model_states.pt... 0: [2022-11-28 14:38:11,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_17-model_00-model_states.pt. 0: [2022-11-28 14:38:11,701] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_18-model_00-model_states.pt... 0: [2022-11-28 14:38:11,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_18-model_00-model_states.pt. 0: [2022-11-28 14:38:11,725] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_19-model_00-model_states.pt... 0: [2022-11-28 14:38:11,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_19-model_00-model_states.pt. 0: [2022-11-28 14:38:11,748] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_20-model_00-model_states.pt... 
0: [2022-11-28 14:38:11,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_20-model_00-model_states.pt. 0: [2022-11-28 14:38:11,772] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/layer_22-model_00-model_states.pt... 0: [2022-11-28 14:38:11,776] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/layer_22-model_00-model_states.pt. 0: [2022-11-28 14:38:11,777] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step14000/mp_rank_00_model_states.pt 0: [2022-11-28 14:38:11,777] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/mp_rank_00_model_states.pt... 0: [2022-11-28 14:38:11,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/mp_rank_00_model_states.pt. 0: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
7: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:38:11,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step14000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:38:11,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:38:11,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 14:38:11,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2022-11-28 14:38:11,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:38:11,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 14:38:11,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2022-11-28 14:38:11,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:38:11,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 14:38:11,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 
6: [2022-11-28 14:38:11,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:38:11,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:38:11,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 2: [2022-11-28 14:38:11,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 6: [2022-11-28 14:38:11,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2022-11-28 14:38:11,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2022-11-28 14:38:11,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:38:11,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 7: [2022-11-28 14:38:11,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:38:11,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:38:11,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 14:38:11,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2022-11-28 14:38:11,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2022-11-28 14:38:11,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2022-11-28 14:38:11,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2022-11-28 14:38:11,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:38:11,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 14:38:11,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2022-11-28 14:38:11,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:38:11,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 14:38:11,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2022-11-28 14:38:11,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:38:11,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:38:11,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 14:38:11,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 14:38:11,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2022-11-28 14:38:11,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2022-11-28 14:38:11,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:38:11,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 14:38:11,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2022-11-28 14:38:11,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:38:11,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 14:38:11,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 6: [2022-11-28 14:38:11,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 14:38:11,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 14:38:11,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2022-11-28 14:38:11,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:38:11,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 14:38:11,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2022-11-28 14:38:11,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:38:11,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:38:11,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:38:11,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:38:11,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 14:38:11,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 2: [2022-11-28 14:38:11,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 6: [2022-11-28 14:38:11,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 14:38:11,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 14:38:11,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2022-11-28 14:38:11,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2022-11-28 14:38:11,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 6: [2022-11-28 14:38:11,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2022-11-28 14:38:11,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2022-11-28 14:38:11,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:38:11,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:38:11,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:38:11,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 14:38:11,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:38:11,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 14:38:11,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2022-11-28 14:38:11,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:38:11,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:38:11,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 14:38:11,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2022-11-28 14:38:11,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
3: [2022-11-28 14:38:11,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 14:38:11,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 14:38:11,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2022-11-28 14:38:11,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2022-11-28 14:38:11,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:38:11,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:38:11,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 14:38:11,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 14:38:11,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2022-11-28 14:38:11,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2022-11-28 14:38:11,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:38:11,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-28 14:38:11,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 14:38:11,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2022-11-28 14:38:11,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 5: [2022-11-28 14:38:11,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:38:11,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:38:11,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2022-11-28 14:38:11,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2022-11-28 14:38:11,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:38:11,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2022-11-28 14:38:11,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 
6: [2022-11-28 14:38:11,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2022-11-28 14:38:11,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:38:11,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 14:38:11,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 5: [2022-11-28 14:38:11,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:38:11,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 14:38:11,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2022-11-28 14:38:11,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:38:11,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 14:38:11,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2022-11-28 14:38:11,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:38:11,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 14:38:11,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2022-11-28 14:38:11,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:38:11,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 14:38:11,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2022-11-28 14:38:11,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:38:11,856] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 14:38:11,856] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2022-11-28 14:38:11,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:38:11,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 14:38:11,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2022-11-28 14:38:11,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 14:38:11,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 14:38:11,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2022-11-28 14:38:11,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:38:11,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2022-11-28 14:38:11,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:38:11,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 4: [2022-11-28 14:38:11,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 7: [2022-11-28 14:38:11,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:38:11,859] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 7: [2022-11-28 14:38:11,859] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 14:38:11,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 2: [2022-11-28 14:38:11,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 14:38:11,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 14:38:11,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 1: [2022-11-28 14:38:11,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:38:11,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 14:38:11,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2022-11-28 14:38:11,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:38:11,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 14:38:11,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2022-11-28 14:38:11,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:38:11,857] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 14:38:11,857] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2022-11-28 14:38:11,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 14:38:11,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:38:11,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:38:11,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 14:38:11,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 14:38:11,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 14:38:11,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2022-11-28 14:38:11,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 3: [2022-11-28 14:38:11,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 6: [2022-11-28 14:38:11,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:38:11,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 14:38:11,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2022-11-28 14:38:11,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:38:11,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:38:11,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 14:38:11,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 14:38:11,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:38:11,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2022-11-28 14:38:11,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:38:11,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2022-11-28 14:38:11,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 14:38:11,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2022-11-28 14:38:11,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:38:11,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:38:11,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 14:38:11,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 14:38:11,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2022-11-28 14:38:11,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2022-11-28 14:38:11,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:38:11,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:38:11,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 14:38:11,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 14:38:11,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2022-11-28 14:38:11,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 0: [2022-11-28 14:38:11,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step14000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 14:38:11,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step14000 is ready now! 
0: successfully saved checkpoint at iteration 14000 to checkpoints_221m
7: time (ms) | save-checkpoint: 680.41
7: iteration 14010/ 115203 | consumed samples: 3586560 | consumed tokens: 7345274880 | elapsed time per iteration (s): 0.51 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 2.520851E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 500.526 | TFLOPs: 26.26 |
7: iteration 14020/ 115203 | consumed samples: 3589120 | consumed tokens: 7350517760 | elapsed time per iteration (s): 0.42 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 2.493951E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.128 | TFLOPs: 32.22 |
7: iteration 14030/ 115203 | consumed samples: 3591680 | consumed tokens: 7355760640 | elapsed time per iteration (s): 0.42 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 2.502570E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.621 | TFLOPs: 31.88 |
7: iteration 14040/ 115203 | consumed samples: 3594240 | consumed tokens: 7361003520 | elapsed time per iteration (s): 0.42 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 2.472088E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.061 | TFLOPs: 32.17 |
7: iteration 14050/ 115203 | consumed samples: 3596800 | consumed tokens: 7366246400 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 2.459593E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.795 | TFLOPs: 31.31 |
7: iteration 14060/ 115203 | consumed samples: 3599360 | consumed tokens: 7371489280 | elapsed time per iteration (s): 0.42 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 2.466797E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.993 | TFLOPs: 31.95 |
7: iteration 14070/ 115203 | consumed samples: 3601920 | consumed tokens: 7376732160 | elapsed time per iteration (s): 0.42 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 2.446184E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.900 | TFLOPs: 32.16 |
7: iteration 14080/ 115203 | consumed samples: 3604480 | consumed tokens: 7381975040 | elapsed time per iteration (s): 0.43 | learning rate: 1.944E-04 | global batch size: 256 | lm loss: 2.495049E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.776 | TFLOPs: 31.15 |
7: iteration 14090/ 115203 | consumed samples: 3607040 | consumed tokens: 7387217920 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 2.454653E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.481 | TFLOPs: 31.77 |
7: iteration 14100/ 115203 | consumed samples: 3609600 | consumed tokens: 7392460800 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 2.486872E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.693 | TFLOPs: 31.62 |
7: iteration 14110/ 115203 | consumed samples: 3612160 | consumed tokens: 7397703680 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 2.488710E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.839 | TFLOPs: 32.15 |
7: iteration 14120/ 115203 | consumed samples: 3614720 | consumed tokens: 7402946560 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 2.489949E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.558 | TFLOPs: 31.98 |
7: iteration 14130/ 115203 | consumed samples: 3617280 | consumed tokens: 7408189440 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 2.506763E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.368 | TFLOPs: 31.08 |
7: iteration 14140/ 115203 | consumed samples: 3619840 | consumed tokens: 7413432320 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 2.462218E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.736 | TFLOPs: 31.94 |
7: iteration 14150/ 115203 | consumed samples: 3622400 | consumed tokens: 7418675200 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 2.501653E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.479 | TFLOPs: 31.51 |
7: iteration 14160/ 115203 | consumed samples: 3624960 | consumed tokens: 7423918080 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 2.513738E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.013 | TFLOPs: 31.32 |
7: iteration 14170/ 115203 | consumed samples: 3627520 | consumed tokens: 7429160960 | elapsed time per iteration (s): 0.43 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 2.473439E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.258 | TFLOPs: 31.07 |
7: iteration 14180/ 115203 | consumed samples: 3630080 | consumed tokens: 7434403840 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 2.519441E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.950 | TFLOPs: 32.21 |
7: iteration 14190/ 115203 | consumed samples: 3632640 | consumed tokens: 7439646720 | elapsed time per iteration (s): 0.42 | learning rate: 1.943E-04 | global batch size: 256 | lm loss: 2.478145E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.980 | TFLOPs: 31.79 |
7: iteration 14200/ 115203 | consumed samples: 3635200 | consumed tokens: 7444889600 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 2.485060E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.976 | TFLOPs: 31.58 |
7: iteration 14210/ 115203 | consumed samples: 3637760 | consumed tokens: 7450132480 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 2.460322E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.963 | TFLOPs: 31.27 |
7: iteration 14220/ 115203 | consumed samples: 3640320 | consumed tokens: 7455375360 | elapsed time per iteration (s): 0.42 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 2.493881E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.776 | TFLOPs: 31.99 |
7: iteration 14230/ 115203 | consumed samples: 3642880 | consumed tokens: 7460618240 | elapsed time per iteration (s): 0.42 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 2.472915E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.686 | TFLOPs: 31.67 |
7: iteration 14240/ 115203 | consumed samples: 3645440 | consumed tokens: 7465861120 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 2.494372E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.137 | TFLOPs: 31.59 |
7: iteration 14250/ 115203 | consumed samples: 3648000 | consumed tokens: 7471104000 | elapsed time per iteration (s): 0.42 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 2.492300E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.232 | TFLOPs: 31.81 |
7: iteration 14260/ 115203 | consumed samples: 3650560 | consumed tokens: 7476346880 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 2.506976E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.005 | TFLOPs: 31.38 |
7: iteration 14270/ 115203 | consumed samples: 3653120 | consumed tokens: 7481589760 | elapsed time per iteration (s): 0.42 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 2.460095E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.460 | TFLOPs: 32.03 |
7: iteration 14280/ 115203 | consumed samples: 3655680 | consumed tokens: 7486832640 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 2.476401E+00 | grad norm: 0.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.999 | TFLOPs: 31.53 |
7: iteration 14290/ 115203 | consumed samples: 3658240 | consumed tokens: 7492075520 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 2.494650E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.464 | TFLOPs: 31.56 |
7: iteration 14300/ 115203 | consumed samples: 3660800 | consumed tokens: 7497318400 | elapsed time per iteration (s): 0.43 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 2.434136E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.000 | TFLOPs: 31.27 |
7: iteration 14310/ 115203 | consumed samples: 3663360 | consumed tokens: 7502561280 | elapsed time per iteration (s): 0.42 | learning rate: 1.942E-04 | global batch size: 256 | lm loss: 2.440562E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.245 | TFLOPs: 31.70 |
7: iteration 14320/ 115203 | consumed samples: 3665920 | consumed tokens: 7507804160 | elapsed time per iteration (s): 0.42 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 2.445496E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.764 | TFLOPs: 31.94 |
7: iteration 14330/ 115203 | consumed samples: 3668480 | consumed tokens: 7513047040 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 2.458712E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.985 | TFLOPs: 31.43 |
7: iteration 14340/ 115203 | consumed samples: 3671040 | consumed tokens: 7518289920 | elapsed time per iteration (s): 0.42 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 2.479339E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.852 | TFLOPs: 32.21 |
7: iteration 14350/ 115203 | consumed samples: 3673600 | consumed tokens: 7523532800 | elapsed time per iteration (s): 0.42 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 2.505499E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.859 | TFLOPs: 31.89 |
7: iteration 14360/ 115203 | consumed samples: 3676160 | consumed tokens: 7528775680 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 2.509353E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.513 | TFLOPs: 31.51 |
7: iteration 14370/ 115203 | consumed samples: 3678720 | consumed tokens: 7534018560 | elapsed time per iteration (s): 0.42 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 2.508682E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.559 | TFLOPs: 31.67 |
7: iteration 14380/ 115203 | consumed samples: 3681280 | consumed tokens: 7539261440 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 2.461701E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.100 | TFLOPs: 31.49 |
7: iteration 14390/ 115203 | consumed samples: 3683840 | consumed tokens: 7544504320 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 2.488884E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.521 | TFLOPs: 31.51 |
7: iteration 14400/ 115203 | consumed samples: 3686400 | consumed tokens: 7549747200 | elapsed time per iteration (s): 0.44 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 2.437883E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan
iterations: 0 | samples per second: 583.495 | TFLOPs: 30.62 | 7: iteration 14410/ 115203 | consumed samples: 3688960 | consumed tokens: 7554990080 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 2.439738E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.426 | TFLOPs: 31.45 | 7: iteration 14420/ 115203 | consumed samples: 3691520 | consumed tokens: 7560232960 | elapsed time per iteration (s): 0.43 | learning rate: 1.941E-04 | global batch size: 256 | lm loss: 2.471538E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.597 | TFLOPs: 31.51 | 7: iteration 14430/ 115203 | consumed samples: 3694080 | consumed tokens: 7565475840 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 2.481506E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.069 | TFLOPs: 32.06 | 7: iteration 14440/ 115203 | consumed samples: 3696640 | consumed tokens: 7570718720 | elapsed time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 2.445314E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.279 | TFLOPs: 31.34 | 7: iteration 14450/ 115203 | consumed samples: 3699200 | consumed tokens: 7575961600 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 2.461008E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.803 | TFLOPs: 31.84 | 7: iteration 14460/ 115203 | consumed samples: 3701760 | consumed tokens: 7581204480 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 
2.489682E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.478 | TFLOPs: 32.03 | 7: iteration 14470/ 115203 | consumed samples: 3704320 | consumed tokens: 7586447360 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 2.493847E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.049 | TFLOPs: 32.06 | 7: iteration 14480/ 115203 | consumed samples: 3706880 | consumed tokens: 7591690240 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 2.458243E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.562 | TFLOPs: 31.98 | 7: iteration 14490/ 115203 | consumed samples: 3709440 | consumed tokens: 7596933120 | elapsed time per iteration (s): 0.45 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 2.454911E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.800 | TFLOPs: 29.90 | 7: iteration 14500/ 115203 | consumed samples: 3712000 | consumed tokens: 7602176000 | elapsed time per iteration (s): 0.42 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 2.473100E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.057 | TFLOPs: 31.75 | 7: iteration 14510/ 115203 | consumed samples: 3714560 | consumed tokens: 7607418880 | elapsed time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 2.489458E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.622 | TFLOPs: 31.51 | 7: iteration 14520/ 115203 | consumed samples: 3717120 | consumed tokens: 7612661760 | elapsed 
time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 2.528216E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.044 | TFLOPs: 31.33 | 7: iteration 14530/ 115203 | consumed samples: 3719680 | consumed tokens: 7617904640 | elapsed time per iteration (s): 0.43 | learning rate: 1.940E-04 | global batch size: 256 | lm loss: 2.467731E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.806 | TFLOPs: 31.10 | 7: iteration 14540/ 115203 | consumed samples: 3722240 | consumed tokens: 7623147520 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 2.482071E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.683 | TFLOPs: 31.67 | 7: iteration 14550/ 115203 | consumed samples: 3724800 | consumed tokens: 7628390400 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 2.448533E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.382 | TFLOPs: 31.66 | 7: iteration 14560/ 115203 | consumed samples: 3727360 | consumed tokens: 7633633280 | elapsed time per iteration (s): 0.44 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 2.464841E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.503 | TFLOPs: 30.88 | 7: iteration 14570/ 115203 | consumed samples: 3729920 | consumed tokens: 7638876160 | elapsed time per iteration (s): 0.44 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 2.467810E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.774 | TFLOPs: 30.42 | 7: 
iteration 14580/ 115203 | consumed samples: 3732480 | consumed tokens: 7644119040 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 2.444833E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.447 | TFLOPs: 31.98 | 7: iteration 14590/ 115203 | consumed samples: 3735040 | consumed tokens: 7649361920 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 2.486394E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.195 | TFLOPs: 32.07 | 7: iteration 14600/ 115203 | consumed samples: 3737600 | consumed tokens: 7654604800 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 2.496318E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.696 | TFLOPs: 32.04 | 7: iteration 14610/ 115203 | consumed samples: 3740160 | consumed tokens: 7659847680 | elapsed time per iteration (s): 0.43 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 2.479579E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.576 | TFLOPs: 31.56 | 7: iteration 14620/ 115203 | consumed samples: 3742720 | consumed tokens: 7665090560 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 2.486695E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.431 | TFLOPs: 32.08 | 7: iteration 14630/ 115203 | consumed samples: 3745280 | consumed tokens: 7670333440 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 2.449015E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 606.458 | TFLOPs: 31.82 | 7: iteration 14640/ 115203 | consumed samples: 3747840 | consumed tokens: 7675576320 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 2.459073E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.185 | TFLOPs: 32.23 | 7: iteration 14650/ 115203 | consumed samples: 3750400 | consumed tokens: 7680819200 | elapsed time per iteration (s): 0.42 | learning rate: 1.939E-04 | global batch size: 256 | lm loss: 2.479846E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.570 | TFLOPs: 31.88 | 7: iteration 14660/ 115203 | consumed samples: 3752960 | consumed tokens: 7686062080 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 2.453571E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.508 | TFLOPs: 31.25 | 7: iteration 14670/ 115203 | consumed samples: 3755520 | consumed tokens: 7691304960 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 2.463668E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.187 | TFLOPs: 31.96 | 7: iteration 14680/ 115203 | consumed samples: 3758080 | consumed tokens: 7696547840 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 2.488785E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.166 | TFLOPs: 31.54 | 7: iteration 14690/ 115203 | consumed samples: 3760640 | consumed tokens: 7701790720 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch 
size: 256 | lm loss: 2.457297E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.511 | TFLOPs: 31.40 | 7: iteration 14700/ 115203 | consumed samples: 3763200 | consumed tokens: 7707033600 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 2.490910E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.431 | TFLOPs: 31.29 | 7: iteration 14710/ 115203 | consumed samples: 3765760 | consumed tokens: 7712276480 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 2.458949E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.217 | TFLOPs: 31.13 | 7: iteration 14720/ 115203 | consumed samples: 3768320 | consumed tokens: 7717519360 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 2.477050E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.681 | TFLOPs: 31.62 | 7: iteration 14730/ 115203 | consumed samples: 3770880 | consumed tokens: 7722762240 | elapsed time per iteration (s): 0.43 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 2.477710E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.832 | TFLOPs: 31.10 | 7: iteration 14740/ 115203 | consumed samples: 3773440 | consumed tokens: 7728005120 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 2.475260E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.759 | TFLOPs: 31.78 | 7: iteration 14750/ 115203 | consumed samples: 3776000 | consumed tokens: 
7733248000 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 2.477725E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.106 | TFLOPs: 31.91 | 7: iteration 14760/ 115203 | consumed samples: 3778560 | consumed tokens: 7738490880 | elapsed time per iteration (s): 0.42 | learning rate: 1.938E-04 | global batch size: 256 | lm loss: 2.453460E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.532 | TFLOPs: 31.61 | 7: iteration 14770/ 115203 | consumed samples: 3781120 | consumed tokens: 7743733760 | elapsed time per iteration (s): 0.42 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 2.461469E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.208 | TFLOPs: 31.75 | 7: iteration 14780/ 115203 | consumed samples: 3783680 | consumed tokens: 7748976640 | elapsed time per iteration (s): 0.42 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 2.497573E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.233 | TFLOPs: 31.81 | 7: iteration 14790/ 115203 | consumed samples: 3786240 | consumed tokens: 7754219520 | elapsed time per iteration (s): 0.42 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 2.487687E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.996 | TFLOPs: 31.85 | 7: iteration 14800/ 115203 | consumed samples: 3788800 | consumed tokens: 7759462400 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 2.459695E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.072 | 
TFLOPs: 31.27 | 7: iteration 14810/ 115203 | consumed samples: 3791360 | consumed tokens: 7764705280 | elapsed time per iteration (s): 0.42 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 2.502496E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.705 | TFLOPs: 31.99 | 7: iteration 14820/ 115203 | consumed samples: 3793920 | consumed tokens: 7769948160 | elapsed time per iteration (s): 0.42 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 2.446678E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.044 | TFLOPs: 31.75 | 7: iteration 14830/ 115203 | consumed samples: 3796480 | consumed tokens: 7775191040 | elapsed time per iteration (s): 0.44 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 2.475463E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.712 | TFLOPs: 30.78 | 7: iteration 14840/ 115203 | consumed samples: 3799040 | consumed tokens: 7780433920 | elapsed time per iteration (s): 0.43 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 2.457001E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.844 | TFLOPs: 31.42 | 7: iteration 14850/ 115203 | consumed samples: 3801600 | consumed tokens: 7785676800 | elapsed time per iteration (s): 0.42 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 2.457657E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.669 | TFLOPs: 31.73 | 7: iteration 14860/ 115203 | consumed samples: 3804160 | consumed tokens: 7790919680 | elapsed time per iteration (s): 0.42 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 2.449038E+00 | grad norm: 0.223 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.499 | TFLOPs: 32.24 | 7: iteration 14870/ 115203 | consumed samples: 3806720 | consumed tokens: 7796162560 | elapsed time per iteration (s): 0.42 | learning rate: 1.937E-04 | global batch size: 256 | lm loss: 2.491770E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.737 | TFLOPs: 32.04 | 7: iteration 14880/ 115203 | consumed samples: 3809280 | consumed tokens: 7801405440 | elapsed time per iteration (s): 0.43 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 2.462312E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.837 | TFLOPs: 31.58 | 7: iteration 14890/ 115203 | consumed samples: 3811840 | consumed tokens: 7806648320 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 2.444299E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.797 | TFLOPs: 32.26 | 7: iteration 14900/ 115203 | consumed samples: 3814400 | consumed tokens: 7811891200 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 2.447099E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.095 | TFLOPs: 31.91 | 7: iteration 14910/ 115203 | consumed samples: 3816960 | consumed tokens: 7817134080 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 2.487186E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.242 | TFLOPs: 31.86 | 7: iteration 14920/ 115203 | consumed samples: 3819520 | consumed tokens: 7822376960 | elapsed time per iteration (s): 0.45 | learning rate: 
1.936E-04 | global batch size: 256 | lm loss: 2.446043E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.155 | TFLOPs: 30.07 | 7: iteration 14930/ 115203 | consumed samples: 3822080 | consumed tokens: 7827619840 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 2.451867E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.689 | TFLOPs: 31.99 | 7: iteration 14940/ 115203 | consumed samples: 3824640 | consumed tokens: 7832862720 | elapsed time per iteration (s): 0.44 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 2.474319E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.761 | TFLOPs: 30.84 | 7: iteration 14950/ 115203 | consumed samples: 3827200 | consumed tokens: 7838105600 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 2.445566E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.046 | TFLOPs: 31.80 | 7: iteration 14960/ 115203 | consumed samples: 3829760 | consumed tokens: 7843348480 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 2.420468E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.905 | TFLOPs: 31.84 | 7: iteration 14970/ 115203 | consumed samples: 3832320 | consumed tokens: 7848591360 | elapsed time per iteration (s): 0.42 | learning rate: 1.936E-04 | global batch size: 256 | lm loss: 2.459509E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.227 | TFLOPs: 32.02 | 7: iteration 14980/ 115203 | consumed samples: 
3834880 | consumed tokens: 7853834240 | elapsed time per iteration (s): 0.42 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 2.468074E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.372 | TFLOPs: 32.08 | 7: iteration 14990/ 115203 | consumed samples: 3837440 | consumed tokens: 7859077120 | elapsed time per iteration (s): 0.42 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 2.514610E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.632 | TFLOPs: 32.09 | 7: iteration 15000/ 115203 | consumed samples: 3840000 | consumed tokens: 7864320000 | elapsed time per iteration (s): 0.42 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 2.455428E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.601 | TFLOPs: 31.72 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 15000 | lm loss value: 2.450297E+00 | lm loss PPL: 1.159179E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 15000 to checkpoints_221m 0: [2022-11-28 14:45:16,696] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step15000 is begin to save! 0: [2022-11-28 14:45:16,700] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_01-model_00-model_states.pt... 0: [2022-11-28 14:45:16,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_01-model_00-model_states.pt. 0: [2022-11-28 14:45:16,846] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_03-model_00-model_states.pt... 
0: [2022-11-28 14:45:16,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_03-model_00-model_states.pt. 0: [2022-11-28 14:45:16,876] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_04-model_00-model_states.pt... 0: [2022-11-28 14:45:16,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_04-model_00-model_states.pt. 0: [2022-11-28 14:45:16,907] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_05-model_00-model_states.pt... 0: [2022-11-28 14:45:16,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_05-model_00-model_states.pt. 0: [2022-11-28 14:45:16,939] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_06-model_00-model_states.pt... 0: [2022-11-28 14:45:16,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_06-model_00-model_states.pt. 0: [2022-11-28 14:45:16,971] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_07-model_00-model_states.pt... 0: [2022-11-28 14:45:17,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_07-model_00-model_states.pt. 0: [2022-11-28 14:45:17,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_08-model_00-model_states.pt... 0: [2022-11-28 14:45:17,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_08-model_00-model_states.pt. 0: [2022-11-28 14:45:17,033] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_09-model_00-model_states.pt... 
0: [2022-11-28 14:45:17,064] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_09-model_00-model_states.pt. 0: [2022-11-28 14:45:17,065] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_10-model_00-model_states.pt... 0: [2022-11-28 14:45:17,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_10-model_00-model_states.pt. 0: [2022-11-28 14:45:17,096] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_11-model_00-model_states.pt... 0: [2022-11-28 14:45:17,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_11-model_00-model_states.pt. 0: [2022-11-28 14:45:17,127] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_12-model_00-model_states.pt... 0: [2022-11-28 14:45:17,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_12-model_00-model_states.pt. 0: [2022-11-28 14:45:17,158] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_13-model_00-model_states.pt... 0: [2022-11-28 14:45:17,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_13-model_00-model_states.pt. 0: [2022-11-28 14:45:17,191] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_14-model_00-model_states.pt... 0: [2022-11-28 14:45:17,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_14-model_00-model_states.pt. 0: [2022-11-28 14:45:17,222] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_15-model_00-model_states.pt... 
0: [2022-11-28 14:45:17,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_15-model_00-model_states.pt. 0: [2022-11-28 14:45:17,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_16-model_00-model_states.pt... 0: [2022-11-28 14:45:17,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_16-model_00-model_states.pt. 0: [2022-11-28 14:45:17,285] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_17-model_00-model_states.pt... 0: [2022-11-28 14:45:17,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_17-model_00-model_states.pt. 0: [2022-11-28 14:45:17,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_18-model_00-model_states.pt... 0: [2022-11-28 14:45:17,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_18-model_00-model_states.pt. 0: [2022-11-28 14:45:17,349] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_19-model_00-model_states.pt... 0: [2022-11-28 14:45:17,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_19-model_00-model_states.pt. 0: [2022-11-28 14:45:17,384] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_20-model_00-model_states.pt... 0: [2022-11-28 14:45:17,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_20-model_00-model_states.pt. 0: [2022-11-28 14:45:17,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/layer_22-model_00-model_states.pt... 
0: [2022-11-28 14:45:17,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/layer_22-model_00-model_states.pt. 0: [2022-11-28 14:45:17,418] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step15000/mp_rank_00_model_states.pt 0: [2022-11-28 14:45:17,418] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/mp_rank_00_model_states.pt... 0: [2022-11-28 14:45:17,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/mp_rank_00_model_states.pt. 0: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
1: [2022-11-28 14:45:17,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step15000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:45:17,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:45:17,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 14:45:17,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2022-11-28 14:45:17,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:45:17,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 14:45:17,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2022-11-28 14:45:17,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:45:17,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 14:45:17,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2022-11-28 14:45:17,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:45:17,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 14:45:17,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2022-11-28 14:45:17,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:45:17,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 14:45:17,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:45:17,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2022-11-28 14:45:17,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 14:45:17,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2022-11-28 14:45:17,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:45:17,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 14:45:17,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2022-11-28 14:45:17,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:45:17,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 14:45:17,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 2: [2022-11-28 14:45:17,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:45:17,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 14:45:17,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2022-11-28 14:45:17,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:45:17,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 14:45:17,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2022-11-28 14:45:17,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:45:17,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 14:45:17,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2022-11-28 14:45:17,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:45:17,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 14:45:17,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2022-11-28 14:45:17,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:45:17,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 14:45:17,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2022-11-28 14:45:17,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:45:17,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:45:17,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:45:17,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 1: [2022-11-28 14:45:17,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 14:45:17,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 
1: [2022-11-28 14:45:17,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2022-11-28 14:45:17,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 2: [2022-11-28 14:45:17,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:45:17,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 2: [2022-11-28 14:45:17,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 14:45:17,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2022-11-28 14:45:17,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:45:17,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:45:17,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 14:45:17,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 14:45:17,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2022-11-28 14:45:17,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 
6: [2022-11-28 14:45:17,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:45:17,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 14:45:17,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2022-11-28 14:45:17,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:45:17,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:45:17,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:45:17,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 14:45:17,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:45:17,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 14:45:17,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 14:45:17,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2022-11-28 14:45:17,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 
6: [2022-11-28 14:45:17,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2022-11-28 14:45:17,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 14:45:17,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2022-11-28 14:45:17,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:45:17,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:45:17,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 14:45:17,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 14:45:17,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2022-11-28 14:45:17,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 6: [2022-11-28 14:45:17,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:45:17,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 14:45:17,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 
0: [2022-11-28 14:45:17,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:45:17,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 14:45:17,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 7: [2022-11-28 14:45:17,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:45:17,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 14:45:17,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2022-11-28 14:45:17,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:45:17,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:45:17,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 14:45:17,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 14:45:17,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2022-11-28 14:45:17,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 
2: [2022-11-28 14:45:17,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:45:17,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:45:17,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 14:45:17,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 14:45:17,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 2: [2022-11-28 14:45:17,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 2: [2022-11-28 14:45:17,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:45:17,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 14:45:17,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 2: [2022-11-28 14:45:17,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:45:17,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 14:45:17,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 
2: [2022-11-28 14:45:17,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:45:17,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 14:45:17,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2022-11-28 14:45:17,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:45:17,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:45:17,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 14:45:17,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 14:45:17,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2022-11-28 14:45:17,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 1: [2022-11-28 14:45:17,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:45:17,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:45:17,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2022-11-28 14:45:17,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2022-11-28 14:45:17,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2022-11-28 14:45:17,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2022-11-28 14:45:17,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:45:17,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:45:17,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:45:17,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:45:17,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 14:45:17,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 14:45:17,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 14:45:17,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 14:45:17,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2022-11-28 14:45:17,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2022-11-28 14:45:17,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 4: [2022-11-28 14:45:17,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2022-11-28 14:45:17,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:45:17,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:45:17,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
5: [2022-11-28 14:45:17,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 14:45:17,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 14:45:17,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 14:45:17,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2022-11-28 14:45:17,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2022-11-28 14:45:17,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2022-11-28 14:45:17,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:45:17,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:45:17,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 14:45:17,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 14:45:17,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2022-11-28 14:45:17,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 
5: [2022-11-28 14:45:17,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:45:17,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:45:17,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 14:45:17,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 14:45:17,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2022-11-28 14:45:17,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2022-11-28 14:45:17,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:45:17,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:45:17,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 14:45:17,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 14:45:17,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2022-11-28 14:45:17,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 
5: [2022-11-28 14:45:17,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:45:17,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:45:17,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:45:17,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 14:45:17,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 14:45:17,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 14:45:17,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2022-11-28 14:45:17,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2022-11-28 14:45:17,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 5: [2022-11-28 14:45:17,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:45:17,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 14:45:17,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 
3: [2022-11-28 14:45:17,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:45:17,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:45:17,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:45:17,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 14:45:17,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 14:45:17,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 14:45:17,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2022-11-28 14:45:17,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 3: [2022-11-28 14:45:17,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2022-11-28 14:45:17,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:45:17,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:45:17,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:45:17,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:45:17,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 14:45:17,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 14:45:17,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 14:45:17,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2022-11-28 14:45:17,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2022-11-28 14:45:17,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 0: [2022-11-28 14:45:17,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step15000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 14:45:17,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step15000 is ready now! 
0: successfully saved checkpoint at iteration 15000 to checkpoints_221m 7: time (ms) | save-checkpoint: 855.66 7: iteration 15010/ 115203 | consumed samples: 3842560 | consumed tokens: 7869562880 | elapsed time per iteration (s): 0.52 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 2.473102E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 488.218 | TFLOPs: 25.62 | 7: iteration 15020/ 115203 | consumed samples: 3845120 | consumed tokens: 7874805760 | elapsed time per iteration (s): 0.42 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 2.453023E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.988 | TFLOPs: 32.27 | 7: iteration 15030/ 115203 | consumed samples: 3847680 | consumed tokens: 7880048640 | elapsed time per iteration (s): 0.42 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 2.483072E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.252 | TFLOPs: 31.70 | 7: iteration 15040/ 115203 | consumed samples: 3850240 | consumed tokens: 7885291520 | elapsed time per iteration (s): 0.44 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 2.509456E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.633 | TFLOPs: 30.52 | 7: iteration 15050/ 115203 | consumed samples: 3852800 | consumed tokens: 7890534400 | elapsed time per iteration (s): 0.43 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 2.448560E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.981 | TFLOPs: 31.58 | 7: iteration 15060/ 115203 | consumed samples: 3855360 | consumed tokens: 7895777280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.935E-04 | global batch size: 256 | lm loss: 2.446452E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.089 | TFLOPs: 32.01 | 7: iteration 15070/ 115203 | consumed samples: 3857920 | consumed tokens: 7901020160 | elapsed time per iteration (s): 0.42 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 2.462114E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.657 | TFLOPs: 31.67 | 7: iteration 15080/ 115203 | consumed samples: 3860480 | consumed tokens: 7906263040 | elapsed time per iteration (s): 0.42 | learning rate: 1.935E-04 | global batch size: 256 | lm loss: 2.449648E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.784 | TFLOPs: 31.84 | 7: iteration 15090/ 115203 | consumed samples: 3863040 | consumed tokens: 7911505920 | elapsed time per iteration (s): 0.42 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 2.453645E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.467 | TFLOPs: 31.98 | 7: iteration 15100/ 115203 | consumed samples: 3865600 | consumed tokens: 7916748800 | elapsed time per iteration (s): 0.43 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 2.449054E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.617 | TFLOPs: 31.46 | 7: iteration 15110/ 115203 | consumed samples: 3868160 | consumed tokens: 7921991680 | elapsed time per iteration (s): 0.42 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 2.476294E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.899 | TFLOPs: 32.11 | 7: iteration 15120/ 115203 | consumed samples: 
3870720 | consumed tokens: 7927234560 | elapsed time per iteration (s): 0.42 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 2.461911E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.037 | TFLOPs: 32.06 | 7: iteration 15130/ 115203 | consumed samples: 3873280 | consumed tokens: 7932477440 | elapsed time per iteration (s): 0.43 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 2.444590E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.267 | TFLOPs: 31.60 | 7: iteration 15140/ 115203 | consumed samples: 3875840 | consumed tokens: 7937720320 | elapsed time per iteration (s): 0.42 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 2.488771E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.092 | TFLOPs: 31.96 | 7: iteration 15150/ 115203 | consumed samples: 3878400 | consumed tokens: 7942963200 | elapsed time per iteration (s): 0.43 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 2.442682E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.391 | TFLOPs: 31.50 | 7: iteration 15160/ 115203 | consumed samples: 3880960 | consumed tokens: 7948206080 | elapsed time per iteration (s): 0.42 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 2.458456E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.678 | TFLOPs: 31.94 | 7: iteration 15170/ 115203 | consumed samples: 3883520 | consumed tokens: 7953448960 | elapsed time per iteration (s): 0.42 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 2.480095E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 603.575 | TFLOPs: 31.67 | 7: iteration 15180/ 115203 | consumed samples: 3886080 | consumed tokens: 7958691840 | elapsed time per iteration (s): 0.42 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 2.453157E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.091 | TFLOPs: 31.75 | 7: iteration 15190/ 115203 | consumed samples: 3888640 | consumed tokens: 7963934720 | elapsed time per iteration (s): 0.44 | learning rate: 1.934E-04 | global batch size: 256 | lm loss: 2.456860E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.094 | TFLOPs: 30.70 | 7: iteration 15200/ 115203 | consumed samples: 3891200 | consumed tokens: 7969177600 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 2.481069E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.887 | TFLOPs: 32.00 | 7: iteration 15210/ 115203 | consumed samples: 3893760 | consumed tokens: 7974420480 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 2.456946E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.226 | TFLOPs: 32.17 | 7: iteration 15220/ 115203 | consumed samples: 3896320 | consumed tokens: 7979663360 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 2.484243E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.174 | TFLOPs: 31.80 | 7: iteration 15230/ 115203 | consumed samples: 3898880 | consumed tokens: 7984906240 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 2.463325E+00 | grad norm: 
0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.140 | TFLOPs: 32.28 | 7: iteration 15240/ 115203 | consumed samples: 3901440 | consumed tokens: 7990149120 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 2.471609E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.811 | TFLOPs: 32.05 | 7: iteration 15250/ 115203 | consumed samples: 3904000 | consumed tokens: 7995392000 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 2.454831E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.170 | TFLOPs: 32.12 | 7: iteration 15260/ 115203 | consumed samples: 3906560 | consumed tokens: 8000634880 | elapsed time per iteration (s): 0.43 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 2.481598E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.249 | TFLOPs: 31.44 | 7: iteration 15270/ 115203 | consumed samples: 3909120 | consumed tokens: 8005877760 | elapsed time per iteration (s): 0.45 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 2.456322E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.422 | TFLOPs: 30.03 | 7: iteration 15280/ 115203 | consumed samples: 3911680 | consumed tokens: 8011120640 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 2.474392E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.712 | TFLOPs: 32.04 | 7: iteration 15290/ 115203 | consumed samples: 3914240 | consumed tokens: 8016363520 | elapsed time per iteration (s): 0.42 
| learning rate: 1.933E-04 | global batch size: 256 | lm loss: 2.450535E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.930 | TFLOPs: 31.69 | 7: iteration 15300/ 115203 | consumed samples: 3916800 | consumed tokens: 8021606400 | elapsed time per iteration (s): 0.42 | learning rate: 1.933E-04 | global batch size: 256 | lm loss: 2.451616E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.361 | TFLOPs: 31.87 | 7: iteration 15310/ 115203 | consumed samples: 3919360 | consumed tokens: 8026849280 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 2.471659E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.695 | TFLOPs: 31.67 | 7: iteration 15320/ 115203 | consumed samples: 3921920 | consumed tokens: 8032092160 | elapsed time per iteration (s): 0.43 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 2.461531E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.793 | TFLOPs: 31.31 | 7: iteration 15330/ 115203 | consumed samples: 3924480 | consumed tokens: 8037335040 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 2.445931E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.932 | TFLOPs: 32.00 | 7: iteration 15340/ 115203 | consumed samples: 3927040 | consumed tokens: 8042577920 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 2.476400E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.031 | TFLOPs: 31.80 | 7: iteration 15350/ 115203 | 
consumed samples: 3929600 | consumed tokens: 8047820800 | elapsed time per iteration (s): 0.43 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 2.461365E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.629 | TFLOPs: 31.41 | 7: iteration 15360/ 115203 | consumed samples: 3932160 | consumed tokens: 8053063680 | elapsed time per iteration (s): 0.43 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 2.465541E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.922 | TFLOPs: 31.37 | 7: iteration 15370/ 115203 | consumed samples: 3934720 | consumed tokens: 8058306560 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 2.473964E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.609 | TFLOPs: 31.99 | 7: iteration 15380/ 115203 | consumed samples: 3937280 | consumed tokens: 8063549440 | elapsed time per iteration (s): 0.42 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 2.437696E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.059 | TFLOPs: 31.90 | 7: iteration 15390/ 115203 | consumed samples: 3939840 | consumed tokens: 8068792320 | elapsed time per iteration (s): 0.44 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 2.478473E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.848 | TFLOPs: 30.53 | 7: iteration 15400/ 115203 | consumed samples: 3942400 | consumed tokens: 8074035200 | elapsed time per iteration (s): 0.43 | learning rate: 1.932E-04 | global batch size: 256 | lm loss: 2.447545E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 597.292 | TFLOPs: 31.34 | 7: iteration 15410/ 115203 | consumed samples: 3944960 | consumed tokens: 8079278080 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 2.492890E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.725 | TFLOPs: 31.78 | 7: iteration 15420/ 115203 | consumed samples: 3947520 | consumed tokens: 8084520960 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 2.482693E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.465 | TFLOPs: 31.93 | 7: iteration 15430/ 115203 | consumed samples: 3950080 | consumed tokens: 8089763840 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 2.457881E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.259 | TFLOPs: 31.70 | 7: iteration 15440/ 115203 | consumed samples: 3952640 | consumed tokens: 8095006720 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 2.456067E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.327 | TFLOPs: 31.87 | 7: iteration 15450/ 115203 | consumed samples: 3955200 | consumed tokens: 8100249600 | elapsed time per iteration (s): 0.43 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 2.469845E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.954 | TFLOPs: 31.11 | 7: iteration 15460/ 115203 | consumed samples: 3957760 | consumed tokens: 8105492480 | elapsed time per iteration (s): 0.43 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 
2.468451E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.260 | TFLOPs: 31.49 | 7: iteration 15470/ 115203 | consumed samples: 3960320 | consumed tokens: 8110735360 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 2.468196E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.605 | TFLOPs: 31.72 | 7: iteration 15480/ 115203 | consumed samples: 3962880 | consumed tokens: 8115978240 | elapsed time per iteration (s): 0.43 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 2.470771E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.976 | TFLOPs: 31.53 | 7: iteration 15490/ 115203 | consumed samples: 3965440 | consumed tokens: 8121221120 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 2.461091E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.296 | TFLOPs: 32.13 | 7: iteration 15500/ 115203 | consumed samples: 3968000 | consumed tokens: 8126464000 | elapsed time per iteration (s): 0.42 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 2.464575E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.306 | TFLOPs: 32.07 | 7: iteration 15510/ 115203 | consumed samples: 3970560 | consumed tokens: 8131706880 | elapsed time per iteration (s): 0.43 | learning rate: 1.931E-04 | global batch size: 256 | lm loss: 2.473762E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.428 | TFLOPs: 31.35 | 7: iteration 15520/ 115203 | consumed samples: 3973120 | consumed tokens: 8136949760 | elapsed 
time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 2.464838E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.392 | TFLOPs: 32.13 | 7: iteration 15530/ 115203 | consumed samples: 3975680 | consumed tokens: 8142192640 | elapsed time per iteration (s): 0.43 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 2.485762E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.154 | TFLOPs: 31.59 | 7: iteration 15540/ 115203 | consumed samples: 3978240 | consumed tokens: 8147435520 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 2.479549E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.190 | TFLOPs: 31.70 | 7: iteration 15550/ 115203 | consumed samples: 3980800 | consumed tokens: 8152678400 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 2.443288E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.344 | TFLOPs: 32.29 | 7: iteration 15560/ 115203 | consumed samples: 3983360 | consumed tokens: 8157921280 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 2.468838E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.365 | TFLOPs: 32.13 | 7: iteration 15570/ 115203 | consumed samples: 3985920 | consumed tokens: 8163164160 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 2.456005E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.108 | TFLOPs: 32.06 | 7: 
iteration 15580/ 115203 | consumed samples: 3988480 | consumed tokens: 8168407040 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 2.461002E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.433 | TFLOPs: 32.08 | 7: iteration 15590/ 115203 | consumed samples: 3991040 | consumed tokens: 8173649920 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 2.487170E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.345 | TFLOPs: 31.66 | 7: iteration 15600/ 115203 | consumed samples: 3993600 | consumed tokens: 8178892800 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 2.465983E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.032 | TFLOPs: 31.90 | 7: iteration 15610/ 115203 | consumed samples: 3996160 | consumed tokens: 8184135680 | elapsed time per iteration (s): 0.42 | learning rate: 1.930E-04 | global batch size: 256 | lm loss: 2.480518E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.182 | TFLOPs: 31.81 | 7: iteration 15620/ 115203 | consumed samples: 3998720 | consumed tokens: 8189378560 | elapsed time per iteration (s): 0.43 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 2.442589E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.221 | TFLOPs: 31.49 | 7: iteration 15630/ 115203 | consumed samples: 4001280 | consumed tokens: 8194621440 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 2.477688E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 612.054 | TFLOPs: 32.11 | 7: iteration 15640/ 115203 | consumed samples: 4003840 | consumed tokens: 8199864320 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 2.467712E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.709 | TFLOPs: 32.15 | 7: iteration 15650/ 115203 | consumed samples: 4006400 | consumed tokens: 8205107200 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 2.499421E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.383 | TFLOPs: 32.24 | 7: iteration 15660/ 115203 | consumed samples: 4008960 | consumed tokens: 8210350080 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 2.433262E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.838 | TFLOPs: 32.00 | 7: iteration 15670/ 115203 | consumed samples: 4011520 | consumed tokens: 8215592960 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 2.461119E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.418 | TFLOPs: 32.24 | 7: iteration 15680/ 115203 | consumed samples: 4014080 | consumed tokens: 8220835840 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 2.447396E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.286 | TFLOPs: 32.02 | 7: iteration 15690/ 115203 | consumed samples: 4016640 | consumed tokens: 8226078720 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch 
size: 256 | lm loss: 2.488915E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.334 | TFLOPs: 31.81 | 7: iteration 15700/ 115203 | consumed samples: 4019200 | consumed tokens: 8231321600 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 2.468625E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.496 | TFLOPs: 31.82 | 7: iteration 15710/ 115203 | consumed samples: 4021760 | consumed tokens: 8236564480 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 2.480573E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.088 | TFLOPs: 32.27 | 7: iteration 15720/ 115203 | consumed samples: 4024320 | consumed tokens: 8241807360 | elapsed time per iteration (s): 0.42 | learning rate: 1.929E-04 | global batch size: 256 | lm loss: 2.486281E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.613 | TFLOPs: 31.88 | 7: iteration 15730/ 115203 | consumed samples: 4026880 | consumed tokens: 8247050240 | elapsed time per iteration (s): 0.43 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 2.473420E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.670 | TFLOPs: 31.46 | 7: iteration 15740/ 115203 | consumed samples: 4029440 | consumed tokens: 8252293120 | elapsed time per iteration (s): 0.43 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 2.468149E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.716 | TFLOPs: 31.36 | 7: iteration 15750/ 115203 | consumed samples: 4032000 | consumed tokens: 
8257536000 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 2.459360E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.416 | TFLOPs: 31.87 | 7: iteration 15760/ 115203 | consumed samples: 4034560 | consumed tokens: 8262778880 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 2.470927E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.593 | TFLOPs: 31.83 | 7: iteration 15770/ 115203 | consumed samples: 4037120 | consumed tokens: 8268021760 | elapsed time per iteration (s): 0.43 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 2.471261E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.063 | TFLOPs: 31.59 | 7: iteration 15780/ 115203 | consumed samples: 4039680 | consumed tokens: 8273264640 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 2.448995E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.801 | TFLOPs: 32.15 | 7: iteration 15790/ 115203 | consumed samples: 4042240 | consumed tokens: 8278507520 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 2.444844E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.621 | TFLOPs: 31.83 | 7: iteration 15800/ 115203 | consumed samples: 4044800 | consumed tokens: 8283750400 | elapsed time per iteration (s): 0.43 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 2.441989E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.653 | 
TFLOPs: 31.15 | 7: iteration 15810/ 115203 | consumed samples: 4047360 | consumed tokens: 8288993280 | elapsed time per iteration (s): 0.42 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 2.425174E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.357 | TFLOPs: 31.66 | 7: iteration 15820/ 115203 | consumed samples: 4049920 | consumed tokens: 8294236160 | elapsed time per iteration (s): 0.43 | learning rate: 1.928E-04 | global batch size: 256 | lm loss: 2.486860E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.632 | TFLOPs: 31.15 | 7: iteration 15830/ 115203 | consumed samples: 4052480 | consumed tokens: 8299479040 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 2.464608E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.277 | TFLOPs: 32.07 | 7: iteration 15840/ 115203 | consumed samples: 4055040 | consumed tokens: 8304721920 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 2.476983E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.898 | TFLOPs: 32.11 | 7: iteration 15850/ 115203 | consumed samples: 4057600 | consumed tokens: 8309964800 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 2.474196E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.077 | TFLOPs: 32.11 | 7: iteration 15860/ 115203 | consumed samples: 4060160 | consumed tokens: 8315207680 | elapsed time per iteration (s): 0.43 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 2.457480E+00 | grad norm: 0.204 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.843 | TFLOPs: 31.53 | 7: iteration 15870/ 115203 | consumed samples: 4062720 | consumed tokens: 8320450560 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 2.442659E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.005 | TFLOPs: 32.01 | 7: iteration 15880/ 115203 | consumed samples: 4065280 | consumed tokens: 8325693440 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 2.450537E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.676 | TFLOPs: 32.15 | 7: iteration 15890/ 115203 | consumed samples: 4067840 | consumed tokens: 8330936320 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 2.461301E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.147 | TFLOPs: 31.96 | 7: iteration 15900/ 115203 | consumed samples: 4070400 | consumed tokens: 8336179200 | elapsed time per iteration (s): 0.44 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 2.470664E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.708 | TFLOPs: 30.63 | 7: iteration 15910/ 115203 | consumed samples: 4072960 | consumed tokens: 8341422080 | elapsed time per iteration (s): 0.42 | learning rate: 1.927E-04 | global batch size: 256 | lm loss: 2.470484E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.125 | TFLOPs: 32.17 | 7: iteration 15920/ 115203 | consumed samples: 4075520 | consumed tokens: 8346664960 | elapsed time per iteration (s): 0.42 | learning rate: 
1.927E-04 | global batch size: 256 | lm loss: 2.422930E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.103 | TFLOPs: 32.01 | 7: iteration 15930/ 115203 | consumed samples: 4078080 | consumed tokens: 8351907840 | elapsed time per iteration (s): 0.42 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 2.449846E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.289 | TFLOPs: 31.65 | 7: iteration 15940/ 115203 | consumed samples: 4080640 | consumed tokens: 8357150720 | elapsed time per iteration (s): 0.43 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 2.457188E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.821 | TFLOPs: 31.58 | 7: iteration 15950/ 115203 | consumed samples: 4083200 | consumed tokens: 8362393600 | elapsed time per iteration (s): 0.42 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 2.469624E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.901 | TFLOPs: 31.74 | 7: iteration 15960/ 115203 | consumed samples: 4085760 | consumed tokens: 8367636480 | elapsed time per iteration (s): 0.42 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 2.445188E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.022 | TFLOPs: 31.69 | 7: iteration 15970/ 115203 | consumed samples: 4088320 | consumed tokens: 8372879360 | elapsed time per iteration (s): 0.43 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 2.463957E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.387 | TFLOPs: 31.29 | 7: iteration 15980/ 115203 | consumed samples: 
4090880 | consumed tokens: 8378122240 | elapsed time per iteration (s): 0.42 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 2.484264E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.374 | TFLOPs: 32.13 | 7: iteration 15990/ 115203 | consumed samples: 4093440 | consumed tokens: 8383365120 | elapsed time per iteration (s): 0.44 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 2.473909E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.870 | TFLOPs: 30.74 | 0: [2022-11-28 14:52:20,761] [INFO] [logging.py:68:log_dist] [Rank 0] step=16000, skipped=0, lr=[0.00019257700559212364, 0.00019257700559212364, 0.00019257700559212364], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 16000/ 115203 | consumed samples: 4096000 | consumed tokens: 8388608000 | elapsed time per iteration (s): 0.42 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 2.461561E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.367 | TFLOPs: 31.87 | 0: steps: 16000 loss: 2.5672 iter time (s): 0.422 samples/sec: 606.371 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 16000 | lm loss value: 2.327883E+00 | lm loss PPL: 1.025620E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 16000 to checkpoints_221m 0: [2022-11-28 14:52:20,920] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step16000 is begin to save! 0: [2022-11-28 14:52:20,924] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_01-model_00-model_states.pt... 
0: [2022-11-28 14:52:21,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_01-model_00-model_states.pt. 0: [2022-11-28 14:52:21,023] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_03-model_00-model_states.pt... 0: [2022-11-28 14:52:21,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_03-model_00-model_states.pt. 0: [2022-11-28 14:52:21,045] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_04-model_00-model_states.pt... 0: [2022-11-28 14:52:21,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_04-model_00-model_states.pt. 0: [2022-11-28 14:52:21,067] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_05-model_00-model_states.pt... 0: [2022-11-28 14:52:21,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_05-model_00-model_states.pt. 0: [2022-11-28 14:52:21,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_06-model_00-model_states.pt... 0: [2022-11-28 14:52:21,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_06-model_00-model_states.pt. 0: [2022-11-28 14:52:21,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_07-model_00-model_states.pt... 0: [2022-11-28 14:52:21,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_07-model_00-model_states.pt. 0: [2022-11-28 14:52:21,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_08-model_00-model_states.pt... 
0: [2022-11-28 14:52:21,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_08-model_00-model_states.pt. 0: [2022-11-28 14:52:21,162] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_09-model_00-model_states.pt... 0: [2022-11-28 14:52:21,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_09-model_00-model_states.pt. 0: [2022-11-28 14:52:21,184] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_10-model_00-model_states.pt... 0: [2022-11-28 14:52:21,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_10-model_00-model_states.pt. 0: [2022-11-28 14:52:21,207] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_11-model_00-model_states.pt... 0: [2022-11-28 14:52:21,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_11-model_00-model_states.pt. 0: [2022-11-28 14:52:21,230] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_12-model_00-model_states.pt... 0: [2022-11-28 14:52:21,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_12-model_00-model_states.pt. 0: [2022-11-28 14:52:21,252] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_13-model_00-model_states.pt... 0: [2022-11-28 14:52:21,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_13-model_00-model_states.pt. 0: [2022-11-28 14:52:21,275] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_14-model_00-model_states.pt... 
0: [2022-11-28 14:52:21,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_14-model_00-model_states.pt. 0: [2022-11-28 14:52:21,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_15-model_00-model_states.pt... 0: [2022-11-28 14:52:21,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_15-model_00-model_states.pt. 0: [2022-11-28 14:52:21,321] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_16-model_00-model_states.pt... 0: [2022-11-28 14:52:21,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_16-model_00-model_states.pt. 0: [2022-11-28 14:52:21,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_17-model_00-model_states.pt... 0: [2022-11-28 14:52:21,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_17-model_00-model_states.pt. 0: [2022-11-28 14:52:21,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_18-model_00-model_states.pt... 0: [2022-11-28 14:52:21,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_18-model_00-model_states.pt. 0: [2022-11-28 14:52:21,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_19-model_00-model_states.pt... 0: [2022-11-28 14:52:21,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_19-model_00-model_states.pt. 0: [2022-11-28 14:52:21,410] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_20-model_00-model_states.pt... 
0: [2022-11-28 14:52:21,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_20-model_00-model_states.pt. 0: [2022-11-28 14:52:21,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/layer_22-model_00-model_states.pt... 0: [2022-11-28 14:52:21,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/layer_22-model_00-model_states.pt. 0: [2022-11-28 14:52:21,438] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step16000/mp_rank_00_model_states.pt 0: [2022-11-28 14:52:21,438] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/mp_rank_00_model_states.pt... 0: [2022-11-28 14:52:21,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/mp_rank_00_model_states.pt. 0: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
4: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
1: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:52:21,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
5: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:52:21,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step16000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:52:21,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:52:21,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:52:21,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 14:52:21,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2022-11-28 14:52:21,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:52:21,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 14:52:21,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2022-11-28 14:52:21,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:52:21,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 14:52:21,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2022-11-28 14:52:21,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:52:21,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 14:52:21,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2022-11-28 14:52:21,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:52:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:52:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:52:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 7: [2022-11-28 14:52:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:52:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:52:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 2: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:52:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 14:52:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 7: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:52:21,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2022-11-28 14:52:21,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
2: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2022-11-28 14:52:21,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 14:52:21,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:52:21,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 6: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2022-11-28 14:52:21,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
6: [2022-11-28 14:52:21,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 2: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:52:21,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2022-11-28 14:52:21,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:52:21,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 5: [2022-11-28 14:52:21,505] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 14:52:21,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 14:52:21,505] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2022-11-28 14:52:21,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
5: [2022-11-28 14:52:21,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:52:21,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:52:21,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 14:52:21,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 14:52:21,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2022-11-28 14:52:21,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 5: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:52:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 14:52:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2022-11-28 14:52:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
2: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:52:21,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 14:52:21,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2022-11-28 14:52:21,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:52:21,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 14:52:21,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:52:21,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2022-11-28 14:52:21,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 14:52:21,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2022-11-28 14:52:21,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:52:21,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:52:21,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:52:21,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 14:52:21,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 14:52:21,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 14:52:21,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2022-11-28 14:52:21,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 4: [2022-11-28 14:52:21,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2022-11-28 14:52:21,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:52:21,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:52:21,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 14:52:21,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2022-11-28 14:52:21,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 14:52:21,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
1: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:52:21,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:52:21,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2022-11-28 14:52:21,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:52:21,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 14:52:21,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
6: [2022-11-28 14:52:21,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:52:21,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 14:52:21,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:52:21,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2022-11-28 14:52:21,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:52:21,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 7: [2022-11-28 14:52:21,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 6: [2022-11-28 14:52:21,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2022-11-28 14:52:21,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2022-11-28 14:52:21,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:52:21,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
6: [2022-11-28 14:52:21,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2022-11-28 14:52:21,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 6: [2022-11-28 14:52:21,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2022-11-28 14:52:21,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 7: [2022-11-28 14:52:21,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:52:21,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 14:52:21,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2022-11-28 14:52:21,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:52:21,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 14:52:21,519] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 1: [2022-11-28 14:52:21,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:52:21,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 14:52:21,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2022-11-28 14:52:21,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:52:21,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 14:52:21,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2022-11-28 14:52:21,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:52:21,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 14:52:21,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2022-11-28 14:52:21,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:52:21,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 14:52:21,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 6: [2022-11-28 14:52:21,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 14:52:21,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 14:52:21,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 5: [2022-11-28 14:52:21,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:52:21,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:52:21,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 14:52:21,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 14:52:21,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 5: [2022-11-28 14:52:21,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
5: [2022-11-28 14:52:21,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:52:21,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2022-11-28 14:52:21,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 14:52:21,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 14:52:21,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:52:21,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:52:21,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 14:52:21,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
5: [2022-11-28 14:52:21,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 14:52:21,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 5: [2022-11-28 14:52:21,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 14:52:21,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 5: [2022-11-28 14:52:21,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 5: [2022-11-28 14:52:21,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2022-11-28 14:52:21,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:52:21,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 3: [2022-11-28 14:52:21,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 14:52:21,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2022-11-28 14:52:21,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:52:21,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 14:52:21,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2022-11-28 14:52:21,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:52:21,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 14:52:21,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2022-11-28 14:52:21,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:52:21,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 14:52:21,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 0: [2022-11-28 14:52:21,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step16000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 14:52:21,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step16000 is ready now! 
0: successfully saved checkpoint at iteration 16000 to checkpoints_221m
7: time (ms) | save-checkpoint: 651.80
7: iteration 16010/ 115203 | consumed samples: 4098560 | consumed tokens: 8393850880 | elapsed time per iteration (s): 0.51 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 2.451496E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 506.702 | TFLOPs: 26.59 |
7: iteration 16020/ 115203 | consumed samples: 4101120 | consumed tokens: 8399093760 | elapsed time per iteration (s): 0.42 | learning rate: 1.926E-04 | global batch size: 256 | lm loss: 2.456668E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.971 | TFLOPs: 31.95 |
7: iteration 16030/ 115203 | consumed samples: 4103680 | consumed tokens: 8404336640 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 2.446401E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.436 | TFLOPs: 32.24 |
7: iteration 16040/ 115203 | consumed samples: 4106240 | consumed tokens: 8409579520 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 2.487317E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.397 | TFLOPs: 31.97 |
7: iteration 16050/ 115203 | consumed samples: 4108800 | consumed tokens: 8414822400 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 2.422677E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.522 | TFLOPs: 32.24 |
7: iteration 16060/ 115203 | consumed samples: 4111360 | consumed tokens: 8420065280 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 2.492580E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.554 | TFLOPs: 31.98 |
7: iteration 16070/ 115203 | consumed samples: 4113920 | consumed tokens: 8425308160 | elapsed time per iteration (s): 0.43 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 2.457862E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.052 | TFLOPs: 31.22 |
7: iteration 16080/ 115203 | consumed samples: 4116480 | consumed tokens: 8430551040 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 2.451169E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.516 | TFLOPs: 31.77 |
7: iteration 16090/ 115203 | consumed samples: 4119040 | consumed tokens: 8435793920 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 2.456075E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.517 | TFLOPs: 31.67 |
7: iteration 16100/ 115203 | consumed samples: 4121600 | consumed tokens: 8441036800 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 2.435852E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.423 | TFLOPs: 31.71 |
7: iteration 16110/ 115203 | consumed samples: 4124160 | consumed tokens: 8446279680 | elapsed time per iteration (s): 0.42 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 2.468302E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.647 | TFLOPs: 31.88 |
7: iteration 16120/ 115203 | consumed samples: 4126720 | consumed tokens: 8451522560 | elapsed time per iteration (s): 0.43 | learning rate: 1.925E-04 | global batch size: 256 | lm loss: 2.478398E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.079 | TFLOPs: 31.59 |
7: iteration 16130/ 115203 | consumed samples: 4129280 | consumed tokens: 8456765440 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 2.446989E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.634 | TFLOPs: 31.62 |
7: iteration 16140/ 115203 | consumed samples: 4131840 | consumed tokens: 8462008320 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 2.430843E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.500 | TFLOPs: 31.82 |
7: iteration 16150/ 115203 | consumed samples: 4134400 | consumed tokens: 8467251200 | elapsed time per iteration (s): 0.43 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 2.462173E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.938 | TFLOPs: 30.90 |
7: iteration 16160/ 115203 | consumed samples: 4136960 | consumed tokens: 8472494080 | elapsed time per iteration (s): 0.43 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 2.455914E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.525 | TFLOPs: 31.51 |
7: iteration 16170/ 115203 | consumed samples: 4139520 | consumed tokens: 8477736960 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 2.443462E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.601 | TFLOPs: 31.88 |
7: iteration 16180/ 115203 | consumed samples: 4142080 | consumed tokens: 8482979840 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 2.472347E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.044 | TFLOPs: 32.22 |
7: iteration 16190/ 115203 | consumed samples: 4144640 | consumed tokens: 8488222720 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 2.434027E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.392 | TFLOPs: 32.18 |
7: iteration 16200/ 115203 | consumed samples: 4147200 | consumed tokens: 8493465600 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 2.450312E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.998 | TFLOPs: 32.01 |
7: iteration 16210/ 115203 | consumed samples: 4149760 | consumed tokens: 8498708480 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 2.448665E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.329 | TFLOPs: 32.08 |
7: iteration 16220/ 115203 | consumed samples: 4152320 | consumed tokens: 8503951360 | elapsed time per iteration (s): 0.42 | learning rate: 1.924E-04 | global batch size: 256 | lm loss: 2.435467E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.492 | TFLOPs: 31.61 |
7: iteration 16230/ 115203 | consumed samples: 4154880 | consumed tokens: 8509194240 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 2.448333E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.804 | TFLOPs: 32.00 |
7: iteration 16240/ 115203 | consumed samples: 4157440 | consumed tokens: 8514437120 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 2.438341E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.797 | TFLOPs: 31.84 |
7: iteration 16250/ 115203 | consumed samples: 4160000 | consumed tokens: 8519680000 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 2.466242E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.980 | TFLOPs: 31.64 |
7: iteration 16260/ 115203 | consumed samples: 4162560 | consumed tokens: 8524922880 | elapsed time per iteration (s): 0.43 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 2.443731E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.991 | TFLOPs: 31.43 |
7: iteration 16270/ 115203 | consumed samples: 4165120 | consumed tokens: 8530165760 | elapsed time per iteration (s): 0.43 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 2.432397E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.745 | TFLOPs: 31.47 |
7: iteration 16280/ 115203 | consumed samples: 4167680 | consumed tokens: 8535408640 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 2.448675E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.727 | TFLOPs: 32.15 |
7: iteration 16290/ 115203 | consumed samples: 4170240 | consumed tokens: 8540651520 | elapsed time per iteration (s): 0.42 
| learning rate: 1.923E-04 | global batch size: 256 | lm loss: 2.460918E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.850 | TFLOPs: 31.68 | 7: iteration 16300/ 115203 | consumed samples: 4172800 | consumed tokens: 8545894400 | elapsed time per iteration (s): 0.44 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 2.445449E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.972 | TFLOPs: 30.69 | 7: iteration 16310/ 115203 | consumed samples: 4175360 | consumed tokens: 8551137280 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 2.470164E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.284 | TFLOPs: 31.71 | 7: iteration 16320/ 115203 | consumed samples: 4177920 | consumed tokens: 8556380160 | elapsed time per iteration (s): 0.42 | learning rate: 1.923E-04 | global batch size: 256 | lm loss: 2.473137E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.154 | TFLOPs: 31.80 | 7: iteration 16330/ 115203 | consumed samples: 4180480 | consumed tokens: 8561623040 | elapsed time per iteration (s): 0.43 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 2.440590E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.407 | TFLOPs: 31.34 | 7: iteration 16340/ 115203 | consumed samples: 4183040 | consumed tokens: 8566865920 | elapsed time per iteration (s): 0.43 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 2.462817E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.685 | TFLOPs: 31.52 | 7: iteration 16350/ 115203 | 
consumed samples: 4185600 | consumed tokens: 8572108800 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 2.446123E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.752 | TFLOPs: 32.20 | 7: iteration 16360/ 115203 | consumed samples: 4188160 | consumed tokens: 8577351680 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 2.465509E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.430 | TFLOPs: 31.92 | 7: iteration 16370/ 115203 | consumed samples: 4190720 | consumed tokens: 8582594560 | elapsed time per iteration (s): 0.43 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 2.471964E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.081 | TFLOPs: 31.38 | 7: iteration 16380/ 115203 | consumed samples: 4193280 | consumed tokens: 8587837440 | elapsed time per iteration (s): 0.43 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 2.474566E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.971 | TFLOPs: 31.48 | 7: iteration 16390/ 115203 | consumed samples: 4195840 | consumed tokens: 8593080320 | elapsed time per iteration (s): 0.43 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 2.460714E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.297 | TFLOPs: 31.29 | 7: iteration 16400/ 115203 | consumed samples: 4198400 | consumed tokens: 8598323200 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 2.466978E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 605.934 | TFLOPs: 31.79 | 7: iteration 16410/ 115203 | consumed samples: 4200960 | consumed tokens: 8603566080 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 2.446935E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.286 | TFLOPs: 31.97 | 7: iteration 16420/ 115203 | consumed samples: 4203520 | consumed tokens: 8608808960 | elapsed time per iteration (s): 0.42 | learning rate: 1.922E-04 | global batch size: 256 | lm loss: 2.444626E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.891 | TFLOPs: 32.16 | 7: iteration 16430/ 115203 | consumed samples: 4206080 | consumed tokens: 8614051840 | elapsed time per iteration (s): 0.43 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 2.458239E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.611 | TFLOPs: 30.99 | 7: iteration 16440/ 115203 | consumed samples: 4208640 | consumed tokens: 8619294720 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 2.450328E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.899 | TFLOPs: 31.79 | 7: iteration 16450/ 115203 | consumed samples: 4211200 | consumed tokens: 8624537600 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 2.438051E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.004 | TFLOPs: 31.69 | 7: iteration 16460/ 115203 | consumed samples: 4213760 | consumed tokens: 8629780480 | elapsed time per iteration (s): 0.43 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 
2.441780E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.653 | TFLOPs: 31.57 | 7: iteration 16470/ 115203 | consumed samples: 4216320 | consumed tokens: 8635023360 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 2.457216E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.092 | TFLOPs: 32.12 | 7: iteration 16480/ 115203 | consumed samples: 4218880 | consumed tokens: 8640266240 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 2.478653E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.068 | TFLOPs: 32.01 | 7: iteration 16490/ 115203 | consumed samples: 4221440 | consumed tokens: 8645509120 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 2.440953E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.042 | TFLOPs: 32.27 | 7: iteration 16500/ 115203 | consumed samples: 4224000 | consumed tokens: 8650752000 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 2.466209E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.925 | TFLOPs: 32.11 | 7: iteration 16510/ 115203 | consumed samples: 4226560 | consumed tokens: 8655994880 | elapsed time per iteration (s): 0.42 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 2.446106E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.345 | TFLOPs: 32.02 | 7: iteration 16520/ 115203 | consumed samples: 4229120 | consumed tokens: 8661237760 | elapsed 
time per iteration (s): 0.44 | learning rate: 1.921E-04 | global batch size: 256 | lm loss: 2.462685E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.664 | TFLOPs: 30.68 | 7: iteration 16530/ 115203 | consumed samples: 4231680 | consumed tokens: 8666480640 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 2.456813E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.321 | TFLOPs: 31.92 | 7: iteration 16540/ 115203 | consumed samples: 4234240 | consumed tokens: 8671723520 | elapsed time per iteration (s): 0.43 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 2.445288E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.885 | TFLOPs: 30.90 | 7: iteration 16550/ 115203 | consumed samples: 4236800 | consumed tokens: 8676966400 | elapsed time per iteration (s): 0.43 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 2.466736E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.107 | TFLOPs: 31.22 | 7: iteration 16560/ 115203 | consumed samples: 4239360 | consumed tokens: 8682209280 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 2.425408E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.681 | TFLOPs: 31.83 | 7: iteration 16570/ 115203 | consumed samples: 4241920 | consumed tokens: 8687452160 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 2.430713E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.634 | TFLOPs: 32.09 | 7: 
iteration 16580/ 115203 | consumed samples: 4244480 | consumed tokens: 8692695040 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 2.469789E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.720 | TFLOPs: 32.04 | 7: iteration 16590/ 115203 | consumed samples: 4247040 | consumed tokens: 8697937920 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 2.469716E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.142 | TFLOPs: 31.75 | 7: iteration 16600/ 115203 | consumed samples: 4249600 | consumed tokens: 8703180800 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 2.414641E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.597 | TFLOPs: 32.04 | 7: iteration 16610/ 115203 | consumed samples: 4252160 | consumed tokens: 8708423680 | elapsed time per iteration (s): 0.43 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 2.476116E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.301 | TFLOPs: 31.44 | 7: iteration 16620/ 115203 | consumed samples: 4254720 | consumed tokens: 8713666560 | elapsed time per iteration (s): 0.42 | learning rate: 1.920E-04 | global batch size: 256 | lm loss: 2.455091E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.546 | TFLOPs: 32.19 | 7: iteration 16630/ 115203 | consumed samples: 4257280 | consumed tokens: 8718909440 | elapsed time per iteration (s): 0.42 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 2.458472E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 605.029 | TFLOPs: 31.74 | 7: iteration 16640/ 115203 | consumed samples: 4259840 | consumed tokens: 8724152320 | elapsed time per iteration (s): 0.43 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 2.447664E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.931 | TFLOPs: 31.42 | 7: iteration 16650/ 115203 | consumed samples: 4262400 | consumed tokens: 8729395200 | elapsed time per iteration (s): 0.42 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 2.438673E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.416 | TFLOPs: 32.18 | 7: iteration 16660/ 115203 | consumed samples: 4264960 | consumed tokens: 8734638080 | elapsed time per iteration (s): 0.43 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 2.448556E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.629 | TFLOPs: 31.20 | 7: iteration 16670/ 115203 | consumed samples: 4267520 | consumed tokens: 8739880960 | elapsed time per iteration (s): 0.42 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 2.432042E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.816 | TFLOPs: 31.89 | 7: iteration 16680/ 115203 | consumed samples: 4270080 | consumed tokens: 8745123840 | elapsed time per iteration (s): 0.43 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 2.458416E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.733 | TFLOPs: 31.20 | 7: iteration 16690/ 115203 | consumed samples: 4272640 | consumed tokens: 8750366720 | elapsed time per iteration (s): 0.42 | learning rate: 1.919E-04 | global batch 
size: 256 | lm loss: 2.426112E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.341 | TFLOPs: 31.87 | 7: iteration 16700/ 115203 | consumed samples: 4275200 | consumed tokens: 8755609600 | elapsed time per iteration (s): 0.42 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 2.461617E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.672 | TFLOPs: 32.15 | 7: iteration 16710/ 115203 | consumed samples: 4277760 | consumed tokens: 8760852480 | elapsed time per iteration (s): 0.42 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 2.455849E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.115 | TFLOPs: 32.22 | 7: iteration 16720/ 115203 | consumed samples: 4280320 | consumed tokens: 8766095360 | elapsed time per iteration (s): 0.42 | learning rate: 1.919E-04 | global batch size: 256 | lm loss: 2.400357E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.773 | TFLOPs: 31.68 | 7: iteration 16730/ 115203 | consumed samples: 4282880 | consumed tokens: 8771338240 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 2.483531E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.688 | TFLOPs: 32.25 | 7: iteration 16740/ 115203 | consumed samples: 4285440 | consumed tokens: 8776581120 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 2.452343E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.645 | TFLOPs: 32.04 | 7: iteration 16750/ 115203 | consumed samples: 4288000 | consumed tokens: 
8781824000 | elapsed time per iteration (s): 0.44 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 2.458778E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.104 | TFLOPs: 30.65 | 7: iteration 16760/ 115203 | consumed samples: 4290560 | consumed tokens: 8787066880 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 2.454506E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.412 | TFLOPs: 32.13 | 7: iteration 16770/ 115203 | consumed samples: 4293120 | consumed tokens: 8792309760 | elapsed time per iteration (s): 0.43 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 2.427415E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.388 | TFLOPs: 31.40 | 7: iteration 16780/ 115203 | consumed samples: 4295680 | consumed tokens: 8797552640 | elapsed time per iteration (s): 0.42 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 2.454279E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.584 | TFLOPs: 31.62 | 7: iteration 16790/ 115203 | consumed samples: 4298240 | consumed tokens: 8802795520 | elapsed time per iteration (s): 0.43 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 2.423783E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.865 | TFLOPs: 31.47 | 7: iteration 16800/ 115203 | consumed samples: 4300800 | consumed tokens: 8808038400 | elapsed time per iteration (s): 0.43 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 2.452157E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.444 | 
TFLOPs: 31.35 | 7: iteration 16810/ 115203 | consumed samples: 4303360 | consumed tokens: 8813281280 | elapsed time per iteration (s): 0.45 | learning rate: 1.918E-04 | global batch size: 256 | lm loss: 2.423626E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.680 | TFLOPs: 30.00 | 7: iteration 16820/ 115203 | consumed samples: 4305920 | consumed tokens: 8818524160 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 2.423117E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.577 | TFLOPs: 31.93 | 7: iteration 16830/ 115203 | consumed samples: 4308480 | consumed tokens: 8823767040 | elapsed time per iteration (s): 0.43 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 2.444438E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.980 | TFLOPs: 31.58 | 7: iteration 16840/ 115203 | consumed samples: 4311040 | consumed tokens: 8829009920 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 2.424239E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.095 | TFLOPs: 32.06 | 7: iteration 16850/ 115203 | consumed samples: 4313600 | consumed tokens: 8834252800 | elapsed time per iteration (s): 0.43 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 2.443854E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.779 | TFLOPs: 31.15 | 7: iteration 16860/ 115203 | consumed samples: 4316160 | consumed tokens: 8839495680 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 2.457418E+00 | grad norm: 0.243 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.758 | TFLOPs: 31.99 | 7: iteration 16870/ 115203 | consumed samples: 4318720 | consumed tokens: 8844738560 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 2.462042E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.850 | TFLOPs: 31.89 | 7: iteration 16880/ 115203 | consumed samples: 4321280 | consumed tokens: 8849981440 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 2.424302E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.083 | TFLOPs: 31.64 | 7: iteration 16890/ 115203 | consumed samples: 4323840 | consumed tokens: 8855224320 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 2.459616E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.502 | TFLOPs: 32.19 | 7: iteration 16900/ 115203 | consumed samples: 4326400 | consumed tokens: 8860467200 | elapsed time per iteration (s): 0.44 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 2.432881E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.436 | TFLOPs: 30.87 | 7: iteration 16910/ 115203 | consumed samples: 4328960 | consumed tokens: 8865710080 | elapsed time per iteration (s): 0.42 | learning rate: 1.917E-04 | global batch size: 256 | lm loss: 2.424856E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.292 | TFLOPs: 32.13 | 7: iteration 16920/ 115203 | consumed samples: 4331520 | consumed tokens: 8870952960 | elapsed time per iteration (s): 0.42 | learning rate: 
1.916E-04 | global batch size: 256 | lm loss: 2.423950E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.600 | TFLOPs: 32.30 | 7: iteration 16930/ 115203 | consumed samples: 4334080 | consumed tokens: 8876195840 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 2.428350E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.348 | TFLOPs: 31.81 | 7: iteration 16940/ 115203 | consumed samples: 4336640 | consumed tokens: 8881438720 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 2.444935E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.040 | TFLOPs: 31.80 | 7: iteration 16950/ 115203 | consumed samples: 4339200 | consumed tokens: 8886681600 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 2.422165E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.198 | TFLOPs: 32.12 | 7: iteration 16960/ 115203 | consumed samples: 4341760 | consumed tokens: 8891924480 | elapsed time per iteration (s): 0.44 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 2.441733E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.053 | TFLOPs: 30.80 | 7: iteration 16970/ 115203 | consumed samples: 4344320 | consumed tokens: 8897167360 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 2.439095E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.599 | TFLOPs: 31.88 | 7: iteration 16980/ 115203 | consumed samples: 
4346880 | consumed tokens: 8902410240 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 2.473450E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.899 | TFLOPs: 32.26 | 7: iteration 16990/ 115203 | consumed samples: 4349440 | consumed tokens: 8907653120 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 2.460853E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.777 | TFLOPs: 32.20 | 7: iteration 17000/ 115203 | consumed samples: 4352000 | consumed tokens: 8912896000 | elapsed time per iteration (s): 0.42 | learning rate: 1.916E-04 | global batch size: 256 | lm loss: 2.458126E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.532 | TFLOPs: 32.24 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 17000 | lm loss value: 2.430656E+00 | lm loss PPL: 1.136634E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 17000 to checkpoints_221m 0: [2022-11-28 14:59:25,103] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step17000 is begin to save! 0: [2022-11-28 14:59:25,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_01-model_00-model_states.pt... 0: [2022-11-28 14:59:25,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_01-model_00-model_states.pt. 0: [2022-11-28 14:59:25,206] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_03-model_00-model_states.pt... 
0: [2022-11-28 14:59:25,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_03-model_00-model_states.pt. 0: [2022-11-28 14:59:25,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_04-model_00-model_states.pt... 0: [2022-11-28 14:59:25,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_04-model_00-model_states.pt. 0: [2022-11-28 14:59:25,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_05-model_00-model_states.pt... 0: [2022-11-28 14:59:25,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_05-model_00-model_states.pt. 0: [2022-11-28 14:59:25,274] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_06-model_00-model_states.pt... 0: [2022-11-28 14:59:25,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_06-model_00-model_states.pt. 0: [2022-11-28 14:59:25,296] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_07-model_00-model_states.pt... 0: [2022-11-28 14:59:25,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_07-model_00-model_states.pt. 0: [2022-11-28 14:59:25,320] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_08-model_00-model_states.pt... 0: [2022-11-28 14:59:25,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_08-model_00-model_states.pt. 0: [2022-11-28 14:59:25,343] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_09-model_00-model_states.pt... 
0: [2022-11-28 14:59:25,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_09-model_00-model_states.pt. 0: [2022-11-28 14:59:25,367] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_10-model_00-model_states.pt... 0: [2022-11-28 14:59:25,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_10-model_00-model_states.pt. 0: [2022-11-28 14:59:25,389] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_11-model_00-model_states.pt... 0: [2022-11-28 14:59:25,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_11-model_00-model_states.pt. 0: [2022-11-28 14:59:25,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_12-model_00-model_states.pt... 0: [2022-11-28 14:59:25,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_12-model_00-model_states.pt. 0: [2022-11-28 14:59:25,436] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_13-model_00-model_states.pt... 0: [2022-11-28 14:59:25,458] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_13-model_00-model_states.pt. 0: [2022-11-28 14:59:25,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_14-model_00-model_states.pt... 0: [2022-11-28 14:59:25,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_14-model_00-model_states.pt. 0: [2022-11-28 14:59:25,481] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_15-model_00-model_states.pt... 
0: [2022-11-28 14:59:25,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_15-model_00-model_states.pt. 0: [2022-11-28 14:59:25,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_16-model_00-model_states.pt... 0: [2022-11-28 14:59:25,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_16-model_00-model_states.pt. 0: [2022-11-28 14:59:25,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_17-model_00-model_states.pt... 0: [2022-11-28 14:59:25,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_17-model_00-model_states.pt. 0: [2022-11-28 14:59:25,551] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_18-model_00-model_states.pt... 0: [2022-11-28 14:59:25,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_18-model_00-model_states.pt. 0: [2022-11-28 14:59:25,573] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_19-model_00-model_states.pt... 0: [2022-11-28 14:59:25,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_19-model_00-model_states.pt. 0: [2022-11-28 14:59:25,596] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_20-model_00-model_states.pt... 0: [2022-11-28 14:59:25,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_20-model_00-model_states.pt. 0: [2022-11-28 14:59:25,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/layer_22-model_00-model_states.pt... 
0: [2022-11-28 14:59:25,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/layer_22-model_00-model_states.pt. 0: [2022-11-28 14:59:25,624] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step17000/mp_rank_00_model_states.pt 0: [2022-11-28 14:59:25,624] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/mp_rank_00_model_states.pt... 0: [2022-11-28 14:59:25,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/mp_rank_00_model_states.pt. 0: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
3: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 4: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
3: [2022-11-28 14:59:25,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step17000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-28 14:59:25,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:59:25,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 14:59:25,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2022-11-28 14:59:25,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:59:25,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 14:59:25,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2022-11-28 14:59:25,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:59:25,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 14:59:25,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2022-11-28 14:59:25,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-28 14:59:25,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 14:59:25,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2022-11-28 14:59:25,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:59:25,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2022-11-28 14:59:25,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:59:25,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2022-11-28 14:59:25,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 14:59:25,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2022-11-28 14:59:25,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:59:25,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 14:59:25,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2022-11-28 14:59:25,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2022-11-28 14:59:25,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 14:59:25,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:59:25,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2022-11-28 14:59:25,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 14:59:25,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2022-11-28 14:59:25,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:59:25,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 14:59:25,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:59:25,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2022-11-28 14:59:25,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 14:59:25,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2022-11-28 14:59:25,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 14:59:25,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 14:59:25,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2022-11-28 14:59:25,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:59:25,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:59:25,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 14:59:25,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 14:59:25,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2022-11-28 14:59:25,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2022-11-28 14:59:25,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:59:25,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
5: [2022-11-28 14:59:25,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 14:59:25,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 14:59:25,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2022-11-28 14:59:25,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2022-11-28 14:59:25,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:59:25,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 14:59:25,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2022-11-28 14:59:25,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:59:25,698] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 14:59:25,698] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2022-11-28 14:59:25,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:59:25,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 14:59:25,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2022-11-28 14:59:25,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:59:25,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 14:59:25,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2022-11-28 14:59:25,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:59:25,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:59:25,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 14:59:25,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 14:59:25,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2022-11-28 14:59:25,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2022-11-28 14:59:25,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 14:59:25,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 14:59:25,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2022-11-28 14:59:25,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:59:25,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 14:59:25,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 6: [2022-11-28 14:59:25,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:59:25,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2022-11-28 14:59:25,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:59:25,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2022-11-28 14:59:25,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 6: [2022-11-28 14:59:25,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:59:25,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 
6: [2022-11-28 14:59:25,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 14:59:25,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2022-11-28 14:59:25,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:59:25,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:59:25,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 6: [2022-11-28 14:59:25,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 7: [2022-11-28 14:59:25,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:59:25,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2022-11-28 14:59:25,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2022-11-28 14:59:25,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 4: [2022-11-28 14:59:25,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:59:25,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 
4: [2022-11-28 14:59:25,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 14:59:25,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2022-11-28 14:59:25,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:59:25,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 14:59:25,704] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 14:59:25,704] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 14:59:25,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 4: [2022-11-28 14:59:25,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2022-11-28 14:59:25,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:59:25,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 14:59:25,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2022-11-28 14:59:25,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:59:25,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 6: [2022-11-28 14:59:25,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 7: [2022-11-28 14:59:25,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 6: [2022-11-28 14:59:25,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 14:59:25,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 2: [2022-11-28 14:59:25,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:59:25,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 2: [2022-11-28 14:59:25,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2022-11-28 14:59:25,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2022-11-28 14:59:25,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 6: [2022-11-28 14:59:25,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2022-11-28 14:59:25,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
3: [2022-11-28 14:59:25,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:59:25,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 14:59:25,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 14:59:25,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2022-11-28 14:59:25,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2022-11-28 14:59:25,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:59:25,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:59:25,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 14:59:25,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2022-11-28 14:59:25,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:59:25,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 14:59:25,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:59:25,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:59:25,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2022-11-28 14:59:25,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2022-11-28 14:59:25,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2022-11-28 14:59:25,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 14:59:25,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 5: [2022-11-28 14:59:25,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:59:25,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2022-11-28 14:59:25,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 0: [2022-11-28 14:59:25,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2022-11-28 14:59:25,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 14:59:25,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-28 14:59:25,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2022-11-28 14:59:25,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 14:59:25,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 5: [2022-11-28 14:59:25,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 14:59:25,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2022-11-28 14:59:25,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:59:25,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:59:25,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 14:59:25,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 14:59:25,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2022-11-28 14:59:25,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2022-11-28 14:59:25,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-28 14:59:25,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 14:59:25,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 1: [2022-11-28 14:59:25,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 14:59:25,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 14:59:25,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2022-11-28 14:59:25,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:59:25,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 14:59:25,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2022-11-28 14:59:25,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:59:25,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 14:59:25,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 7: [2022-11-28 14:59:25,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 14:59:25,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 14:59:25,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 6: [2022-11-28 14:59:25,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:59:25,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 14:59:25,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 6: [2022-11-28 14:59:25,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 14:59:25,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 14:59:25,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2022-11-28 14:59:25,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:59:25,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 14:59:25,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2022-11-28 14:59:25,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2022-11-28 14:59:25,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 14:59:25,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 3: [2022-11-28 14:59:25,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 14:59:25,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 14:59:25,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2022-11-28 14:59:25,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:59:25,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:59:25,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 14:59:25,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 14:59:25,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 14:59:25,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 14:59:25,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 14:59:25,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2022-11-28 14:59:25,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2022-11-28 14:59:25,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 0: [2022-11-28 14:59:25,757] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step17000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 14:59:25,757] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step17000 is ready now! 
0: successfully saved checkpoint at iteration 17000 to checkpoints_221m 7: time (ms) | save-checkpoint: 659.61 7: iteration 17010/ 115203 | consumed samples: 4354560 | consumed tokens: 8918138880 | elapsed time per iteration (s): 0.50 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 2.453003E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 508.444 | TFLOPs: 26.68 | 7: iteration 17020/ 115203 | consumed samples: 4357120 | consumed tokens: 8923381760 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 2.467666E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.674 | TFLOPs: 32.25 | 7: iteration 17030/ 115203 | consumed samples: 4359680 | consumed tokens: 8928624640 | elapsed time per iteration (s): 0.43 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 2.461337E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.091 | TFLOPs: 31.54 | 7: iteration 17040/ 115203 | consumed samples: 4362240 | consumed tokens: 8933867520 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 2.460539E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.720 | TFLOPs: 31.99 | 7: iteration 17050/ 115203 | consumed samples: 4364800 | consumed tokens: 8939110400 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 2.466113E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.654 | TFLOPs: 31.73 | 7: iteration 17060/ 115203 | consumed samples: 4367360 | consumed tokens: 8944353280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.915E-04 | global batch size: 256 | lm loss: 2.393909E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.693 | TFLOPs: 31.88 | 7: iteration 17070/ 115203 | consumed samples: 4369920 | consumed tokens: 8949596160 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 2.448437E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.294 | TFLOPs: 32.07 | 7: iteration 17080/ 115203 | consumed samples: 4372480 | consumed tokens: 8954839040 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 2.451209E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.949 | TFLOPs: 32.06 | 7: iteration 17090/ 115203 | consumed samples: 4375040 | consumed tokens: 8960081920 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 2.448344E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.978 | TFLOPs: 32.11 | 7: iteration 17100/ 115203 | consumed samples: 4377600 | consumed tokens: 8965324800 | elapsed time per iteration (s): 0.42 | learning rate: 1.915E-04 | global batch size: 256 | lm loss: 2.442467E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.993 | TFLOPs: 31.69 | 7: iteration 17110/ 115203 | consumed samples: 4380160 | consumed tokens: 8970567680 | elapsed time per iteration (s): 0.43 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 2.431926E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.158 | TFLOPs: 31.44 | 7: iteration 17120/ 115203 | consumed samples: 
4382720 | consumed tokens: 8975810560 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 2.478867E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.890 | TFLOPs: 31.90 | 7: iteration 17130/ 115203 | consumed samples: 4385280 | consumed tokens: 8981053440 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 2.424017E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.077 | TFLOPs: 31.85 | 7: iteration 17140/ 115203 | consumed samples: 4387840 | consumed tokens: 8986296320 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 2.440553E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.256 | TFLOPs: 32.28 | 7: iteration 17150/ 115203 | consumed samples: 4390400 | consumed tokens: 8991539200 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 2.444926E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.696 | TFLOPs: 31.73 | 7: iteration 17160/ 115203 | consumed samples: 4392960 | consumed tokens: 8996782080 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 2.437640E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.685 | TFLOPs: 31.88 | 7: iteration 17170/ 115203 | consumed samples: 4395520 | consumed tokens: 9002024960 | elapsed time per iteration (s): 0.42 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 2.463239E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 610.811 | TFLOPs: 32.05 | 7: iteration 17180/ 115203 | consumed samples: 4398080 | consumed tokens: 9007267840 | elapsed time per iteration (s): 0.43 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 2.434692E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.898 | TFLOPs: 31.58 | 7: iteration 17190/ 115203 | consumed samples: 4400640 | consumed tokens: 9012510720 | elapsed time per iteration (s): 0.43 | learning rate: 1.914E-04 | global batch size: 256 | lm loss: 2.462315E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.281 | TFLOPs: 31.55 | 7: iteration 17200/ 115203 | consumed samples: 4403200 | consumed tokens: 9017753600 | elapsed time per iteration (s): 0.44 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 2.402007E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.451 | TFLOPs: 30.88 | 7: iteration 17210/ 115203 | consumed samples: 4405760 | consumed tokens: 9022996480 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 2.422919E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.167 | TFLOPs: 31.80 | 7: iteration 17220/ 115203 | consumed samples: 4408320 | consumed tokens: 9028239360 | elapsed time per iteration (s): 0.43 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 2.420667E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.679 | TFLOPs: 31.46 | 7: iteration 17230/ 115203 | consumed samples: 4410880 | consumed tokens: 9033482240 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 2.466463E+00 | grad norm: 
0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.043 | TFLOPs: 31.80 | 7: iteration 17240/ 115203 | consumed samples: 4413440 | consumed tokens: 9038725120 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 2.465367E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.339 | TFLOPs: 31.66 | 7: iteration 17250/ 115203 | consumed samples: 4416000 | consumed tokens: 9043968000 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 2.432234E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.769 | TFLOPs: 31.89 | 7: iteration 17260/ 115203 | consumed samples: 4418560 | consumed tokens: 9049210880 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 2.440896E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.441 | TFLOPs: 31.87 | 7: iteration 17270/ 115203 | consumed samples: 4421120 | consumed tokens: 9054453760 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 2.463119E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.165 | TFLOPs: 31.91 | 7: iteration 17280/ 115203 | consumed samples: 4423680 | consumed tokens: 9059696640 | elapsed time per iteration (s): 0.42 | learning rate: 1.913E-04 | global batch size: 256 | lm loss: 2.445145E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.826 | TFLOPs: 31.73 | 7: iteration 17290/ 115203 | consumed samples: 4426240 | consumed tokens: 9064939520 | elapsed time per iteration (s): 0.42 
| learning rate: 1.913E-04 | global batch size: 256 | lm loss: 2.449459E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.785 | TFLOPs: 32.20 | 7: iteration 17300/ 115203 | consumed samples: 4428800 | consumed tokens: 9070182400 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 2.453640E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.480 | TFLOPs: 32.29 | 7: iteration 17310/ 115203 | consumed samples: 4431360 | consumed tokens: 9075425280 | elapsed time per iteration (s): 0.43 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 2.421680E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.115 | TFLOPs: 31.12 | 7: iteration 17320/ 115203 | consumed samples: 4433920 | consumed tokens: 9080668160 | elapsed time per iteration (s): 0.43 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 2.432217E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.841 | TFLOPs: 31.47 | 7: iteration 17330/ 115203 | consumed samples: 4436480 | consumed tokens: 9085911040 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 2.455463E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.679 | TFLOPs: 32.15 | 7: iteration 17340/ 115203 | consumed samples: 4439040 | consumed tokens: 9091153920 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 2.431938E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.996 | TFLOPs: 32.01 | 7: iteration 17350/ 115203 | 
consumed samples: 4441600 | consumed tokens: 9096396800 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 2.421594E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.640 | TFLOPs: 32.09 | 7: iteration 17360/ 115203 | consumed samples: 4444160 | consumed tokens: 9101639680 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 2.412990E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.461 | TFLOPs: 31.98 | 7: iteration 17370/ 115203 | consumed samples: 4446720 | consumed tokens: 9106882560 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 2.419681E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.282 | TFLOPs: 32.23 | 7: iteration 17380/ 115203 | consumed samples: 4449280 | consumed tokens: 9112125440 | elapsed time per iteration (s): 0.42 | learning rate: 1.912E-04 | global batch size: 256 | lm loss: 2.450278E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.987 | TFLOPs: 31.80 | 7: iteration 17390/ 115203 | consumed samples: 4451840 | consumed tokens: 9117368320 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 2.472482E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.414 | TFLOPs: 32.24 | 7: iteration 17400/ 115203 | consumed samples: 4454400 | consumed tokens: 9122611200 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 2.447231E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 606.259 | TFLOPs: 31.81 | 7: iteration 17410/ 115203 | consumed samples: 4456960 | consumed tokens: 9127854080 | elapsed time per iteration (s): 0.43 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 2.436234E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.266 | TFLOPs: 31.02 | 7: iteration 17420/ 115203 | consumed samples: 4459520 | consumed tokens: 9133096960 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 2.439295E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.755 | TFLOPs: 32.05 | 7: iteration 17430/ 115203 | consumed samples: 4462080 | consumed tokens: 9138339840 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 2.458963E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.220 | TFLOPs: 32.12 | 7: iteration 17440/ 115203 | consumed samples: 4464640 | consumed tokens: 9143582720 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 2.424472E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.618 | TFLOPs: 32.04 | 7: iteration 17450/ 115203 | consumed samples: 4467200 | consumed tokens: 9148825600 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 2.453614E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.012 | TFLOPs: 31.80 | 7: iteration 17460/ 115203 | consumed samples: 4469760 | consumed tokens: 9154068480 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 
2.460766E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.927 | TFLOPs: 32.21 | 7: iteration 17470/ 115203 | consumed samples: 4472320 | consumed tokens: 9159311360 | elapsed time per iteration (s): 0.42 | learning rate: 1.911E-04 | global batch size: 256 | lm loss: 2.409707E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.045 | TFLOPs: 32.22 | 7: iteration 17480/ 115203 | consumed samples: 4474880 | consumed tokens: 9164554240 | elapsed time per iteration (s): 0.43 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 2.445619E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.105 | TFLOPs: 31.49 | 7: iteration 17490/ 115203 | consumed samples: 4477440 | consumed tokens: 9169797120 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 2.466207E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.991 | TFLOPs: 32.27 | 7: iteration 17500/ 115203 | consumed samples: 4480000 | consumed tokens: 9175040000 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 2.419127E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.787 | TFLOPs: 31.94 | 7: iteration 17510/ 115203 | consumed samples: 4482560 | consumed tokens: 9180282880 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 2.419391E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.290 | TFLOPs: 31.71 | 7: iteration 17520/ 115203 | consumed samples: 4485120 | consumed tokens: 9185525760 | elapsed 
time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 2.444013E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.069 | TFLOPs: 32.01 | 7: iteration 17530/ 115203 | consumed samples: 4487680 | consumed tokens: 9190768640 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 2.474102E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.459 | TFLOPs: 31.82 | 7: iteration 17540/ 115203 | consumed samples: 4490240 | consumed tokens: 9196011520 | elapsed time per iteration (s): 0.43 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 2.477287E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.087 | TFLOPs: 31.54 | 7: iteration 17550/ 115203 | consumed samples: 4492800 | consumed tokens: 9201254400 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 2.449818E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.845 | TFLOPs: 31.84 | 7: iteration 17560/ 115203 | consumed samples: 4495360 | consumed tokens: 9206497280 | elapsed time per iteration (s): 0.43 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 2.426430E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.339 | TFLOPs: 31.39 | 7: iteration 17570/ 115203 | consumed samples: 4497920 | consumed tokens: 9211740160 | elapsed time per iteration (s): 0.42 | learning rate: 1.910E-04 | global batch size: 256 | lm loss: 2.428071E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.679 | TFLOPs: 32.25 | 7: 
iteration 17580/ 115203 | consumed samples: 4500480 | consumed tokens: 9216983040 | elapsed time per iteration (s): 0.43 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 2.456143E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.305 | TFLOPs: 31.60 | 7: iteration 17590/ 115203 | consumed samples: 4503040 | consumed tokens: 9222225920 | elapsed time per iteration (s): 0.53 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 2.422298E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 480.161 | TFLOPs: 25.19 | 7: iteration 17600/ 115203 | consumed samples: 4505600 | consumed tokens: 9227468800 | elapsed time per iteration (s): 0.44 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 2.459664E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.399 | TFLOPs: 30.71 | 7: iteration 17610/ 115203 | consumed samples: 4508160 | consumed tokens: 9232711680 | elapsed time per iteration (s): 0.43 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 2.445472E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.117 | TFLOPs: 31.59 | 7: iteration 17620/ 115203 | consumed samples: 4510720 | consumed tokens: 9237954560 | elapsed time per iteration (s): 0.43 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 2.455949E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.098 | TFLOPs: 30.96 | 7: iteration 17630/ 115203 | consumed samples: 4513280 | consumed tokens: 9243197440 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 2.426662E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 609.046 | TFLOPs: 31.96 | 7: iteration 17640/ 115203 | consumed samples: 4515840 | consumed tokens: 9248440320 | elapsed time per iteration (s): 0.42 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 2.422856E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.461 | TFLOPs: 31.61 | 7: iteration 17650/ 115203 | consumed samples: 4518400 | consumed tokens: 9253683200 | elapsed time per iteration (s): 0.43 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 2.450269E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.646 | TFLOPs: 31.41 | 7: iteration 17660/ 115203 | consumed samples: 4520960 | consumed tokens: 9258926080 | elapsed time per iteration (s): 0.44 | learning rate: 1.909E-04 | global batch size: 256 | lm loss: 2.454313E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.110 | TFLOPs: 30.75 | 7: iteration 17670/ 115203 | consumed samples: 4523520 | consumed tokens: 9264168960 | elapsed time per iteration (s): 0.45 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 2.401667E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.563 | TFLOPs: 29.88 | 7: iteration 17680/ 115203 | consumed samples: 4526080 | consumed tokens: 9269411840 | elapsed time per iteration (s): 0.43 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 2.437203E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.946 | TFLOPs: 30.95 | 7: iteration 17690/ 115203 | consumed samples: 4528640 | consumed tokens: 9274654720 | elapsed time per iteration (s): 0.56 | learning rate: 1.908E-04 | global batch 
size: 256 | lm loss: 2.471120E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 459.340 | TFLOPs: 24.10 | 7: iteration 17700/ 115203 | consumed samples: 4531200 | consumed tokens: 9279897600 | elapsed time per iteration (s): 0.45 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 2.445350E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.205 | TFLOPs: 29.87 | 7: iteration 17710/ 115203 | consumed samples: 4533760 | consumed tokens: 9285140480 | elapsed time per iteration (s): 0.44 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 2.438546E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.876 | TFLOPs: 30.74 | 7: iteration 17720/ 115203 | consumed samples: 4536320 | consumed tokens: 9290383360 | elapsed time per iteration (s): 0.43 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 2.441864E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.183 | TFLOPs: 31.02 | 7: iteration 17730/ 115203 | consumed samples: 4538880 | consumed tokens: 9295626240 | elapsed time per iteration (s): 0.44 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 2.399047E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.757 | TFLOPs: 30.52 | 7: iteration 17740/ 115203 | consumed samples: 4541440 | consumed tokens: 9300869120 | elapsed time per iteration (s): 0.43 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 2.440064E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.473 | TFLOPs: 31.30 | 7: iteration 17750/ 115203 | consumed samples: 4544000 | consumed tokens: 
9306112000 | elapsed time per iteration (s): 0.43 | learning rate: 1.908E-04 | global batch size: 256 | lm loss: 2.464521E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.162 | TFLOPs: 31.59 | 7: iteration 17760/ 115203 | consumed samples: 4546560 | consumed tokens: 9311354880 | elapsed time per iteration (s): 0.44 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 2.463952E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.664 | TFLOPs: 30.41 | 7: iteration 17770/ 115203 | consumed samples: 4549120 | consumed tokens: 9316597760 | elapsed time per iteration (s): 0.44 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 2.457503E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.026 | TFLOPs: 30.70 | 7: iteration 17780/ 115203 | consumed samples: 4551680 | consumed tokens: 9321840640 | elapsed time per iteration (s): 0.43 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 2.454526E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.412 | TFLOPs: 31.40 | 7: iteration 17790/ 115203 | consumed samples: 4554240 | consumed tokens: 9327083520 | elapsed time per iteration (s): 0.45 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 2.416852E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.264 | TFLOPs: 29.71 | 7: iteration 17800/ 115203 | consumed samples: 4556800 | consumed tokens: 9332326400 | elapsed time per iteration (s): 0.44 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 2.451752E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.320 | 
TFLOPs: 30.45 | 7: iteration 17810/ 115203 | consumed samples: 4559360 | consumed tokens: 9337569280 | elapsed time per iteration (s): 0.43 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 2.448581E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.095 | TFLOPs: 31.12 | 7: iteration 17820/ 115203 | consumed samples: 4561920 | consumed tokens: 9342812160 | elapsed time per iteration (s): 0.43 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 2.438276E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.941 | TFLOPs: 31.43 | 7: iteration 17830/ 115203 | consumed samples: 4564480 | consumed tokens: 9348055040 | elapsed time per iteration (s): 0.43 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 2.453886E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.198 | TFLOPs: 31.33 | 7: iteration 17840/ 115203 | consumed samples: 4567040 | consumed tokens: 9353297920 | elapsed time per iteration (s): 0.44 | learning rate: 1.907E-04 | global batch size: 256 | lm loss: 2.434385E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.331 | TFLOPs: 30.66 | 7: iteration 17850/ 115203 | consumed samples: 4569600 | consumed tokens: 9358540800 | elapsed time per iteration (s): 0.43 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 2.435544E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.207 | TFLOPs: 30.91 | 7: iteration 17860/ 115203 | consumed samples: 4572160 | consumed tokens: 9363783680 | elapsed time per iteration (s): 0.45 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 2.423136E+00 | grad norm: 0.207 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.650 | TFLOPs: 29.99 | 7: iteration 17870/ 115203 | consumed samples: 4574720 | consumed tokens: 9369026560 | elapsed time per iteration (s): 0.43 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 2.450267E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.223 | TFLOPs: 30.97 | 7: iteration 17880/ 115203 | consumed samples: 4577280 | consumed tokens: 9374269440 | elapsed time per iteration (s): 0.45 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 2.448251E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.605 | TFLOPs: 29.94 | 7: iteration 17890/ 115203 | consumed samples: 4579840 | consumed tokens: 9379512320 | elapsed time per iteration (s): 0.43 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 2.419066E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.464 | TFLOPs: 31.30 | 7: iteration 17900/ 115203 | consumed samples: 4582400 | consumed tokens: 9384755200 | elapsed time per iteration (s): 0.46 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 2.480624E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.313 | TFLOPs: 29.19 | 7: iteration 17910/ 115203 | consumed samples: 4584960 | consumed tokens: 9389998080 | elapsed time per iteration (s): 0.43 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 2.388858E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.947 | TFLOPs: 31.16 | 7: iteration 17920/ 115203 | consumed samples: 4587520 | consumed tokens: 9395240960 | elapsed time per iteration (s): 0.43 | learning rate: 
1.906E-04 | global batch size: 256 | lm loss: 2.434051E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.855 | TFLOPs: 31.00 | 7: iteration 17930/ 115203 | consumed samples: 4590080 | consumed tokens: 9400483840 | elapsed time per iteration (s): 0.44 | learning rate: 1.906E-04 | global batch size: 256 | lm loss: 2.455253E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.394 | TFLOPs: 30.87 | 7: iteration 17940/ 115203 | consumed samples: 4592640 | consumed tokens: 9405726720 | elapsed time per iteration (s): 0.43 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 2.440704E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.215 | TFLOPs: 31.33 | 7: iteration 17950/ 115203 | consumed samples: 4595200 | consumed tokens: 9410969600 | elapsed time per iteration (s): 0.43 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 2.448891E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.247 | TFLOPs: 31.44 | 7: iteration 17960/ 115203 | consumed samples: 4597760 | consumed tokens: 9416212480 | elapsed time per iteration (s): 0.46 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 2.424712E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.778 | TFLOPs: 29.48 | 7: iteration 17970/ 115203 | consumed samples: 4600320 | consumed tokens: 9421455360 | elapsed time per iteration (s): 0.44 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 2.431174E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.594 | TFLOPs: 30.31 | 7: iteration 17980/ 115203 | consumed samples: 
4602880 | consumed tokens: 9426698240 | elapsed time per iteration (s): 0.43 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 2.454478E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.113 | TFLOPs: 31.01 | 7: iteration 17990/ 115203 | consumed samples: 4605440 | consumed tokens: 9431941120 | elapsed time per iteration (s): 0.43 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 2.438674E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.120 | TFLOPs: 30.91 | 0: [2022-11-28 15:06:35,634] [INFO] [logging.py:68:log_dist] [Rank 0] step=18000, skipped=0, lr=[0.00019048094388569267, 0.00019048094388569267, 0.00019048094388569267], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 18000/ 115203 | consumed samples: 4608000 | consumed tokens: 9437184000 | elapsed time per iteration (s): 0.43 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 2.443988E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.095 | TFLOPs: 31.54 | 0: steps: 18000 loss: 2.4071 iter time (s): 0.424 samples/sec: 603.904 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 18000 | lm loss value: 2.277142E+00 | lm loss PPL: 9.748779E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 18000 to checkpoints_221m 0: [2022-11-28 15:06:35,798] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step18000 is begin to save! 0: [2022-11-28 15:06:35,809] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_01-model_00-model_states.pt... 
0: [2022-11-28 15:06:35,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_01-model_00-model_states.pt. 0: [2022-11-28 15:06:35,931] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_03-model_00-model_states.pt... 0: [2022-11-28 15:06:35,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_03-model_00-model_states.pt. 0: [2022-11-28 15:06:35,954] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_04-model_00-model_states.pt... 0: [2022-11-28 15:06:35,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_04-model_00-model_states.pt. 0: [2022-11-28 15:06:35,980] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_05-model_00-model_states.pt... 0: [2022-11-28 15:06:36,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_05-model_00-model_states.pt. 0: [2022-11-28 15:06:36,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_06-model_00-model_states.pt... 0: [2022-11-28 15:06:36,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_06-model_00-model_states.pt. 0: [2022-11-28 15:06:36,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_07-model_00-model_states.pt... 0: [2022-11-28 15:06:36,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_07-model_00-model_states.pt. 0: [2022-11-28 15:06:36,054] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_08-model_00-model_states.pt... 
0: [2022-11-28 15:06:36,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_08-model_00-model_states.pt. 0: [2022-11-28 15:06:36,079] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_09-model_00-model_states.pt... 0: [2022-11-28 15:06:36,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_09-model_00-model_states.pt. 0: [2022-11-28 15:06:36,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_10-model_00-model_states.pt... 0: [2022-11-28 15:06:36,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_10-model_00-model_states.pt. 0: [2022-11-28 15:06:36,130] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_11-model_00-model_states.pt... 0: [2022-11-28 15:06:36,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_11-model_00-model_states.pt. 0: [2022-11-28 15:06:36,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_12-model_00-model_states.pt... 0: [2022-11-28 15:06:36,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_12-model_00-model_states.pt. 0: [2022-11-28 15:06:36,180] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_13-model_00-model_states.pt... 0: [2022-11-28 15:06:36,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_13-model_00-model_states.pt. 0: [2022-11-28 15:06:36,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_14-model_00-model_states.pt... 
0: [2022-11-28 15:06:36,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_14-model_00-model_states.pt. 0: [2022-11-28 15:06:36,230] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_15-model_00-model_states.pt... 0: [2022-11-28 15:06:36,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_15-model_00-model_states.pt. 0: [2022-11-28 15:06:36,256] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_16-model_00-model_states.pt... 0: [2022-11-28 15:06:36,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_16-model_00-model_states.pt. 0: [2022-11-28 15:06:36,280] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_17-model_00-model_states.pt... 0: [2022-11-28 15:06:36,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_17-model_00-model_states.pt. 0: [2022-11-28 15:06:36,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_18-model_00-model_states.pt... 0: [2022-11-28 15:06:36,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_18-model_00-model_states.pt. 0: [2022-11-28 15:06:36,329] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_19-model_00-model_states.pt... 0: [2022-11-28 15:06:36,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_19-model_00-model_states.pt. 0: [2022-11-28 15:06:36,355] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_20-model_00-model_states.pt... 
0: [2022-11-28 15:06:36,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_20-model_00-model_states.pt. 0: [2022-11-28 15:06:36,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/layer_22-model_00-model_states.pt... 0: [2022-11-28 15:06:36,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/layer_22-model_00-model_states.pt. 0: [2022-11-28 15:06:36,385] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step18000/mp_rank_00_model_states.pt 0: [2022-11-28 15:06:36,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/mp_rank_00_model_states.pt... 0: [2022-11-28 15:06:36,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/mp_rank_00_model_states.pt. 0: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
4: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
2: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:06:36,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step18000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:06:36,455] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:06:36,455] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 15:06:36,455] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2022-11-28 15:06:36,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:06:36,456] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 15:06:36,456] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2022-11-28 15:06:36,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:06:36,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 15:06:36,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 
0: [2022-11-28 15:06:36,458] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:06:36,458] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 15:06:36,458] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2022-11-28 15:06:36,460] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:06:36,460] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 15:06:36,460] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2022-11-28 15:06:36,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:06:36,461] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 15:06:36,461] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2022-11-28 15:06:36,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:06:36,461] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 15:06:36,462] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 
6: [2022-11-28 15:06:36,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:06:36,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 15:06:36,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2022-11-28 15:06:36,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:06:36,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 15:06:36,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2022-11-28 15:06:36,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:06:36,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 15:06:36,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2022-11-28 15:06:36,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:06:36,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 15:06:36,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 
2: [2022-11-28 15:06:36,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:06:36,465] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 15:06:36,465] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2022-11-28 15:06:36,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:06:36,465] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 15:06:36,465] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2022-11-28 15:06:36,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:06:36,465] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 15:06:36,465] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2022-11-28 15:06:36,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:06:36,465] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 15:06:36,465] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 
2: [2022-11-28 15:06:36,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:06:36,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 15:06:36,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2022-11-28 15:06:36,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:06:36,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 15:06:36,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2022-11-28 15:06:36,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:06:36,465] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 15:06:36,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:06:36,465] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2022-11-28 15:06:36,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 15:06:36,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:06:36,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2022-11-28 15:06:36,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 15:06:36,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2022-11-28 15:06:36,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:06:36,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 15:06:36,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2022-11-28 15:06:36,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:06:36,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 15:06:36,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2022-11-28 15:06:36,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:06:36,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 15:06:36,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 
3: [2022-11-28 15:06:36,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:06:36,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 15:06:36,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2022-11-28 15:06:36,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:06:36,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:06:36,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:06:36,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 3: [2022-11-28 15:06:36,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2022-11-28 15:06:36,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2022-11-28 15:06:36,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2022-11-28 15:06:36,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2022-11-28 15:06:36,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 15:06:36,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2022-11-28 15:06:36,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:06:36,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:06:36,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 15:06:36,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 15:06:36,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2022-11-28 15:06:36,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 3: [2022-11-28 15:06:36,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:06:36,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 15:06:36,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:06:36,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 
3: [2022-11-28 15:06:36,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 15:06:36,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2022-11-28 15:06:36,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:06:36,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 15:06:36,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2022-11-28 15:06:36,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:06:36,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:06:36,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 15:06:36,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 15:06:36,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 6: [2022-11-28 15:06:36,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2022-11-28 15:06:36,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 15:06:36,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:06:36,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:06:36,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 2: [2022-11-28 15:06:36,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 
4: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 2: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 2: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 1: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 
1: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
4: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 1: [2022-11-28 15:06:36,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2022-11-28 15:06:36,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:06:36,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2022-11-28 15:06:36,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 15:06:36,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 4: [2022-11-28 15:06:36,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2022-11-28 15:06:36,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-28 15:06:36,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:06:36,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 15:06:36,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 15:06:36,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 1: [2022-11-28 15:06:36,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2022-11-28 15:06:36,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:06:36,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 15:06:36,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2022-11-28 15:06:36,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:06:36,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:06:36,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 15:06:36,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 15:06:36,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 5: [2022-11-28 15:06:36,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2022-11-28 15:06:36,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:06:36,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:06:36,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:06:36,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-28 15:06:36,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 15:06:36,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 15:06:36,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 15:06:36,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 15:06:36,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2022-11-28 15:06:36,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2022-11-28 15:06:36,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2022-11-28 15:06:36,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2022-11-28 15:06:36,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:06:36,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:06:36,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-28 15:06:36,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 15:06:36,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:06:36,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2022-11-28 15:06:36,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 15:06:36,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 15:06:36,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 15:06:36,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2022-11-28 15:06:36,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 7: [2022-11-28 15:06:36,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 0: [2022-11-28 15:06:36,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step18000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 15:06:36,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step18000 is ready now! 
0: successfully saved checkpoint at iteration 18000 to checkpoints_221m 7: time (ms) | save-checkpoint: 744.03 7: iteration 18010/ 115203 | consumed samples: 4610560 | consumed tokens: 9442426880 | elapsed time per iteration (s): 0.51 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 2.459647E+00 | grad norm: 0.328 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 499.850 | TFLOPs: 26.23 | 7: iteration 18020/ 115203 | consumed samples: 4613120 | consumed tokens: 9447669760 | elapsed time per iteration (s): 0.44 | learning rate: 1.905E-04 | global batch size: 256 | lm loss: 2.449591E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.830 | TFLOPs: 30.63 | 7: iteration 18030/ 115203 | consumed samples: 4615680 | consumed tokens: 9452912640 | elapsed time per iteration (s): 0.44 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 2.418930E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.421 | TFLOPs: 30.82 | 7: iteration 18040/ 115203 | consumed samples: 4618240 | consumed tokens: 9458155520 | elapsed time per iteration (s): 0.44 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 2.441408E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.256 | TFLOPs: 30.50 | 7: iteration 18050/ 115203 | consumed samples: 4620800 | consumed tokens: 9463398400 | elapsed time per iteration (s): 0.42 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 2.441167E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.409 | TFLOPs: 31.61 | 7: iteration 18060/ 115203 | consumed samples: 4623360 | consumed tokens: 9468641280 | elapsed time per iteration (s): 0.45 | learning rate: 
1.904E-04 | global batch size: 256 | lm loss: 2.414199E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.029 | TFLOPs: 30.12 | 7: iteration 18070/ 115203 | consumed samples: 4625920 | consumed tokens: 9473884160 | elapsed time per iteration (s): 0.43 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 2.453873E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.287 | TFLOPs: 31.02 | 7: iteration 18080/ 115203 | consumed samples: 4628480 | consumed tokens: 9479127040 | elapsed time per iteration (s): 0.44 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 2.422849E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.857 | TFLOPs: 30.21 | 7: iteration 18090/ 115203 | consumed samples: 4631040 | consumed tokens: 9484369920 | elapsed time per iteration (s): 0.44 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 2.452642E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.985 | TFLOPs: 30.48 | 7: iteration 18100/ 115203 | consumed samples: 4633600 | consumed tokens: 9489612800 | elapsed time per iteration (s): 0.45 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 2.435794E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.813 | TFLOPs: 30.11 | 7: iteration 18110/ 115203 | consumed samples: 4636160 | consumed tokens: 9494855680 | elapsed time per iteration (s): 0.44 | learning rate: 1.904E-04 | global batch size: 256 | lm loss: 2.466511E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.084 | TFLOPs: 30.86 | 7: iteration 18120/ 115203 | consumed samples: 
4638720 | consumed tokens: 9500098560 | elapsed time per iteration (s): 0.44 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 2.466228E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.171 | TFLOPs: 30.86 | 7: iteration 18130/ 115203 | consumed samples: 4641280 | consumed tokens: 9505341440 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 2.453732E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.407 | TFLOPs: 31.40 | 7: iteration 18140/ 115203 | consumed samples: 4643840 | consumed tokens: 9510584320 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 2.418498E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.883 | TFLOPs: 31.47 | 7: iteration 18150/ 115203 | consumed samples: 4646400 | consumed tokens: 9515827200 | elapsed time per iteration (s): 0.44 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 2.408159E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.392 | TFLOPs: 30.35 | 7: iteration 18160/ 115203 | consumed samples: 4648960 | consumed tokens: 9521070080 | elapsed time per iteration (s): 0.42 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 2.394786E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.499 | TFLOPs: 31.66 | 7: iteration 18170/ 115203 | consumed samples: 4651520 | consumed tokens: 9526312960 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 2.416541E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
samples per second: 590.444 | TFLOPs: 30.98 | 7: iteration 18180/ 115203 | consumed samples: 4654080 | consumed tokens: 9531555840 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 2.423291E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.367 | TFLOPs: 30.98 | 7: iteration 18190/ 115203 | consumed samples: 4656640 | consumed tokens: 9536798720 | elapsed time per iteration (s): 0.43 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 2.423167E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.387 | TFLOPs: 30.92 | 7: iteration 18200/ 115203 | consumed samples: 4659200 | consumed tokens: 9542041600 | elapsed time per iteration (s): 0.44 | learning rate: 1.903E-04 | global batch size: 256 | lm loss: 2.455692E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.071 | TFLOPs: 30.28 | 7: iteration 18210/ 115203 | consumed samples: 4661760 | consumed tokens: 9547284480 | elapsed time per iteration (s): 0.43 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 2.399042E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.612 | TFLOPs: 30.94 | 7: iteration 18220/ 115203 | consumed samples: 4664320 | consumed tokens: 9552527360 | elapsed time per iteration (s): 0.42 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 2.464936E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.227 | TFLOPs: 32.07 | 7: iteration 18230/ 115203 | consumed samples: 4666880 | consumed tokens: 9557770240 | elapsed time per iteration (s): 0.43 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 2.419627E+00 | grad norm: 
0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.788 | TFLOPs: 31.47 | 7: iteration 18240/ 115203 | consumed samples: 4669440 | consumed tokens: 9563013120 | elapsed time per iteration (s): 0.45 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 2.413237E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.205 | TFLOPs: 30.18 | 7: iteration 18250/ 115203 | consumed samples: 4672000 | consumed tokens: 9568256000 | elapsed time per iteration (s): 0.42 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 2.436391E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.190 | TFLOPs: 31.65 | 7: iteration 18260/ 115203 | consumed samples: 4674560 | consumed tokens: 9573498880 | elapsed time per iteration (s): 0.43 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 2.397746E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.995 | TFLOPs: 31.11 | 7: iteration 18270/ 115203 | consumed samples: 4677120 | consumed tokens: 9578741760 | elapsed time per iteration (s): 0.43 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 2.420022E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.692 | TFLOPs: 31.15 | 7: iteration 18280/ 115203 | consumed samples: 4679680 | consumed tokens: 9583984640 | elapsed time per iteration (s): 0.45 | learning rate: 1.902E-04 | global batch size: 256 | lm loss: 2.450103E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.376 | TFLOPs: 29.98 | 7: iteration 18290/ 115203 | consumed samples: 4682240 | consumed tokens: 9589227520 | elapsed time per iteration (s): 0.43 
| learning rate: 1.902E-04 | global batch size: 256 | lm loss: 2.432626E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.675 | TFLOPs: 31.36 | 7: iteration 18300/ 115203 | consumed samples: 4684800 | consumed tokens: 9594470400 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 2.454596E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.249 | TFLOPs: 31.13 | 7: iteration 18310/ 115203 | consumed samples: 4687360 | consumed tokens: 9599713280 | elapsed time per iteration (s): 0.44 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 2.466530E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.896 | TFLOPs: 30.58 | 7: iteration 18320/ 115203 | consumed samples: 4689920 | consumed tokens: 9604956160 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 2.412983E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.260 | TFLOPs: 31.28 | 7: iteration 18330/ 115203 | consumed samples: 4692480 | consumed tokens: 9610199040 | elapsed time per iteration (s): 0.44 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 2.436072E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.097 | TFLOPs: 30.59 | 7: iteration 18340/ 115203 | consumed samples: 4695040 | consumed tokens: 9615441920 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 2.436936E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.317 | TFLOPs: 31.34 | 7: iteration 18350/ 115203 | 
consumed samples: 4697600 | consumed tokens: 9620684800 | elapsed time per iteration (s): 0.44 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 2.444593E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.627 | TFLOPs: 30.46 | 7: iteration 18360/ 115203 | consumed samples: 4700160 | consumed tokens: 9625927680 | elapsed time per iteration (s): 0.45 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 2.408721E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.023 | TFLOPs: 29.96 | 7: iteration 18370/ 115203 | consumed samples: 4702720 | consumed tokens: 9631170560 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 2.469712E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.403 | TFLOPs: 31.03 | 7: iteration 18380/ 115203 | consumed samples: 4705280 | consumed tokens: 9636413440 | elapsed time per iteration (s): 0.43 | learning rate: 1.901E-04 | global batch size: 256 | lm loss: 2.437638E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.570 | TFLOPs: 31.20 | 7: iteration 18390/ 115203 | consumed samples: 4707840 | consumed tokens: 9641656320 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 2.456124E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.855 | TFLOPs: 31.05 | 7: iteration 18400/ 115203 | consumed samples: 4710400 | consumed tokens: 9646899200 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 2.427527E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 591.577 | TFLOPs: 31.04 | 7: iteration 18410/ 115203 | consumed samples: 4712960 | consumed tokens: 9652142080 | elapsed time per iteration (s): 0.44 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 2.404060E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.007 | TFLOPs: 30.59 | 7: iteration 18420/ 115203 | consumed samples: 4715520 | consumed tokens: 9657384960 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 2.431248E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.297 | TFLOPs: 31.44 | 7: iteration 18430/ 115203 | consumed samples: 4718080 | consumed tokens: 9662627840 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 2.446477E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.316 | TFLOPs: 30.97 | 7: iteration 18440/ 115203 | consumed samples: 4720640 | consumed tokens: 9667870720 | elapsed time per iteration (s): 0.44 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 2.455962E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.192 | TFLOPs: 30.39 | 7: iteration 18450/ 115203 | consumed samples: 4723200 | consumed tokens: 9673113600 | elapsed time per iteration (s): 0.43 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 2.415369E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.048 | TFLOPs: 30.96 | 7: iteration 18460/ 115203 | consumed samples: 4725760 | consumed tokens: 9678356480 | elapsed time per iteration (s): 0.44 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 
2.404765E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.433 | TFLOPs: 30.66 | 7: iteration 18470/ 115203 | consumed samples: 4728320 | consumed tokens: 9683599360 | elapsed time per iteration (s): 0.44 | learning rate: 1.900E-04 | global batch size: 256 | lm loss: 2.422753E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.466 | TFLOPs: 30.88 | 7: iteration 18480/ 115203 | consumed samples: 4730880 | consumed tokens: 9688842240 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 2.414081E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.022 | TFLOPs: 31.01 | 7: iteration 18490/ 115203 | consumed samples: 4733440 | consumed tokens: 9694085120 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 2.428288E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.258 | TFLOPs: 31.13 | 7: iteration 18500/ 115203 | consumed samples: 4736000 | consumed tokens: 9699328000 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 2.437709E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.110 | TFLOPs: 31.01 | 7: iteration 18510/ 115203 | consumed samples: 4738560 | consumed tokens: 9704570880 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 2.426792E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.035 | TFLOPs: 31.17 | 7: iteration 18520/ 115203 | consumed samples: 4741120 | consumed tokens: 9709813760 | elapsed 
time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 2.401616E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.858 | TFLOPs: 31.21 | 7: iteration 18530/ 115203 | consumed samples: 4743680 | consumed tokens: 9715056640 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 2.431977E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.278 | TFLOPs: 31.23 | 7: iteration 18540/ 115203 | consumed samples: 4746240 | consumed tokens: 9720299520 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 2.448318E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.389 | TFLOPs: 30.92 | 7: iteration 18550/ 115203 | consumed samples: 4748800 | consumed tokens: 9725542400 | elapsed time per iteration (s): 0.43 | learning rate: 1.899E-04 | global batch size: 256 | lm loss: 2.450129E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.307 | TFLOPs: 30.97 | 7: iteration 18560/ 115203 | consumed samples: 4751360 | consumed tokens: 9730785280 | elapsed time per iteration (s): 0.45 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 2.443215E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.833 | TFLOPs: 30.11 | 7: iteration 18570/ 115203 | consumed samples: 4753920 | consumed tokens: 9736028160 | elapsed time per iteration (s): 0.45 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 2.406816E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.047 | TFLOPs: 29.70 | 7: 
iteration 18580/ 115203 | consumed samples: 4756480 | consumed tokens: 9741271040 | elapsed time per iteration (s): 0.43 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 2.448609E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.962 | TFLOPs: 30.95 | 7: iteration 18590/ 115203 | consumed samples: 4759040 | consumed tokens: 9746513920 | elapsed time per iteration (s): 0.44 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 2.419934E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.325 | TFLOPs: 30.61 | 7: iteration 18600/ 115203 | consumed samples: 4761600 | consumed tokens: 9751756800 | elapsed time per iteration (s): 0.44 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 2.421598E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.425 | TFLOPs: 30.40 | 7: iteration 18610/ 115203 | consumed samples: 4764160 | consumed tokens: 9756999680 | elapsed time per iteration (s): 0.43 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 2.441595E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.023 | TFLOPs: 31.11 | 7: iteration 18620/ 115203 | consumed samples: 4766720 | consumed tokens: 9762242560 | elapsed time per iteration (s): 0.44 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 2.433442E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.343 | TFLOPs: 30.45 | 7: iteration 18630/ 115203 | consumed samples: 4769280 | consumed tokens: 9767485440 | elapsed time per iteration (s): 0.45 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 2.441754E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 571.102 | TFLOPs: 29.96 | 7: iteration 18640/ 115203 | consumed samples: 4771840 | consumed tokens: 9772728320 | elapsed time per iteration (s): 0.45 | learning rate: 1.898E-04 | global batch size: 256 | lm loss: 2.429887E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.501 | TFLOPs: 30.09 | 7: iteration 18650/ 115203 | consumed samples: 4774400 | consumed tokens: 9777971200 | elapsed time per iteration (s): 0.44 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 2.430487E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.769 | TFLOPs: 30.42 | 7: iteration 18660/ 115203 | consumed samples: 4776960 | consumed tokens: 9783214080 | elapsed time per iteration (s): 0.44 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 2.454072E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.881 | TFLOPs: 30.43 | 7: iteration 18670/ 115203 | consumed samples: 4779520 | consumed tokens: 9788456960 | elapsed time per iteration (s): 0.44 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 2.410610E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.010 | TFLOPs: 30.54 | 7: iteration 18680/ 115203 | consumed samples: 4782080 | consumed tokens: 9793699840 | elapsed time per iteration (s): 0.45 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 2.479251E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.867 | TFLOPs: 30.06 | 7: iteration 18690/ 115203 | consumed samples: 4784640 | consumed tokens: 9798942720 | elapsed time per iteration (s): 0.44 | learning rate: 1.897E-04 | global batch 
size: 256 | lm loss: 2.427756E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.646 | TFLOPs: 30.31 | 7: iteration 18700/ 115203 | consumed samples: 4787200 | consumed tokens: 9804185600 | elapsed time per iteration (s): 0.44 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 2.437718E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.979 | TFLOPs: 30.80 | 7: iteration 18710/ 115203 | consumed samples: 4789760 | consumed tokens: 9809428480 | elapsed time per iteration (s): 0.43 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 2.447153E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.832 | TFLOPs: 31.16 | 7: iteration 18720/ 115203 | consumed samples: 4792320 | consumed tokens: 9814671360 | elapsed time per iteration (s): 0.44 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 2.434109E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.266 | TFLOPs: 30.76 | 7: iteration 18730/ 115203 | consumed samples: 4794880 | consumed tokens: 9819914240 | elapsed time per iteration (s): 0.43 | learning rate: 1.897E-04 | global batch size: 256 | lm loss: 2.430565E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.796 | TFLOPs: 30.95 | 7: iteration 18740/ 115203 | consumed samples: 4797440 | consumed tokens: 9825157120 | elapsed time per iteration (s): 0.45 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 2.436730E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.354 | TFLOPs: 29.66 | 7: iteration 18750/ 115203 | consumed samples: 4800000 | consumed tokens: 
9830400000 | elapsed time per iteration (s): 0.46 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 2.426740E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.408 | TFLOPs: 29.09 | 7: iteration 18760/ 115203 | consumed samples: 4802560 | consumed tokens: 9835642880 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 2.448931E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.220 | TFLOPs: 31.18 | 7: iteration 18770/ 115203 | consumed samples: 4805120 | consumed tokens: 9840885760 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 2.389381E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.163 | TFLOPs: 31.28 | 7: iteration 18780/ 115203 | consumed samples: 4807680 | consumed tokens: 9846128640 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 2.453696E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.943 | TFLOPs: 30.90 | 7: iteration 18790/ 115203 | consumed samples: 4810240 | consumed tokens: 9851371520 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 2.428486E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.426 | TFLOPs: 31.24 | 7: iteration 18800/ 115203 | consumed samples: 4812800 | consumed tokens: 9856614400 | elapsed time per iteration (s): 0.44 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 2.435388E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.016 | 
TFLOPs: 30.85 | 7: iteration 18810/ 115203 | consumed samples: 4815360 | consumed tokens: 9861857280 | elapsed time per iteration (s): 0.44 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 2.396250E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.292 | TFLOPs: 30.81 | 7: iteration 18820/ 115203 | consumed samples: 4817920 | consumed tokens: 9867100160 | elapsed time per iteration (s): 0.43 | learning rate: 1.896E-04 | global batch size: 256 | lm loss: 2.444071E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.252 | TFLOPs: 31.02 | 7: iteration 18830/ 115203 | consumed samples: 4820480 | consumed tokens: 9872343040 | elapsed time per iteration (s): 0.43 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 2.435470E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.903 | TFLOPs: 30.90 | 7: iteration 18840/ 115203 | consumed samples: 4823040 | consumed tokens: 9877585920 | elapsed time per iteration (s): 0.43 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 2.395753E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.750 | TFLOPs: 31.31 | 7: iteration 18850/ 115203 | consumed samples: 4825600 | consumed tokens: 9882828800 | elapsed time per iteration (s): 0.44 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 2.428343E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.471 | TFLOPs: 30.40 | 7: iteration 18860/ 115203 | consumed samples: 4828160 | consumed tokens: 9888071680 | elapsed time per iteration (s): 0.54 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 2.425336E+00 | grad norm: 0.208 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 477.151 | TFLOPs: 25.04 | 7: iteration 18870/ 115203 | consumed samples: 4830720 | consumed tokens: 9893314560 | elapsed time per iteration (s): 0.43 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 2.433825E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.141 | TFLOPs: 31.02 | 7: iteration 18880/ 115203 | consumed samples: 4833280 | consumed tokens: 9898557440 | elapsed time per iteration (s): 0.43 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 2.429563E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.451 | TFLOPs: 31.24 | 7: iteration 18890/ 115203 | consumed samples: 4835840 | consumed tokens: 9903800320 | elapsed time per iteration (s): 0.44 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 2.451611E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.707 | TFLOPs: 30.63 | 7: iteration 18900/ 115203 | consumed samples: 4838400 | consumed tokens: 9909043200 | elapsed time per iteration (s): 0.44 | learning rate: 1.895E-04 | global batch size: 256 | lm loss: 2.419364E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.811 | TFLOPs: 30.84 | 7: iteration 18910/ 115203 | consumed samples: 4840960 | consumed tokens: 9914286080 | elapsed time per iteration (s): 0.45 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 2.433841E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.604 | TFLOPs: 30.10 | 7: iteration 18920/ 115203 | consumed samples: 4843520 | consumed tokens: 9919528960 | elapsed time per iteration (s): 0.44 | learning rate: 
1.894E-04 | global batch size: 256 | lm loss: 2.407820E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.491 | TFLOPs: 30.67 | 7: iteration 18930/ 115203 | consumed samples: 4846080 | consumed tokens: 9924771840 | elapsed time per iteration (s): 0.44 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 2.422628E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.578 | TFLOPs: 30.36 | 7: iteration 18940/ 115203 | consumed samples: 4848640 | consumed tokens: 9930014720 | elapsed time per iteration (s): 0.44 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 2.467965E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.628 | TFLOPs: 30.46 | 7: iteration 18950/ 115203 | consumed samples: 4851200 | consumed tokens: 9935257600 | elapsed time per iteration (s): 0.43 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 2.425465E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.011 | TFLOPs: 30.90 | 7: iteration 18960/ 115203 | consumed samples: 4853760 | consumed tokens: 9940500480 | elapsed time per iteration (s): 0.44 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 2.397631E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.866 | TFLOPs: 30.63 | 7: iteration 18970/ 115203 | consumed samples: 4856320 | consumed tokens: 9945743360 | elapsed time per iteration (s): 0.44 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 2.450487E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.370 | TFLOPs: 30.61 | 7: iteration 18980/ 115203 | consumed samples: 
4858880 | consumed tokens: 9950986240 | elapsed time per iteration (s): 0.42 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 2.448486E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.009 | TFLOPs: 31.69 | 7: iteration 18990/ 115203 | consumed samples: 4861440 | consumed tokens: 9956229120 | elapsed time per iteration (s): 0.44 | learning rate: 1.894E-04 | global batch size: 256 | lm loss: 2.438642E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.583 | TFLOPs: 30.83 | 7: iteration 19000/ 115203 | consumed samples: 4864000 | consumed tokens: 9961472000 | elapsed time per iteration (s): 0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 2.452583E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.910 | TFLOPs: 31.06 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 19000 | lm loss value: 2.350071E+00 | lm loss PPL: 1.048631E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 19000 to checkpoints_221m 0: [2022-11-28 15:13:53,895] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step19000 is begin to save! 0: [2022-11-28 15:13:53,909] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_01-model_00-model_states.pt... 0: [2022-11-28 15:13:54,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_01-model_00-model_states.pt. 0: [2022-11-28 15:13:54,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_03-model_00-model_states.pt... 
0: [2022-11-28 15:13:54,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_03-model_00-model_states.pt. 0: [2022-11-28 15:13:54,043] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_04-model_00-model_states.pt... 0: [2022-11-28 15:13:54,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_04-model_00-model_states.pt. 0: [2022-11-28 15:13:54,068] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_05-model_00-model_states.pt... 0: [2022-11-28 15:13:54,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_05-model_00-model_states.pt. 0: [2022-11-28 15:13:54,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_06-model_00-model_states.pt... 0: [2022-11-28 15:13:54,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_06-model_00-model_states.pt. 0: [2022-11-28 15:13:54,119] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_07-model_00-model_states.pt... 0: [2022-11-28 15:13:54,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_07-model_00-model_states.pt. 0: [2022-11-28 15:13:54,145] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_08-model_00-model_states.pt... 0: [2022-11-28 15:13:54,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_08-model_00-model_states.pt. 0: [2022-11-28 15:13:54,170] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_09-model_00-model_states.pt... 
0: [2022-11-28 15:13:54,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_09-model_00-model_states.pt. 0: [2022-11-28 15:13:54,194] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_10-model_00-model_states.pt... 0: [2022-11-28 15:13:54,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_10-model_00-model_states.pt. 0: [2022-11-28 15:13:54,217] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_11-model_00-model_states.pt... 0: [2022-11-28 15:13:54,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_11-model_00-model_states.pt. 0: [2022-11-28 15:13:54,243] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_12-model_00-model_states.pt... 0: [2022-11-28 15:13:54,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_12-model_00-model_states.pt. 0: [2022-11-28 15:13:54,267] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_13-model_00-model_states.pt... 0: [2022-11-28 15:13:54,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_13-model_00-model_states.pt. 0: [2022-11-28 15:13:54,291] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_14-model_00-model_states.pt... 0: [2022-11-28 15:13:54,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_14-model_00-model_states.pt. 0: [2022-11-28 15:13:54,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_15-model_00-model_states.pt... 
0: [2022-11-28 15:13:54,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_15-model_00-model_states.pt. 0: [2022-11-28 15:13:54,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_16-model_00-model_states.pt... 0: [2022-11-28 15:13:54,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_16-model_00-model_states.pt. 0: [2022-11-28 15:13:54,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_17-model_00-model_states.pt... 0: [2022-11-28 15:13:54,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_17-model_00-model_states.pt. 0: [2022-11-28 15:13:54,389] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_18-model_00-model_states.pt... 0: [2022-11-28 15:13:54,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_18-model_00-model_states.pt. 0: [2022-11-28 15:13:54,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_19-model_00-model_states.pt... 0: [2022-11-28 15:13:54,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_19-model_00-model_states.pt. 0: [2022-11-28 15:13:54,438] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_20-model_00-model_states.pt... 0: [2022-11-28 15:13:54,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_20-model_00-model_states.pt. 0: [2022-11-28 15:13:54,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/layer_22-model_00-model_states.pt... 
0: [2022-11-28 15:13:54,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/layer_22-model_00-model_states.pt. 0: [2022-11-28 15:13:54,468] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step19000/mp_rank_00_model_states.pt 0: [2022-11-28 15:13:54,468] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/mp_rank_00_model_states.pt... 0: [2022-11-28 15:13:54,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/mp_rank_00_model_states.pt. 0: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
0: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
4: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
0: [2022-11-28 15:13:54,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step19000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:13:54,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:13:54,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:13:54,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 15:13:54,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 0: [2022-11-28 15:13:54,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:13:54,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 15:13:54,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 0: [2022-11-28 15:13:54,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:13:54,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 15:13:54,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 15:13:54,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 15:13:54,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 0: [2022-11-28 15:13:54,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2022-11-28 15:13:54,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:13:54,551] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 15:13:54,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2022-11-28 15:13:54,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:13:54,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 15:13:54,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2022-11-28 15:13:54,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2022-11-28 15:13:54,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 15:13:54,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2022-11-28 15:13:54,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:13:54,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 15:13:54,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2022-11-28 15:13:54,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:13:54,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 15:13:54,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2022-11-28 15:13:54,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:13:54,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 15:13:54,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2022-11-28 15:13:54,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 15:13:54,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 15:13:54,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 2: [2022-11-28 15:13:54,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:13:54,559] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 15:13:54,559] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2022-11-28 15:13:54,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:13:54,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:13:54,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 15:13:54,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 15:13:54,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2022-11-28 15:13:54,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2022-11-28 15:13:54,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-28 15:13:54,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 15:13:54,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2022-11-28 15:13:54,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:13:54,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:13:54,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 15:13:54,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 15:13:54,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2022-11-28 15:13:54,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2022-11-28 15:13:54,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:13:54,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 15:13:54,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2022-11-28 15:13:54,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2022-11-28 15:13:54,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:13:54,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:13:54,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:13:54,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:13:54,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:13:54,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 15:13:54,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 15:13:54,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 15:13:54,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 15:13:54,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 15:13:54,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 
7: [2022-11-28 15:13:54,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2022-11-28 15:13:54,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2022-11-28 15:13:54,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2022-11-28 15:13:54,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 7: [2022-11-28 15:13:54,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 15:13:54,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2022-11-28 15:13:54,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:13:54,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:13:54,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:13:54,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 15:13:54,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 15:13:54,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 15:13:54,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 15:13:54,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 15:13:54,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2022-11-28 15:13:54,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2022-11-28 15:13:54,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 4: [2022-11-28 15:13:54,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now! 5: [2022-11-28 15:13:54,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:13:54,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:13:54,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt
3: [2022-11-28 15:13:54,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt
5: [2022-11-28 15:13:54,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
3: [2022-11-28 15:13:54,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
5: [2022-11-28 15:13:54,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt.
3: [2022-11-28 15:13:54,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt.
5: [2022-11-28 15:13:54,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt
3: [2022-11-28 15:13:54,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt
5: [2022-11-28 15:13:54,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
3: [2022-11-28 15:13:54,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
5: [2022-11-28 15:13:54,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt.
3: [2022-11-28 15:13:54,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt.
5: [2022-11-28 15:13:54,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt
3: [2022-11-28 15:13:54,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt
5: [2022-11-28 15:13:54,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
3: [2022-11-28 15:13:54,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
5: [2022-11-28 15:13:54,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt.
3: [2022-11-28 15:13:54,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt.
5: [2022-11-28 15:13:54,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt
3: [2022-11-28 15:13:54,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt
5: [2022-11-28 15:13:54,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
3: [2022-11-28 15:13:54,546] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
5: [2022-11-28 15:13:54,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt.
3: [2022-11-28 15:13:54,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt.
3: [2022-11-28 15:13:54,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt.
5: [2022-11-28 15:13:54,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt
3: [2022-11-28 15:13:54,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt.
5: [2022-11-28 15:13:54,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
3: [2022-11-28 15:13:54,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt
3: [2022-11-28 15:13:54,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt
5: [2022-11-28 15:13:54,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt.
5: [2022-11-28 15:13:54,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt
5: [2022-11-28 15:13:54,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
3: [2022-11-28 15:13:54,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt
5: [2022-11-28 15:13:54,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt.
3: [2022-11-28 15:13:54,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
5: [2022-11-28 15:13:54,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt
3: [2022-11-28 15:13:54,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
5: [2022-11-28 15:13:54,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
3: [2022-11-28 15:13:54,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
5: [2022-11-28 15:13:54,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt.
3: [2022-11-28 15:13:54,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt.
5: [2022-11-28 15:13:54,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt
3: [2022-11-28 15:13:54,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt
5: [2022-11-28 15:13:54,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
3: [2022-11-28 15:13:54,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
0: [2022-11-28 15:13:54,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt.
0: [2022-11-28 15:13:54,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt.
0: [2022-11-28 15:13:54,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt
0: [2022-11-28 15:13:54,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt
0: [2022-11-28 15:13:54,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt.
0: [2022-11-28 15:13:54,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
0: [2022-11-28 15:13:54,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
0: [2022-11-28 15:13:54,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt
0: [2022-11-28 15:13:54,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
0: [2022-11-28 15:13:54,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
0: [2022-11-28 15:13:54,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt.
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt.
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt.
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt.
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt.
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt.
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt.
6: [2022-11-28 15:13:54,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt
6: [2022-11-28 15:13:54,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt
6: [2022-11-28 15:13:54,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt
6: [2022-11-28 15:13:54,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt
6: [2022-11-28 15:13:54,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt
6: [2022-11-28 15:13:54,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt
6: [2022-11-28 15:13:54,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
6: [2022-11-28 15:13:54,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
6: [2022-11-28 15:13:54,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt.
6: [2022-11-28 15:13:54,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt
6: [2022-11-28 15:13:54,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt.
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt.
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt.
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt.
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt.
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt.
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt.
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt.
1: [2022-11-28 15:13:54,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt
1: [2022-11-28 15:13:54,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt
1: [2022-11-28 15:13:54,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt
1: [2022-11-28 15:13:54,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt
1: [2022-11-28 15:13:54,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt
1: [2022-11-28 15:13:54,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt
1: [2022-11-28 15:13:54,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt
1: [2022-11-28 15:13:54,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step19000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
1: [2022-11-28 15:13:54,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step19000 is ready now!
0: successfully saved checkpoint at iteration 19000 to checkpoints_221m
7: time (ms) | save-checkpoint: 936.72
7: iteration 19010/ 115203 | consumed samples: 4866560 | consumed tokens: 9966714880 | elapsed time per iteration (s): 0.54 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 2.454323E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 470.538 | TFLOPs: 24.69 |
7: iteration 19020/ 115203 | consumed samples: 4869120 | consumed tokens: 9971957760 | elapsed time per iteration (s): 0.44 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 2.446387E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.036 | TFLOPs: 30.28 |
7: iteration 19030/ 115203 | consumed samples: 4871680 | consumed tokens: 9977200640 | elapsed time per iteration (s): 0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 2.413183E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.689 | TFLOPs: 31.20 |
7: iteration 19040/ 115203 | consumed samples: 4874240 | consumed tokens: 9982443520 | elapsed time per iteration (s): 0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 2.449487E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.092 | TFLOPs: 31.01 |
7: iteration 19050/ 115203 | consumed samples: 4876800 | consumed tokens: 9987686400 | elapsed time per iteration (s): 0.43 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 2.415661E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.835 | TFLOPs: 31.16 |
7: iteration 19060/ 115203 | consumed samples: 4879360 | consumed tokens: 9992929280 | elapsed time per iteration (s): 0.44 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 2.416744E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.581 | TFLOPs: 30.67 |
7: iteration 19070/ 115203 | consumed samples: 4881920 | consumed tokens: 9998172160 | elapsed time per iteration (s): 0.45 | learning rate: 1.893E-04 | global batch size: 256 | lm loss: 2.404859E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.089 | TFLOPs: 29.60 |
7: iteration 19080/ 115203 | consumed samples: 4884480 | consumed tokens: 10003415040 | elapsed time per iteration (s): 0.43 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 2.436506E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.604 | TFLOPs: 30.99 |
7: iteration 19090/ 115203 | consumed samples: 4887040 | consumed tokens: 10008657920 | elapsed time per iteration (s): 0.44 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 2.454230E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.317 | TFLOPs: 30.66 |
7: iteration 19100/ 115203 | consumed samples: 4889600 | consumed tokens: 10013900800 | elapsed time per iteration (s): 0.44 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 2.459105E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.964 | TFLOPs: 30.64 |
7: iteration 19110/ 115203 | consumed samples: 4892160 | consumed tokens: 10019143680 | elapsed time per iteration (s): 0.45 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 2.460950E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.648 | TFLOPs: 29.89 |
7: iteration 19120/ 115203 | consumed samples: 4894720 | consumed tokens: 10024386560 | elapsed time per iteration (s): 0.44 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 2.423379E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.768 | TFLOPs: 30.68 |
7: iteration 19130/ 115203 | consumed samples: 4897280 | consumed tokens: 10029629440 | elapsed time per iteration (s): 0.43 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 2.416595E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.910 | TFLOPs: 30.90 |
7: iteration 19140/ 115203 | consumed samples: 4899840 | consumed tokens: 10034872320 | elapsed time per iteration (s): 0.45 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 2.420752E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.942 | TFLOPs: 29.90 |
7: iteration 19150/ 115203 | consumed samples: 4902400 | consumed tokens: 10040115200 | elapsed time per iteration (s): 0.45 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 2.413678E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.119 | TFLOPs: 29.55 |
7: iteration 19160/ 115203 | consumed samples: 4904960 | consumed tokens: 10045358080 | elapsed time per iteration (s): 0.44 | learning rate: 1.892E-04 | global batch size: 256 | lm loss: 2.439037E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.606 | TFLOPs: 30.31 |
7: iteration 19170/ 115203 | consumed samples: 4907520 | consumed tokens: 10050600960 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 2.417019E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.173 | TFLOPs: 31.39 |
7: iteration 19180/ 115203 | consumed samples: 4910080 | consumed tokens: 10055843840 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 2.434539E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.904 | TFLOPs: 31.21 |
7: iteration 19190/ 115203 | consumed samples: 4912640 | consumed tokens: 10061086720 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 2.434014E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.187 | TFLOPs: 31.39 |
7: iteration 19200/ 115203 | consumed samples: 4915200 | consumed tokens: 10066329600 | elapsed time per iteration (s): 0.44 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 2.435539E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.158 | TFLOPs: 30.65 |
7: iteration 19210/ 115203 | consumed samples: 4917760 | consumed tokens: 10071572480 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 2.425563E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.325 | TFLOPs: 31.08 |
7: iteration 19220/ 115203 | consumed samples: 4920320 | consumed tokens: 10076815360 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 2.429760E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.937 | TFLOPs: 31.01 |
7: iteration 19230/ 115203 | consumed samples: 4922880 | consumed tokens: 10082058240 | elapsed time per iteration (s): 0.43 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 2.430077E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.454 | TFLOPs: 31.19 |
7: iteration 19240/ 115203 | consumed samples: 4925440 | consumed tokens: 10087301120 | elapsed time per iteration (s): 0.44 | learning rate: 1.891E-04 | global batch size: 256 | lm loss: 2.433499E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.833 | TFLOPs: 30.84 |
7: iteration 19250/ 115203 | consumed samples: 4928000 | consumed tokens: 10092544000 | elapsed time per iteration (s): 0.45 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 2.458727E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.562 | TFLOPs: 30.04 |
7: iteration 19260/ 115203 | consumed samples: 4930560 | consumed tokens: 10097786880 | elapsed time per iteration (s): 0.44 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 2.420527E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.721 | TFLOPs: 30.78 |
7: iteration 19270/ 115203 | consumed samples: 4933120 | consumed tokens: 10103029760 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 2.401073E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.186 | TFLOPs: 31.12 |
7: iteration 19280/ 115203 | consumed samples: 4935680 | consumed tokens: 10108272640 | elapsed time per iteration (s): 0.44 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 2.420689E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.072 | TFLOPs: 30.49 |
7: iteration 19290/ 115203 | consumed samples: 4938240 | consumed tokens: 10113515520 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 2.438048E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.887 | TFLOPs: 31.21 |
7: iteration 19300/ 115203 | consumed samples: 4940800 | consumed tokens: 10118758400 | elapsed time per iteration (s): 0.44 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 2.425453E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.777 | TFLOPs: 30.63 |
7: iteration 19310/ 115203 | consumed samples: 4943360 | consumed tokens: 10124001280 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 2.419226E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.455 | TFLOPs: 31.03 |
7: iteration 19320/ 115203 | consumed samples: 4945920 | consumed tokens: 10129244160 | elapsed time per iteration (s): 0.43 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 2.406660E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.982 | TFLOPs: 31.38 |
7: iteration 19330/ 115203 | consumed samples: 4948480 | consumed tokens: 10134487040 | elapsed time per iteration (s): 0.44 | learning rate: 1.890E-04 | global batch size: 256 | lm loss: 2.401554E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.288 | TFLOPs: 30.66 |
7: iteration 19340/ 115203 | consumed samples: 4951040 | consumed tokens: 10139729920 | elapsed time per iteration (s): 0.44 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 2.429886E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.344 | TFLOPs: 30.76 |
7: iteration 19350/ 115203 | consumed samples: 4953600 | consumed tokens: 10144972800 | elapsed time per iteration (s): 0.42 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 2.420548E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.534 | TFLOPs: 31.72 |
7: iteration 19360/ 115203 | consumed samples: 4956160 | consumed tokens: 10150215680 | elapsed time per iteration (s): 0.42 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 2.416161E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.731 | TFLOPs: 31.94 |
7: iteration 19370/ 115203 | consumed samples: 4958720 | consumed tokens: 10155458560 | elapsed time per iteration (s): 0.43 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 2.418419E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.773 | TFLOPs: 31.42 |
7: iteration 19380/ 115203 | consumed samples: 4961280 | consumed tokens: 10160701440 | elapsed time per iteration (s): 0.43 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 2.399401E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.324 | TFLOPs: 31.45 |
7: iteration 19390/ 115203 | consumed samples: 4963840 | consumed tokens: 10165944320 | elapsed time per iteration (s): 0.45 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 2.428027E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.632 | TFLOPs: 30.10 |
7: iteration 19400/ 115203 | consumed samples: 4966400 | consumed tokens: 10171187200 | elapsed time per iteration (s): 0.43 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 2.428852E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.149 | TFLOPs: 30.91 |
7: iteration 19410/ 115203 | consumed samples: 4968960 | consumed tokens: 10176430080 | elapsed time per iteration (s): 0.43 | learning rate: 1.889E-04 | global batch size: 256 | lm loss: 2.431630E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.952 | TFLOPs: 31.27 |
7: iteration 19420/ 115203 | consumed samples: 4971520 | consumed tokens: 10181672960 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 2.436499E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.681 | TFLOPs: 31.31 |
7: iteration 19430/ 115203 | consumed samples: 4974080 | consumed tokens: 10186915840 | elapsed time per iteration (s): 0.42 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 2.449796E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.469 | TFLOPs: 31.72 |
7: iteration 19440/ 115203 | consumed samples: 4976640 | consumed tokens: 10192158720 | elapsed time per iteration (s): 0.44 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 2.430148E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.855 | TFLOPs: 30.32 |
7: iteration 19450/ 115203 | consumed samples: 4979200 | consumed tokens: 10197401600 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 2.444728E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.765 | TFLOPs: 31.00 |
7: iteration 19460/ 115203 | consumed samples: 4981760 | consumed tokens: 10202644480 | elapsed time per iteration (s): 0.43 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 2.417050E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.832 | TFLOPs: 31.31 |
7: iteration 19470/ 115203 | consumed samples: 4984320 | consumed tokens: 10207887360 | elapsed time per iteration (s): 0.44 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 2.403163E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.519 | TFLOPs: 30.46 |
7: iteration 19480/ 115203 | consumed samples: 4986880 | consumed tokens: 10213130240 | elapsed time per iteration (s): 0.46 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 2.410428E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.748 | TFLOPs: 29.26 |
7: iteration 19490/ 115203 | consumed samples: 4989440 | consumed tokens: 10218373120 | elapsed time per iteration (s): 0.44 | learning rate: 1.888E-04 | global batch size: 256 | lm loss: 2.430706E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.023 | TFLOPs: 30.33 |
7: iteration 19500/ 115203 | consumed samples: 4992000 | consumed tokens: 10223616000 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 2.412158E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.867 | TFLOPs: 30.95 |
7: iteration 19510/ 115203 | consumed samples: 4994560 | consumed tokens: 10228858880 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 2.439757E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.906 | TFLOPs: 31.42 |
7: iteration 19520/ 115203 | consumed samples: 4997120 | consumed tokens: 10234101760 | elapsed time per iteration (s): 0.42 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 2.438614E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.650 | TFLOPs: 31.62 |
7: iteration 19530/ 115203 | consumed samples: 4999680 | consumed tokens: 10239344640 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 2.407839E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.403 | TFLOPs: 30.93 |
7: iteration 19540/ 115203 | consumed samples: 5002240 | consumed tokens: 10244587520 | elapsed time per iteration (s): 0.44 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 2.431351E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.076 | TFLOPs: 30.54 |
7: iteration 19550/ 115203 | consumed samples: 5004800 | consumed tokens: 10249830400 | elapsed time per iteration (s): 0.46 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 2.453754E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.900 | TFLOPs: 29.48 |
7: iteration 19560/ 115203 | consumed samples: 5007360 | consumed tokens: 10255073280 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 2.417182E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.587 | TFLOPs: 31.04 |
7: iteration 19570/ 115203 | consumed samples: 5009920 | consumed tokens: 10260316160 | elapsed time per iteration (s): 0.44 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 2.406282E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.074 | TFLOPs: 30.28 |
7: iteration 19580/ 115203 | consumed samples: 5012480 | consumed tokens: 10265559040 | elapsed time per iteration (s): 0.43 | learning rate: 1.887E-04 | global batch size: 256 | lm loss: 2.431615E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.185 | TFLOPs: 31.28 |
7: iteration 19590/ 115203 | consumed samples: 5015040 | consumed tokens: 10270801920 | elapsed time per iteration (s): 0.43 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 2.402084E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.793 | TFLOPs: 31.37 |
7: iteration 19600/ 115203 | consumed samples: 5017600 | consumed tokens: 10276044800 | elapsed time per iteration (s): 0.45 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 2.397307E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.019 | TFLOPs: 30.01 |
7: iteration 19610/ 115203 | consumed samples: 5020160 | consumed tokens: 10281287680 | elapsed time per iteration (s): 0.43 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 2.429937E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.663 | TFLOPs: 30.99 |
7: iteration 19620/ 115203 | consumed samples: 5022720 | consumed tokens: 10286530560 | elapsed time per iteration (s): 0.43 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 2.438055E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.896 | TFLOPs: 31.11 |
7: iteration 19630/ 115203 | consumed samples: 5025280 | consumed tokens: 10291773440 | elapsed time per iteration (s): 0.44 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 2.437934E+00 | grad norm: 0.207
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.756 | TFLOPs: 30.26 | 7: iteration 19640/ 115203 | consumed samples: 5027840 | consumed tokens: 10297016320 | elapsed time per iteration (s): 0.44 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 2.449347E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.551 | TFLOPs: 30.57 | 7: iteration 19650/ 115203 | consumed samples: 5030400 | consumed tokens: 10302259200 | elapsed time per iteration (s): 0.43 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 2.395496E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.500 | TFLOPs: 31.04 | 7: iteration 19660/ 115203 | consumed samples: 5032960 | consumed tokens: 10307502080 | elapsed time per iteration (s): 0.43 | learning rate: 1.886E-04 | global batch size: 256 | lm loss: 2.427668E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.637 | TFLOPs: 31.09 | 7: iteration 19670/ 115203 | consumed samples: 5035520 | consumed tokens: 10312744960 | elapsed time per iteration (s): 0.42 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 2.410771E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.704 | TFLOPs: 31.68 | 7: iteration 19680/ 115203 | consumed samples: 5038080 | consumed tokens: 10317987840 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 2.401664E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.056 | TFLOPs: 31.48 | 7: iteration 19690/ 115203 | consumed samples: 5040640 | consumed tokens: 10323230720 | elapsed time per iteration (s): 0.43 
| learning rate: 1.885E-04 | global batch size: 256 | lm loss: 2.411168E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.407 | TFLOPs: 31.08 | 7: iteration 19700/ 115203 | consumed samples: 5043200 | consumed tokens: 10328473600 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 2.416860E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.569 | TFLOPs: 31.51 | 7: iteration 19710/ 115203 | consumed samples: 5045760 | consumed tokens: 10333716480 | elapsed time per iteration (s): 0.44 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 2.409615E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.874 | TFLOPs: 30.63 | 7: iteration 19720/ 115203 | consumed samples: 5048320 | consumed tokens: 10338959360 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 2.397880E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.751 | TFLOPs: 31.42 | 7: iteration 19730/ 115203 | consumed samples: 5050880 | consumed tokens: 10344202240 | elapsed time per iteration (s): 0.44 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 2.416307E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.582 | TFLOPs: 30.83 | 7: iteration 19740/ 115203 | consumed samples: 5053440 | consumed tokens: 10349445120 | elapsed time per iteration (s): 0.43 | learning rate: 1.885E-04 | global batch size: 256 | lm loss: 2.448609E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.881 | TFLOPs: 31.32 | 7: iteration 19750/ 115203 | 
consumed samples: 5056000 | consumed tokens: 10354688000 | elapsed time per iteration (s): 0.42 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 2.423234E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.421 | TFLOPs: 31.61 | 7: iteration 19760/ 115203 | consumed samples: 5058560 | consumed tokens: 10359930880 | elapsed time per iteration (s): 0.45 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 2.415864E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.713 | TFLOPs: 30.00 | 7: iteration 19770/ 115203 | consumed samples: 5061120 | consumed tokens: 10365173760 | elapsed time per iteration (s): 0.43 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 2.423408E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.872 | TFLOPs: 31.47 | 7: iteration 19780/ 115203 | consumed samples: 5063680 | consumed tokens: 10370416640 | elapsed time per iteration (s): 0.45 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 2.431414E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.034 | TFLOPs: 29.96 | 7: iteration 19790/ 115203 | consumed samples: 5066240 | consumed tokens: 10375659520 | elapsed time per iteration (s): 0.43 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 2.422051E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.138 | TFLOPs: 31.07 | 7: iteration 19800/ 115203 | consumed samples: 5068800 | consumed tokens: 10380902400 | elapsed time per iteration (s): 0.43 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 2.393427E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 598.998 | TFLOPs: 31.43 | 7: iteration 19810/ 115203 | consumed samples: 5071360 | consumed tokens: 10386145280 | elapsed time per iteration (s): 0.42 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 2.428319E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.263 | TFLOPs: 32.02 | 7: iteration 19820/ 115203 | consumed samples: 5073920 | consumed tokens: 10391388160 | elapsed time per iteration (s): 0.44 | learning rate: 1.884E-04 | global batch size: 256 | lm loss: 2.423240E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.142 | TFLOPs: 30.28 | 7: iteration 19830/ 115203 | consumed samples: 5076480 | consumed tokens: 10396631040 | elapsed time per iteration (s): 0.43 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 2.394958E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.583 | TFLOPs: 31.20 | 7: iteration 19840/ 115203 | consumed samples: 5079040 | consumed tokens: 10401873920 | elapsed time per iteration (s): 0.58 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 2.400053E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 442.228 | TFLOPs: 23.20 | 7: iteration 19850/ 115203 | consumed samples: 5081600 | consumed tokens: 10407116800 | elapsed time per iteration (s): 0.43 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 2.441823E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.568 | TFLOPs: 31.41 | 7: iteration 19860/ 115203 | consumed samples: 5084160 | consumed tokens: 10412359680 | elapsed time per iteration (s): 0.43 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 
2.436789E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.731 | TFLOPs: 30.99 | 7: iteration 19870/ 115203 | consumed samples: 5086720 | consumed tokens: 10417602560 | elapsed time per iteration (s): 0.43 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 2.445398E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.093 | TFLOPs: 31.43 | 7: iteration 19880/ 115203 | consumed samples: 5089280 | consumed tokens: 10422845440 | elapsed time per iteration (s): 0.42 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 2.421826E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.116 | TFLOPs: 32.12 | 7: iteration 19890/ 115203 | consumed samples: 5091840 | consumed tokens: 10428088320 | elapsed time per iteration (s): 0.43 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 2.421494E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.985 | TFLOPs: 31.06 | 7: iteration 19900/ 115203 | consumed samples: 5094400 | consumed tokens: 10433331200 | elapsed time per iteration (s): 0.44 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 2.421854E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.843 | TFLOPs: 30.58 | 7: iteration 19910/ 115203 | consumed samples: 5096960 | consumed tokens: 10438574080 | elapsed time per iteration (s): 0.45 | learning rate: 1.883E-04 | global batch size: 256 | lm loss: 2.440433E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.165 | TFLOPs: 29.97 | 7: iteration 19920/ 115203 | consumed samples: 5099520 | consumed tokens: 10443816960 | 
elapsed time per iteration (s): 0.43 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 2.449994E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.485 | TFLOPs: 31.35 | 7: iteration 19930/ 115203 | consumed samples: 5102080 | consumed tokens: 10449059840 | elapsed time per iteration (s): 0.44 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 2.387842E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.905 | TFLOPs: 30.32 | 7: iteration 19940/ 115203 | consumed samples: 5104640 | consumed tokens: 10454302720 | elapsed time per iteration (s): 0.44 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 2.447768E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.228 | TFLOPs: 30.55 | 7: iteration 19950/ 115203 | consumed samples: 5107200 | consumed tokens: 10459545600 | elapsed time per iteration (s): 0.44 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 2.396258E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.067 | TFLOPs: 30.38 | 7: iteration 19960/ 115203 | consumed samples: 5109760 | consumed tokens: 10464788480 | elapsed time per iteration (s): 0.42 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 2.394703E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.880 | TFLOPs: 31.63 | 7: iteration 19970/ 115203 | consumed samples: 5112320 | consumed tokens: 10470031360 | elapsed time per iteration (s): 0.43 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 2.434732E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.755 | TFLOPs: 
30.89 | 7: iteration 19980/ 115203 | consumed samples: 5114880 | consumed tokens: 10475274240 | elapsed time per iteration (s): 0.43 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 2.439275E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.625 | TFLOPs: 30.99 | 7: iteration 19990/ 115203 | consumed samples: 5117440 | consumed tokens: 10480517120 | elapsed time per iteration (s): 0.43 | learning rate: 1.882E-04 | global batch size: 256 | lm loss: 2.433877E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.559 | TFLOPs: 31.46 | 0: [2022-11-28 15:21:11,102] [INFO] [logging.py:68:log_dist] [Rank 0] step=20000, skipped=0, lr=[0.00018814068619753637, 0.00018814068619753637, 0.00018814068619753637], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 20000/ 115203 | consumed samples: 5120000 | consumed tokens: 10485760000 | elapsed time per iteration (s): 0.43 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 2.452776E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.384 | TFLOPs: 31.03 | 0: steps: 20000 loss: 2.4166 iter time (s): 0.435 samples/sec: 588.936 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 20000 | lm loss value: 2.312221E+00 | lm loss PPL: 1.009683E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 20000 to checkpoints_221m 0: [2022-11-28 15:21:11,296] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step20000 is begin to save! 0: [2022-11-28 15:21:11,316] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_01-model_00-model_states.pt... 
0: [2022-11-28 15:21:11,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_01-model_00-model_states.pt. 0: [2022-11-28 15:21:11,428] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_03-model_00-model_states.pt... 0: [2022-11-28 15:21:11,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_03-model_00-model_states.pt. 0: [2022-11-28 15:21:11,451] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_04-model_00-model_states.pt... 0: [2022-11-28 15:21:11,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_04-model_00-model_states.pt. 0: [2022-11-28 15:21:11,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_05-model_00-model_states.pt... 0: [2022-11-28 15:21:11,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_05-model_00-model_states.pt. 0: [2022-11-28 15:21:11,502] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_06-model_00-model_states.pt... 0: [2022-11-28 15:21:11,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_06-model_00-model_states.pt. 0: [2022-11-28 15:21:11,526] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_07-model_00-model_states.pt... 0: [2022-11-28 15:21:11,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_07-model_00-model_states.pt. 0: [2022-11-28 15:21:11,550] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_08-model_00-model_states.pt... 
0: [2022-11-28 15:21:11,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_08-model_00-model_states.pt. 0: [2022-11-28 15:21:11,574] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_09-model_00-model_states.pt... 0: [2022-11-28 15:21:11,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_09-model_00-model_states.pt. 0: [2022-11-28 15:21:11,599] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_10-model_00-model_states.pt... 0: [2022-11-28 15:21:11,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_10-model_00-model_states.pt. 0: [2022-11-28 15:21:11,623] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_11-model_00-model_states.pt... 0: [2022-11-28 15:21:11,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_11-model_00-model_states.pt. 0: [2022-11-28 15:21:11,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_12-model_00-model_states.pt... 0: [2022-11-28 15:21:11,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_12-model_00-model_states.pt. 0: [2022-11-28 15:21:11,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_13-model_00-model_states.pt... 0: [2022-11-28 15:21:11,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_13-model_00-model_states.pt. 0: [2022-11-28 15:21:11,702] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_14-model_00-model_states.pt... 
0: [2022-11-28 15:21:11,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_14-model_00-model_states.pt. 0: [2022-11-28 15:21:11,730] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_15-model_00-model_states.pt... 0: [2022-11-28 15:21:11,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_15-model_00-model_states.pt. 0: [2022-11-28 15:21:11,754] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_16-model_00-model_states.pt... 0: [2022-11-28 15:21:11,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_16-model_00-model_states.pt. 0: [2022-11-28 15:21:11,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_17-model_00-model_states.pt... 0: [2022-11-28 15:21:11,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_17-model_00-model_states.pt. 0: [2022-11-28 15:21:11,803] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_18-model_00-model_states.pt... 0: [2022-11-28 15:21:11,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_18-model_00-model_states.pt. 0: [2022-11-28 15:21:11,827] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_19-model_00-model_states.pt... 0: [2022-11-28 15:21:11,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_19-model_00-model_states.pt. 0: [2022-11-28 15:21:11,852] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_20-model_00-model_states.pt... 
0: [2022-11-28 15:21:11,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_20-model_00-model_states.pt. 0: [2022-11-28 15:21:11,876] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/layer_22-model_00-model_states.pt... 0: [2022-11-28 15:21:11,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/layer_22-model_00-model_states.pt. 0: [2022-11-28 15:21:11,881] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step20000/mp_rank_00_model_states.pt 0: [2022-11-28 15:21:11,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/mp_rank_00_model_states.pt... 0: [2022-11-28 15:21:11,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/mp_rank_00_model_states.pt. 0: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
0: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
2: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:21:11,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:21:11,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:21:11,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 15:21:11,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2022-11-28 15:21:11,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:21:11,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 15:21:11,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2022-11-28 15:21:11,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:21:11,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 15:21:11,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
0: [2022-11-28 15:21:11,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:21:11,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 15:21:11,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2022-11-28 15:21:11,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:21:11,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 15:21:11,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2022-11-28 15:21:11,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:21:11,955] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 15:21:11,955] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2022-11-28 15:21:11,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:21:11,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:21:11,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 15:21:11,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 15:21:11,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2022-11-28 15:21:11,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2022-11-28 15:21:11,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:21:11,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:21:11,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 15:21:11,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 15:21:11,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2022-11-28 15:21:11,951] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2022-11-28 15:21:11,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:21:11,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:21:11,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 15:21:11,957] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 15:21:11,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2022-11-28 15:21:11,957] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:21:11,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 15:21:11,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:21:11,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 15:21:11,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 15:21:11,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 15:21:11,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2022-11-28 15:21:11,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2022-11-28 15:21:11,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2022-11-28 15:21:11,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:21:11,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 15:21:11,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2022-11-28 15:21:11,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 15:21:11,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 15:21:11,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:21:11,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 15:21:11,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 15:21:11,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
1: [2022-11-28 15:21:11,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:21:11,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2022-11-28 15:21:11,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 15:21:11,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 1: [2022-11-28 15:21:11,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
5: [2022-11-28 15:21:11,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:21:11,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 15:21:11,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 15:21:11,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 15:21:11,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 15:21:11,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 15:21:11,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
3: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 3: [2022-11-28 15:21:11,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2022-11-28 15:21:11,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:21:11,964] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 15:21:11,964] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2022-11-28 15:21:11,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:21:11,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 15:21:11,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2022-11-28 15:21:11,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:21:11,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 15:21:11,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
7: [2022-11-28 15:21:11,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:21:11,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 15:21:11,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2022-11-28 15:21:11,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:21:11,968] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 15:21:11,968] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2022-11-28 15:21:11,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:21:11,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:21:11,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:21:11,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-28 15:21:11,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 15:21:11,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 15:21:11,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 15:21:11,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 15:21:11,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2022-11-28 15:21:11,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2022-11-28 15:21:11,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 6: [2022-11-28 15:21:11,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2022-11-28 15:21:11,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:21:11,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 15:21:11,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2022-11-28 15:21:11,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:21:11,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 15:21:11,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 5: [2022-11-28 15:21:11,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:21:11,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 15:21:11,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2022-11-28 15:21:11,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:21:11,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 15:21:11,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2022-11-28 15:21:11,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:21:11,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 15:21:11,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2022-11-28 15:21:11,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 15:21:11,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 15:21:11,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 7: [2022-11-28 15:21:11,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:21:11,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 15:21:11,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 0: [2022-11-28 15:21:12,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:21:12,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 15:21:12,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2022-11-28 15:21:12,060] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:21:12,060] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:21:12,060] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:21:12,060] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 15:21:12,060] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 15:21:12,060] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 15:21:12,060] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 15:21:12,060] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 15:21:12,060] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2022-11-28 15:21:12,060] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2022-11-28 15:21:12,060] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2022-11-28 15:21:12,060] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2022-11-28 15:21:12,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:21:12,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:21:12,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:21:12,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-28 15:21:12,065] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 15:21:12,065] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 15:21:12,065] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 15:21:12,065] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 15:21:12,065] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2022-11-28 15:21:12,065] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2022-11-28 15:21:12,065] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 4: [2022-11-28 15:21:12,065] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:21:12,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 15:21:12,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 15:21:12,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 15:21:12,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 15:21:12,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 15:21:12,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 15:21:12,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:21:12,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step20000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 15:21:12,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step20000 is ready now! 
0: successfully saved checkpoint at iteration 20000 to checkpoints_221m
7: time (ms) | save-checkpoint: 800.78
7: iteration 20010/ 115203 | consumed samples: 5122560 | consumed tokens: 10491002880 | elapsed time per iteration (s): 0.53 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 2.426167E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 482.033 | TFLOPs: 25.29 |
7: iteration 20020/ 115203 | consumed samples: 5125120 | consumed tokens: 10496245760 | elapsed time per iteration (s): 0.43 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 2.400772E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.732 | TFLOPs: 30.89 |
7: iteration 20030/ 115203 | consumed samples: 5127680 | consumed tokens: 10501488640 | elapsed time per iteration (s): 0.44 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 2.428799E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.375 | TFLOPs: 30.61 |
7: iteration 20040/ 115203 | consumed samples: 5130240 | consumed tokens: 10506731520 | elapsed time per iteration (s): 0.43 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 2.431821E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.990 | TFLOPs: 31.43 |
7: iteration 20050/ 115203 | consumed samples: 5132800 | consumed tokens: 10511974400 | elapsed time per iteration (s): 0.44 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 2.407747E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.119 | TFLOPs: 30.23 |
7: iteration 20060/ 115203 | consumed samples: 5135360 | consumed tokens: 10517217280 | elapsed time per iteration (s): 0.45 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 2.427056E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.258 | TFLOPs: 29.92 |
7: iteration 20070/ 115203 | consumed samples: 5137920 | consumed tokens: 10522460160 | elapsed time per iteration (s): 0.45 | learning rate: 1.881E-04 | global batch size: 256 | lm loss: 2.392396E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.432 | TFLOPs: 30.09 |
7: iteration 20080/ 115203 | consumed samples: 5140480 | consumed tokens: 10527703040 | elapsed time per iteration (s): 0.44 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 2.406074E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.173 | TFLOPs: 30.81 |
7: iteration 20090/ 115203 | consumed samples: 5143040 | consumed tokens: 10532945920 | elapsed time per iteration (s): 0.44 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 2.377053E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.304 | TFLOPs: 30.61 |
7: iteration 20100/ 115203 | consumed samples: 5145600 | consumed tokens: 10538188800 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 2.438919E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.803 | TFLOPs: 31.26 |
7: iteration 20110/ 115203 | consumed samples: 5148160 | consumed tokens: 10543431680 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 2.396413E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.530 | TFLOPs: 31.51 |
7: iteration 20120/ 115203 | consumed samples: 
5150720 | consumed tokens: 10548674560 | elapsed time per iteration (s): 0.44 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 2.407698E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.420 | TFLOPs: 30.66 | 7: iteration 20130/ 115203 | consumed samples: 5153280 | consumed tokens: 10553917440 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 2.419474E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.982 | TFLOPs: 31.17 | 7: iteration 20140/ 115203 | consumed samples: 5155840 | consumed tokens: 10559160320 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 2.411039E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.852 | TFLOPs: 31.26 | 7: iteration 20150/ 115203 | consumed samples: 5158400 | consumed tokens: 10564403200 | elapsed time per iteration (s): 0.43 | learning rate: 1.880E-04 | global batch size: 256 | lm loss: 2.427140E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.245 | TFLOPs: 31.39 | 7: iteration 20160/ 115203 | consumed samples: 5160960 | consumed tokens: 10569646080 | elapsed time per iteration (s): 0.44 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 2.460016E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.350 | TFLOPs: 30.19 | 7: iteration 20170/ 115203 | consumed samples: 5163520 | consumed tokens: 10574888960 | elapsed time per iteration (s): 0.44 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 2.413768E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 586.058 | TFLOPs: 30.75 | 7: iteration 20180/ 115203 | consumed samples: 5166080 | consumed tokens: 10580131840 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 2.421803E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.353 | TFLOPs: 31.18 | 7: iteration 20190/ 115203 | consumed samples: 5168640 | consumed tokens: 10585374720 | elapsed time per iteration (s): 0.44 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 2.418700E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.530 | TFLOPs: 30.62 | 7: iteration 20200/ 115203 | consumed samples: 5171200 | consumed tokens: 10590617600 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 2.428410E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.745 | TFLOPs: 31.05 | 7: iteration 20210/ 115203 | consumed samples: 5173760 | consumed tokens: 10595860480 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 2.375869E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.820 | TFLOPs: 31.31 | 7: iteration 20220/ 115203 | consumed samples: 5176320 | consumed tokens: 10601103360 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 2.397116E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.149 | TFLOPs: 31.12 | 7: iteration 20230/ 115203 | consumed samples: 5178880 | consumed tokens: 10606346240 | elapsed time per iteration (s): 0.43 | learning rate: 1.879E-04 | global batch size: 256 | lm loss: 2.417688E+00 | 
grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.844 | TFLOPs: 30.95 | 7: iteration 20240/ 115203 | consumed samples: 5181440 | consumed tokens: 10611589120 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 2.415272E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.461 | TFLOPs: 31.03 | 7: iteration 20250/ 115203 | consumed samples: 5184000 | consumed tokens: 10616832000 | elapsed time per iteration (s): 0.42 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 2.404806E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.858 | TFLOPs: 31.79 | 7: iteration 20260/ 115203 | consumed samples: 5186560 | consumed tokens: 10622074880 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 2.407889E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.200 | TFLOPs: 31.60 | 7: iteration 20270/ 115203 | consumed samples: 5189120 | consumed tokens: 10627317760 | elapsed time per iteration (s): 0.44 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 2.415266E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.836 | TFLOPs: 30.74 | 7: iteration 20280/ 115203 | consumed samples: 5191680 | consumed tokens: 10632560640 | elapsed time per iteration (s): 0.44 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 2.422531E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.354 | TFLOPs: 30.82 | 7: iteration 20290/ 115203 | consumed samples: 5194240 | consumed tokens: 10637803520 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 2.454927E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.358 | TFLOPs: 31.39 | 7: iteration 20300/ 115203 | consumed samples: 5196800 | consumed tokens: 10643046400 | elapsed time per iteration (s): 0.43 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 2.401340E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.719 | TFLOPs: 31.15 | 7: iteration 20310/ 115203 | consumed samples: 5199360 | consumed tokens: 10648289280 | elapsed time per iteration (s): 0.42 | learning rate: 1.878E-04 | global batch size: 256 | lm loss: 2.405736E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.883 | TFLOPs: 31.84 | 7: iteration 20320/ 115203 | consumed samples: 5201920 | consumed tokens: 10653532160 | elapsed time per iteration (s): 0.45 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 2.420000E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.017 | TFLOPs: 29.54 | 7: iteration 20330/ 115203 | consumed samples: 5204480 | consumed tokens: 10658775040 | elapsed time per iteration (s): 0.43 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 2.438780E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.126 | TFLOPs: 31.23 | 7: iteration 20340/ 115203 | consumed samples: 5207040 | consumed tokens: 10664017920 | elapsed time per iteration (s): 0.43 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 2.408057E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.776 | TFLOPs: 31.57 | 7: 
iteration 20350/ 115203 | consumed samples: 5209600 | consumed tokens: 10669260800 | elapsed time per iteration (s): 0.43 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 2.374594E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.947 | TFLOPs: 31.27 | 7: iteration 20360/ 115203 | consumed samples: 5212160 | consumed tokens: 10674503680 | elapsed time per iteration (s): 0.43 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 2.423139E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.736 | TFLOPs: 31.20 | 7: iteration 20370/ 115203 | consumed samples: 5214720 | consumed tokens: 10679746560 | elapsed time per iteration (s): 0.43 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 2.395494E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.049 | TFLOPs: 31.06 | 7: iteration 20380/ 115203 | consumed samples: 5217280 | consumed tokens: 10684989440 | elapsed time per iteration (s): 0.44 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 2.422115E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.177 | TFLOPs: 30.60 | 7: iteration 20390/ 115203 | consumed samples: 5219840 | consumed tokens: 10690232320 | elapsed time per iteration (s): 0.45 | learning rate: 1.877E-04 | global batch size: 256 | lm loss: 2.393558E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.619 | TFLOPs: 30.04 | 7: iteration 20400/ 115203 | consumed samples: 5222400 | consumed tokens: 10695475200 | elapsed time per iteration (s): 0.42 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 2.443528E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 603.777 | TFLOPs: 31.68 | 7: iteration 20410/ 115203 | consumed samples: 5224960 | consumed tokens: 10700718080 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 2.401238E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.381 | TFLOPs: 31.13 | 7: iteration 20420/ 115203 | consumed samples: 5227520 | consumed tokens: 10705960960 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 2.432448E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.236 | TFLOPs: 31.23 | 7: iteration 20430/ 115203 | consumed samples: 5230080 | consumed tokens: 10711203840 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 2.422496E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.738 | TFLOPs: 31.31 | 7: iteration 20440/ 115203 | consumed samples: 5232640 | consumed tokens: 10716446720 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 2.423143E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.333 | TFLOPs: 31.29 | 7: iteration 20450/ 115203 | consumed samples: 5235200 | consumed tokens: 10721689600 | elapsed time per iteration (s): 0.44 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 2.411565E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.314 | TFLOPs: 30.66 | 7: iteration 20460/ 115203 | consumed samples: 5237760 | consumed tokens: 10726932480 | elapsed time per iteration (s): 0.43 | learning rate: 1.876E-04 | global 
batch size: 256 | lm loss: 2.410693E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.766 | TFLOPs: 30.89 | 7: iteration 20470/ 115203 | consumed samples: 5240320 | consumed tokens: 10732175360 | elapsed time per iteration (s): 0.42 | learning rate: 1.876E-04 | global batch size: 256 | lm loss: 2.413560E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.735 | TFLOPs: 31.62 | 7: iteration 20480/ 115203 | consumed samples: 5242880 | consumed tokens: 10737418240 | elapsed time per iteration (s): 0.44 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 2.422052E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.869 | TFLOPs: 30.69 | 7: iteration 20490/ 115203 | consumed samples: 5245440 | consumed tokens: 10742661120 | elapsed time per iteration (s): 0.43 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 2.417677E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.508 | TFLOPs: 31.09 | 7: iteration 20500/ 115203 | consumed samples: 5248000 | consumed tokens: 10747904000 | elapsed time per iteration (s): 0.43 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 2.426291E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.845 | TFLOPs: 31.53 | 7: iteration 20510/ 115203 | consumed samples: 5250560 | consumed tokens: 10753146880 | elapsed time per iteration (s): 0.44 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 2.414893E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.396 | TFLOPs: 30.30 | 7: iteration 20520/ 115203 | consumed samples: 5253120 | consumed 
tokens: 10758389760 | elapsed time per iteration (s): 0.42 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 2.465645E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.358 | TFLOPs: 31.66 | 7: iteration 20530/ 115203 | consumed samples: 5255680 | consumed tokens: 10763632640 | elapsed time per iteration (s): 0.43 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 2.429728E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.176 | TFLOPs: 31.39 | 7: iteration 20540/ 115203 | consumed samples: 5258240 | consumed tokens: 10768875520 | elapsed time per iteration (s): 0.43 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 2.399944E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.623 | TFLOPs: 31.25 | 7: iteration 20550/ 115203 | consumed samples: 5260800 | consumed tokens: 10774118400 | elapsed time per iteration (s): 0.43 | learning rate: 1.875E-04 | global batch size: 256 | lm loss: 2.430354E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.868 | TFLOPs: 31.11 | 7: iteration 20560/ 115203 | consumed samples: 5263360 | consumed tokens: 10779361280 | elapsed time per iteration (s): 0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 2.416966E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.614 | TFLOPs: 31.25 | 7: iteration 20570/ 115203 | consumed samples: 5265920 | consumed tokens: 10784604160 | elapsed time per iteration (s): 0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 2.406254E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 591.000 | TFLOPs: 31.01 | 7: iteration 20580/ 115203 | consumed samples: 5268480 | consumed tokens: 10789847040 | elapsed time per iteration (s): 0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 2.386333E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.523 | TFLOPs: 31.56 | 7: iteration 20590/ 115203 | consumed samples: 5271040 | consumed tokens: 10795089920 | elapsed time per iteration (s): 0.42 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 2.419653E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.641 | TFLOPs: 31.72 | 7: iteration 20600/ 115203 | consumed samples: 5273600 | consumed tokens: 10800332800 | elapsed time per iteration (s): 0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 2.394275E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.634 | TFLOPs: 31.15 | 7: iteration 20610/ 115203 | consumed samples: 5276160 | consumed tokens: 10805575680 | elapsed time per iteration (s): 0.44 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 2.402560E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.555 | TFLOPs: 30.83 | 7: iteration 20620/ 115203 | consumed samples: 5278720 | consumed tokens: 10810818560 | elapsed time per iteration (s): 0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 2.392521E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.680 | TFLOPs: 31.46 | 7: iteration 20630/ 115203 | consumed samples: 5281280 | consumed tokens: 10816061440 | elapsed time per iteration (s): 0.43 | learning rate: 1.874E-04 | global batch size: 256 | lm loss: 2.442859E+00 | grad norm: 0.216 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.701 | TFLOPs: 31.47 | 7: iteration 20640/ 115203 | consumed samples: 5283840 | consumed tokens: 10821304320 | elapsed time per iteration (s): 0.44 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 2.422127E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.169 | TFLOPs: 30.86 | 7: iteration 20650/ 115203 | consumed samples: 5286400 | consumed tokens: 10826547200 | elapsed time per iteration (s): 0.42 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 2.414642E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.313 | TFLOPs: 31.65 | 7: iteration 20660/ 115203 | consumed samples: 5288960 | consumed tokens: 10831790080 | elapsed time per iteration (s): 0.44 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 2.442472E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.604 | TFLOPs: 30.78 | 7: iteration 20670/ 115203 | consumed samples: 5291520 | consumed tokens: 10837032960 | elapsed time per iteration (s): 0.44 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 2.418851E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.924 | TFLOPs: 30.69 | 7: iteration 20680/ 115203 | consumed samples: 5294080 | consumed tokens: 10842275840 | elapsed time per iteration (s): 0.43 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 2.398654E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.287 | TFLOPs: 30.92 | 7: iteration 20690/ 115203 | consumed samples: 5296640 | consumed tokens: 10847518720 | elapsed time per iteration (s): 0.43 
| learning rate: 1.873E-04 | global batch size: 256 | lm loss: 2.411296E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.314 | TFLOPs: 31.39 | 7: iteration 20700/ 115203 | consumed samples: 5299200 | consumed tokens: 10852761600 | elapsed time per iteration (s): 0.42 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 2.423261E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.589 | TFLOPs: 31.72 | 7: iteration 20710/ 115203 | consumed samples: 5301760 | consumed tokens: 10858004480 | elapsed time per iteration (s): 0.43 | learning rate: 1.873E-04 | global batch size: 256 | lm loss: 2.394115E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.910 | TFLOPs: 31.53 | 7: iteration 20720/ 115203 | consumed samples: 5304320 | consumed tokens: 10863247360 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 2.401641E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.581 | TFLOPs: 31.46 | 7: iteration 20730/ 115203 | consumed samples: 5306880 | consumed tokens: 10868490240 | elapsed time per iteration (s): 0.42 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 2.416191E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.042 | TFLOPs: 31.64 | 7: iteration 20740/ 115203 | consumed samples: 5309440 | consumed tokens: 10873733120 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 2.402659E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.557 | TFLOPs: 31.46 | 7: iteration 20750/ 115203 | 
consumed samples: 5312000 | consumed tokens: 10878976000 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 2.419051E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.630 | TFLOPs: 31.25 | 7: iteration 20760/ 115203 | consumed samples: 5314560 | consumed tokens: 10884218880 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 2.423393E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.057 | TFLOPs: 31.17 | 7: iteration 20770/ 115203 | consumed samples: 5317120 | consumed tokens: 10889461760 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 2.415290E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.606 | TFLOPs: 31.51 | 7: iteration 20780/ 115203 | consumed samples: 5319680 | consumed tokens: 10894704640 | elapsed time per iteration (s): 0.43 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 2.414160E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.096 | TFLOPs: 31.59 | 7: iteration 20790/ 115203 | consumed samples: 5322240 | consumed tokens: 10899947520 | elapsed time per iteration (s): 0.42 | learning rate: 1.872E-04 | global batch size: 256 | lm loss: 2.420573E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.415 | TFLOPs: 31.66 | 7: iteration 20800/ 115203 | consumed samples: 5324800 | consumed tokens: 10905190400 | elapsed time per iteration (s): 0.42 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 2.441167E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 606.073 | TFLOPs: 31.80 | 7: iteration 20810/ 115203 | consumed samples: 5327360 | consumed tokens: 10910433280 | elapsed time per iteration (s): 0.43 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 2.413565E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.322 | TFLOPs: 31.03 | 7: iteration 20820/ 115203 | consumed samples: 5329920 | consumed tokens: 10915676160 | elapsed time per iteration (s): 0.43 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 2.416034E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.006 | TFLOPs: 31.48 | 7: iteration 20830/ 115203 | consumed samples: 5332480 | consumed tokens: 10920919040 | elapsed time per iteration (s): 0.44 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 2.379998E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.964 | TFLOPs: 30.85 | 7: iteration 20840/ 115203 | consumed samples: 5335040 | consumed tokens: 10926161920 | elapsed time per iteration (s): 0.44 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 2.406605E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.424 | TFLOPs: 30.56 | 7: iteration 20850/ 115203 | consumed samples: 5337600 | consumed tokens: 10931404800 | elapsed time per iteration (s): 0.44 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 2.418159E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.464 | TFLOPs: 30.82 | 7: iteration 20860/ 115203 | consumed samples: 5340160 | consumed tokens: 10936647680 | elapsed time per iteration (s): 0.43 | learning rate: 1.871E-04 | global batch size: 256 | lm loss: 
2.422219E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.325 | TFLOPs: 31.03 | 7: iteration 20870/ 115203 | consumed samples: 5342720 | consumed tokens: 10941890560 | elapsed time per iteration (s): 0.44 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 2.430852E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.143 | TFLOPs: 30.75 | 7: iteration 20880/ 115203 | consumed samples: 5345280 | consumed tokens: 10947133440 | elapsed time per iteration (s): 0.42 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 2.420392E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.408 | TFLOPs: 31.71 | 7: iteration 20890/ 115203 | consumed samples: 5347840 | consumed tokens: 10952376320 | elapsed time per iteration (s): 0.44 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 2.402544E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.216 | TFLOPs: 30.86 | 7: iteration 20900/ 115203 | consumed samples: 5350400 | consumed tokens: 10957619200 | elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 2.404243E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.650 | TFLOPs: 31.15 | 7: iteration 20910/ 115203 | consumed samples: 5352960 | consumed tokens: 10962862080 | elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 2.400835E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.647 | TFLOPs: 31.51 | 7: iteration 20920/ 115203 | consumed samples: 5355520 | consumed tokens: 10968104960 | 
elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 2.408100E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.038 | TFLOPs: 31.59 | 7: iteration 20930/ 115203 | consumed samples: 5358080 | consumed tokens: 10973347840 | elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 2.412964E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.002 | TFLOPs: 31.27 | 7: iteration 20940/ 115203 | consumed samples: 5360640 | consumed tokens: 10978590720 | elapsed time per iteration (s): 0.43 | learning rate: 1.870E-04 | global batch size: 256 | lm loss: 2.384459E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.846 | TFLOPs: 31.47 | 7: iteration 20950/ 115203 | consumed samples: 5363200 | consumed tokens: 10983833600 | elapsed time per iteration (s): 0.43 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 2.401771E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.335 | TFLOPs: 31.24 | 7: iteration 20960/ 115203 | consumed samples: 5365760 | consumed tokens: 10989076480 | elapsed time per iteration (s): 0.43 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 2.406781E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.271 | TFLOPs: 30.92 | 7: iteration 20970/ 115203 | consumed samples: 5368320 | consumed tokens: 10994319360 | elapsed time per iteration (s): 0.43 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 2.399600E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.887 | TFLOPs: 
31.27 | 7: iteration 20980/ 115203 | consumed samples: 5370880 | consumed tokens: 10999562240 | elapsed time per iteration (s): 0.43 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 2.386451E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.777 | TFLOPs: 30.89 | 7: iteration 20990/ 115203 | consumed samples: 5373440 | consumed tokens: 11004805120 | elapsed time per iteration (s): 0.43 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 2.415395E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.936 | TFLOPs: 30.90 | 7: iteration 21000/ 115203 | consumed samples: 5376000 | consumed tokens: 11010048000 | elapsed time per iteration (s): 0.43 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 2.420218E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.094 | TFLOPs: 31.22 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 21000 | lm loss value: 2.361594E+00 | lm loss PPL: 1.060784E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 21000 to checkpoints_221m 0: [2022-11-28 15:28:23,979] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step21000 is begin to save! 0: [2022-11-28 15:28:23,983] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_01-model_00-model_states.pt... 0: [2022-11-28 15:28:24,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_01-model_00-model_states.pt. 0: [2022-11-28 15:28:24,088] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_03-model_00-model_states.pt... 
0: [2022-11-28 15:28:24,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_03-model_00-model_states.pt. 0: [2022-11-28 15:28:24,109] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_04-model_00-model_states.pt... 0: [2022-11-28 15:28:24,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_04-model_00-model_states.pt. 0: [2022-11-28 15:28:24,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_05-model_00-model_states.pt... 0: [2022-11-28 15:28:24,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_05-model_00-model_states.pt. 0: [2022-11-28 15:28:24,156] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_06-model_00-model_states.pt... 0: [2022-11-28 15:28:24,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_06-model_00-model_states.pt. 0: [2022-11-28 15:28:24,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_07-model_00-model_states.pt... 0: [2022-11-28 15:28:24,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_07-model_00-model_states.pt. 0: [2022-11-28 15:28:24,202] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_08-model_00-model_states.pt... 0: [2022-11-28 15:28:24,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_08-model_00-model_states.pt. 0: [2022-11-28 15:28:24,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_09-model_00-model_states.pt... 
0: [2022-11-28 15:28:24,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_09-model_00-model_states.pt. 0: [2022-11-28 15:28:24,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_10-model_00-model_states.pt... 0: [2022-11-28 15:28:24,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_10-model_00-model_states.pt. 0: [2022-11-28 15:28:24,275] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_11-model_00-model_states.pt... 0: [2022-11-28 15:28:24,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_11-model_00-model_states.pt. 0: [2022-11-28 15:28:24,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_12-model_00-model_states.pt... 0: [2022-11-28 15:28:24,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_12-model_00-model_states.pt. 0: [2022-11-28 15:28:24,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_13-model_00-model_states.pt... 0: [2022-11-28 15:28:24,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_13-model_00-model_states.pt. 0: [2022-11-28 15:28:24,351] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_14-model_00-model_states.pt... 0: [2022-11-28 15:28:24,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_14-model_00-model_states.pt. 0: [2022-11-28 15:28:24,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_15-model_00-model_states.pt... 
0: [2022-11-28 15:28:24,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_15-model_00-model_states.pt. 0: [2022-11-28 15:28:24,395] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_16-model_00-model_states.pt... 0: [2022-11-28 15:28:24,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_16-model_00-model_states.pt. 0: [2022-11-28 15:28:24,420] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_17-model_00-model_states.pt... 0: [2022-11-28 15:28:24,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_17-model_00-model_states.pt. 0: [2022-11-28 15:28:24,443] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_18-model_00-model_states.pt... 0: [2022-11-28 15:28:24,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_18-model_00-model_states.pt. 0: [2022-11-28 15:28:24,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_19-model_00-model_states.pt... 0: [2022-11-28 15:28:24,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_19-model_00-model_states.pt. 0: [2022-11-28 15:28:24,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_20-model_00-model_states.pt... 0: [2022-11-28 15:28:24,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_20-model_00-model_states.pt. 0: [2022-11-28 15:28:24,513] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/layer_22-model_00-model_states.pt... 
0: [2022-11-28 15:28:24,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/layer_22-model_00-model_states.pt. 0: [2022-11-28 15:28:24,518] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step21000/mp_rank_00_model_states.pt 0: [2022-11-28 15:28:24,518] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/mp_rank_00_model_states.pt... 0: [2022-11-28 15:28:24,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/mp_rank_00_model_states.pt. 0: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
2: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
5: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
0: [2022-11-28 15:28:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step21000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:28:24,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:28:24,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 15:28:24,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2022-11-28 15:28:24,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:28:24,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:28:24,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:28:24,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2022-11-28 15:28:24,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 7: [2022-11-28 15:28:24,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2022-11-28 15:28:24,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2022-11-28 15:28:24,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 15:28:24,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 15:28:24,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2022-11-28 15:28:24,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:28:24,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 4: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:28:24,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:28:24,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 15:28:24,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 7: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:28:24,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 
7: [2022-11-28 15:28:24,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:28:24,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 4: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-28 15:28:24,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 7: [2022-11-28 15:28:24,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 4: [2022-11-28 15:28:24,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 15:28:24,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 15:28:24,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2022-11-28 15:28:24,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:28:24,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 4: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 4: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2022-11-28 15:28:24,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2022-11-28 15:28:24,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
4: [2022-11-28 15:28:24,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2022-11-28 15:28:24,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 0: [2022-11-28 15:28:24,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2022-11-28 15:28:24,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2022-11-28 15:28:24,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:28:24,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 15:28:24,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 7: [2022-11-28 15:28:24,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:28:24,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 15:28:24,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2022-11-28 15:28:24,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-28 15:28:24,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 15:28:24,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:28:24,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2022-11-28 15:28:24,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 0: [2022-11-28 15:28:24,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:28:24,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2022-11-28 15:28:24,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 15:28:24,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2022-11-28 15:28:24,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:28:24,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 15:28:24,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2022-11-28 15:28:24,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:28:24,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:28:24,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 15:28:24,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 15:28:24,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2022-11-28 15:28:24,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2022-11-28 15:28:24,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:28:24,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 15:28:24,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 7: [2022-11-28 15:28:24,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:28:24,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 15:28:24,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 7: [2022-11-28 15:28:24,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:28:24,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:28:24,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 5: [2022-11-28 15:28:24,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 15:28:24,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2022-11-28 15:28:24,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:28:24,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2022-11-28 15:28:24,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2022-11-28 15:28:24,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2022-11-28 15:28:24,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:28:24,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 15:28:24,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 5: [2022-11-28 15:28:24,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:28:24,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 15:28:24,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2022-11-28 15:28:24,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:28:24,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:28:24,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 15:28:24,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 15:28:24,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2022-11-28 15:28:24,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2022-11-28 15:28:24,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:28:24,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:28:24,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 15:28:24,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 15:28:24,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:28:24,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 15:28:24,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2022-11-28 15:28:24,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2022-11-28 15:28:24,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 15:28:24,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2022-11-28 15:28:24,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:28:24,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 2: [2022-11-28 15:28:24,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2022-11-28 15:28:24,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 15:28:24,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 
6: [2022-11-28 15:28:24,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:28:24,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 15:28:24,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2022-11-28 15:28:24,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:28:24,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 15:28:24,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2022-11-28 15:28:24,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:28:24,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 15:28:24,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 4: [2022-11-28 15:28:24,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:28:24,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 15:28:24,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 
4: [2022-11-28 15:28:24,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:28:24,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 15:28:24,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 4: [2022-11-28 15:28:24,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:28:24,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 15:28:24,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2022-11-28 15:28:24,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:28:24,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:28:24,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 15:28:24,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 15:28:24,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 15:28:24,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 15:28:24,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2022-11-28 15:28:24,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2022-11-28 15:28:24,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 6: [2022-11-28 15:28:24,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:28:24,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 15:28:24,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:28:24,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 15:28:24,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 
1: [2022-11-28 15:28:24,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:28:24,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 15:28:24,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2022-11-28 15:28:24,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:28:24,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 15:28:24,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:28:24,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 15:28:24,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 15:28:24,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2022-11-28 15:28:24,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 15:28:24,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 15:28:24,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 
3: [2022-11-28 15:28:24,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 15:28:24,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 3: [2022-11-28 15:28:24,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2022-11-28 15:28:24,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:28:24,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 15:28:24,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2022-11-28 15:28:24,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:28:24,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 15:28:24,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 15:28:24,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 15:28:24,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2022-11-28 15:28:24,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2022-11-28 15:28:24,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:28:24,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:28:24,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 15:28:24,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 15:28:24,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 1: [2022-11-28 15:28:24,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 0: [2022-11-28 15:28:24,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step21000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 15:28:24,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step21000 is ready now! 
0: successfully saved checkpoint at iteration 21000 to checkpoints_221m 7: time (ms) | save-checkpoint: 675.63 7: iteration 21010/ 115203 | consumed samples: 5378560 | consumed tokens: 11015290880 | elapsed time per iteration (s): 0.52 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 2.434753E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 496.844 | TFLOPs: 26.07 | 7: iteration 21020/ 115203 | consumed samples: 5381120 | consumed tokens: 11020533760 | elapsed time per iteration (s): 0.43 | learning rate: 1.869E-04 | global batch size: 256 | lm loss: 2.424788E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.841 | TFLOPs: 31.42 | 7: iteration 21030/ 115203 | consumed samples: 5383680 | consumed tokens: 11025776640 | elapsed time per iteration (s): 0.42 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 2.440946E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.109 | TFLOPs: 31.70 | 7: iteration 21040/ 115203 | consumed samples: 5386240 | consumed tokens: 11031019520 | elapsed time per iteration (s): 0.43 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 2.397487E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.624 | TFLOPs: 30.94 | 7: iteration 21050/ 115203 | consumed samples: 5388800 | consumed tokens: 11036262400 | elapsed time per iteration (s): 0.43 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 2.414803E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.422 | TFLOPs: 30.98 | 7: iteration 21060/ 115203 | consumed samples: 5391360 | consumed tokens: 11041505280 | elapsed time per iteration (s): 0.43 | learning rate: 
1.868E-04 | global batch size: 256 | lm loss: 2.441923E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.155 | TFLOPs: 31.49 | 7: iteration 21070/ 115203 | consumed samples: 5393920 | consumed tokens: 11046748160 | elapsed time per iteration (s): 0.42 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 2.394700E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.379 | TFLOPs: 31.71 | 7: iteration 21080/ 115203 | consumed samples: 5396480 | consumed tokens: 11051991040 | elapsed time per iteration (s): 0.43 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 2.389013E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.345 | TFLOPs: 31.45 | 7: iteration 21090/ 115203 | consumed samples: 5399040 | consumed tokens: 11057233920 | elapsed time per iteration (s): 0.44 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 2.394181E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.199 | TFLOPs: 30.81 | 7: iteration 21100/ 115203 | consumed samples: 5401600 | consumed tokens: 11062476800 | elapsed time per iteration (s): 0.42 | learning rate: 1.868E-04 | global batch size: 256 | lm loss: 2.398268E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.213 | TFLOPs: 31.75 | 7: iteration 21110/ 115203 | consumed samples: 5404160 | consumed tokens: 11067719680 | elapsed time per iteration (s): 0.43 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 2.434651E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.963 | TFLOPs: 31.37 | 7: iteration 21120/ 115203 | consumed samples: 
5406720 | consumed tokens: 11072962560 | elapsed time per iteration (s): 0.44 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 2.433343E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.932 | TFLOPs: 30.69 | 7: iteration 21130/ 115203 | consumed samples: 5409280 | consumed tokens: 11078205440 | elapsed time per iteration (s): 0.43 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 2.394704E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.881 | TFLOPs: 31.37 | 7: iteration 21140/ 115203 | consumed samples: 5411840 | consumed tokens: 11083448320 | elapsed time per iteration (s): 0.43 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 2.411692E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.178 | TFLOPs: 31.23 | 7: iteration 21150/ 115203 | consumed samples: 5414400 | consumed tokens: 11088691200 | elapsed time per iteration (s): 0.42 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 2.431161E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.544 | TFLOPs: 31.67 | 7: iteration 21160/ 115203 | consumed samples: 5416960 | consumed tokens: 11093934080 | elapsed time per iteration (s): 0.43 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 2.416761E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.909 | TFLOPs: 31.48 | 7: iteration 21170/ 115203 | consumed samples: 5419520 | consumed tokens: 11099176960 | elapsed time per iteration (s): 0.43 | learning rate: 1.867E-04 | global batch size: 256 | lm loss: 2.396227E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 598.810 | TFLOPs: 31.42 | 7: iteration 21180/ 115203 | consumed samples: 5422080 | consumed tokens: 11104419840 | elapsed time per iteration (s): 0.43 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 2.394710E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.839 | TFLOPs: 31.53 | 7: iteration 21190/ 115203 | consumed samples: 5424640 | consumed tokens: 11109662720 | elapsed time per iteration (s): 0.43 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 2.413612E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.910 | TFLOPs: 31.53 | 7: iteration 21200/ 115203 | consumed samples: 5427200 | consumed tokens: 11114905600 | elapsed time per iteration (s): 0.44 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 2.400995E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.509 | TFLOPs: 30.67 | 7: iteration 21210/ 115203 | consumed samples: 5429760 | consumed tokens: 11120148480 | elapsed time per iteration (s): 0.44 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 2.388941E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.010 | TFLOPs: 30.75 | 7: iteration 21220/ 115203 | consumed samples: 5432320 | consumed tokens: 11125391360 | elapsed time per iteration (s): 0.43 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 2.407932E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.006 | TFLOPs: 31.48 | 7: iteration 21230/ 115203 | consumed samples: 5434880 | consumed tokens: 11130634240 | elapsed time per iteration (s): 0.42 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 2.388873E+00 | 
grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.032 | TFLOPs: 31.95 | 7: iteration 21240/ 115203 | consumed samples: 5437440 | consumed tokens: 11135877120 | elapsed time per iteration (s): 0.43 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 2.375218E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.412 | TFLOPs: 31.45 | 7: iteration 21250/ 115203 | consumed samples: 5440000 | consumed tokens: 11141120000 | elapsed time per iteration (s): 0.42 | learning rate: 1.866E-04 | global batch size: 256 | lm loss: 2.443618E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.967 | TFLOPs: 31.85 | 7: iteration 21260/ 115203 | consumed samples: 5442560 | consumed tokens: 11146362880 | elapsed time per iteration (s): 0.43 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 2.417603E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.640 | TFLOPs: 31.15 | 7: iteration 21270/ 115203 | consumed samples: 5445120 | consumed tokens: 11151605760 | elapsed time per iteration (s): 0.44 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 2.392033E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.245 | TFLOPs: 30.81 | 7: iteration 21280/ 115203 | consumed samples: 5447680 | consumed tokens: 11156848640 | elapsed time per iteration (s): 0.43 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 2.407973E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.514 | TFLOPs: 31.40 | 7: iteration 21290/ 115203 | consumed samples: 5450240 | consumed tokens: 11162091520 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 2.455510E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.771 | TFLOPs: 31.21 | 7: iteration 21300/ 115203 | consumed samples: 5452800 | consumed tokens: 11167334400 | elapsed time per iteration (s): 0.43 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 2.405260E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.669 | TFLOPs: 31.20 | 7: iteration 21310/ 115203 | consumed samples: 5455360 | consumed tokens: 11172577280 | elapsed time per iteration (s): 0.43 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 2.397319E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.731 | TFLOPs: 31.57 | 7: iteration 21320/ 115203 | consumed samples: 5457920 | consumed tokens: 11177820160 | elapsed time per iteration (s): 0.43 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 2.416922E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.275 | TFLOPs: 31.60 | 7: iteration 21330/ 115203 | consumed samples: 5460480 | consumed tokens: 11183063040 | elapsed time per iteration (s): 0.43 | learning rate: 1.865E-04 | global batch size: 256 | lm loss: 2.393268E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.783 | TFLOPs: 31.36 | 7: iteration 21340/ 115203 | consumed samples: 5463040 | consumed tokens: 11188305920 | elapsed time per iteration (s): 0.44 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 2.405599E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.356 | TFLOPs: 30.77 | 7: 
iteration 21350/ 115203 | consumed samples: 5465600 | consumed tokens: 11193548800 | elapsed time per iteration (s): 0.44 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 2.431321E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.856 | TFLOPs: 30.79 | 7: iteration 21360/ 115203 | consumed samples: 5468160 | consumed tokens: 11198791680 | elapsed time per iteration (s): 0.43 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 2.430608E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.124 | TFLOPs: 31.49 | 7: iteration 21370/ 115203 | consumed samples: 5470720 | consumed tokens: 11204034560 | elapsed time per iteration (s): 0.43 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 2.433689E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.368 | TFLOPs: 31.03 | 7: iteration 21380/ 115203 | consumed samples: 5473280 | consumed tokens: 11209277440 | elapsed time per iteration (s): 0.42 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 2.377425E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.600 | TFLOPs: 31.62 | 7: iteration 21390/ 115203 | consumed samples: 5475840 | consumed tokens: 11214520320 | elapsed time per iteration (s): 0.42 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 2.414858E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.017 | TFLOPs: 31.64 | 7: iteration 21400/ 115203 | consumed samples: 5478400 | consumed tokens: 11219763200 | elapsed time per iteration (s): 0.44 | learning rate: 1.864E-04 | global batch size: 256 | lm loss: 2.410163E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 587.611 | TFLOPs: 30.83 | 7: iteration 21410/ 115203 | consumed samples: 5480960 | consumed tokens: 11225006080 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 2.434356E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.713 | TFLOPs: 31.15 | 7: iteration 21420/ 115203 | consumed samples: 5483520 | consumed tokens: 11230248960 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 2.413515E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.453 | TFLOPs: 31.24 | 7: iteration 21430/ 115203 | consumed samples: 5486080 | consumed tokens: 11235491840 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 2.442922E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.209 | TFLOPs: 31.60 | 7: iteration 21440/ 115203 | consumed samples: 5488640 | consumed tokens: 11240734720 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 2.427777E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.295 | TFLOPs: 30.92 | 7: iteration 21450/ 115203 | consumed samples: 5491200 | consumed tokens: 11245977600 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 2.419394E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.141 | TFLOPs: 30.96 | 7: iteration 21460/ 115203 | consumed samples: 5493760 | consumed tokens: 11251220480 | elapsed time per iteration (s): 0.42 | learning rate: 1.863E-04 | global 
batch size: 256 | lm loss: 2.396741E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.628 | TFLOPs: 31.93 | 7: iteration 21470/ 115203 | consumed samples: 5496320 | consumed tokens: 11256463360 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 2.398115E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.166 | TFLOPs: 31.23 | 7: iteration 21480/ 115203 | consumed samples: 5498880 | consumed tokens: 11261706240 | elapsed time per iteration (s): 0.43 | learning rate: 1.863E-04 | global batch size: 256 | lm loss: 2.405620E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.470 | TFLOPs: 31.35 | 7: iteration 21490/ 115203 | consumed samples: 5501440 | consumed tokens: 11266949120 | elapsed time per iteration (s): 0.43 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 2.436902E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.652 | TFLOPs: 30.94 | 7: iteration 21500/ 115203 | consumed samples: 5504000 | consumed tokens: 11272192000 | elapsed time per iteration (s): 0.42 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 2.413407E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.710 | TFLOPs: 31.89 | 7: iteration 21510/ 115203 | consumed samples: 5506560 | consumed tokens: 11277434880 | elapsed time per iteration (s): 0.43 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 2.376451E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.364 | TFLOPs: 31.40 | 7: iteration 21520/ 115203 | consumed samples: 5509120 | consumed 
tokens: 11282677760 | elapsed time per iteration (s): 0.44 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 2.421413E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.623 | TFLOPs: 30.25 | 7: iteration 21530/ 115203 | consumed samples: 5511680 | consumed tokens: 11287920640 | elapsed time per iteration (s): 0.43 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 2.382796E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.684 | TFLOPs: 31.57 | 7: iteration 21540/ 115203 | consumed samples: 5514240 | consumed tokens: 11293163520 | elapsed time per iteration (s): 0.43 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 2.367461E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.608 | TFLOPs: 31.51 | 7: iteration 21550/ 115203 | consumed samples: 5516800 | consumed tokens: 11298406400 | elapsed time per iteration (s): 0.43 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 2.423223E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.692 | TFLOPs: 31.10 | 7: iteration 21560/ 115203 | consumed samples: 5519360 | consumed tokens: 11303649280 | elapsed time per iteration (s): 0.42 | learning rate: 1.862E-04 | global batch size: 256 | lm loss: 2.397759E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.318 | TFLOPs: 31.71 | 7: iteration 21570/ 115203 | consumed samples: 5521920 | consumed tokens: 11308892160 | elapsed time per iteration (s): 0.42 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 2.398243E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 603.532 | TFLOPs: 31.67 | 7: iteration 21580/ 115203 | consumed samples: 5524480 | consumed tokens: 11314135040 | elapsed time per iteration (s): 0.43 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 2.409769E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.536 | TFLOPs: 31.35 | 7: iteration 21590/ 115203 | consumed samples: 5527040 | consumed tokens: 11319377920 | elapsed time per iteration (s): 0.43 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 2.372195E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.779 | TFLOPs: 31.21 | 7: iteration 21600/ 115203 | consumed samples: 5529600 | consumed tokens: 11324620800 | elapsed time per iteration (s): 0.42 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 2.420882E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.157 | TFLOPs: 31.75 | 7: iteration 21610/ 115203 | consumed samples: 5532160 | consumed tokens: 11329863680 | elapsed time per iteration (s): 0.43 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 2.432364E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.206 | TFLOPs: 31.60 | 7: iteration 21620/ 115203 | consumed samples: 5534720 | consumed tokens: 11335106560 | elapsed time per iteration (s): 0.43 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 2.426828E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.091 | TFLOPs: 31.54 | 7: iteration 21630/ 115203 | consumed samples: 5537280 | consumed tokens: 11340349440 | elapsed time per iteration (s): 0.44 | learning rate: 1.861E-04 | global batch size: 256 | lm loss: 2.437803E+00 | grad norm: 0.215 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.656 | TFLOPs: 30.73 | 7: iteration 21640/ 115203 | consumed samples: 5539840 | consumed tokens: 11345592320 | elapsed time per iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 2.423986E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.701 | TFLOPs: 31.36 | 7: iteration 21650/ 115203 | consumed samples: 5542400 | consumed tokens: 11350835200 | elapsed time per iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 2.399726E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.282 | TFLOPs: 31.34 | 7: iteration 21660/ 115203 | consumed samples: 5544960 | consumed tokens: 11356078080 | elapsed time per iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 2.414343E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.139 | TFLOPs: 31.17 | 7: iteration 21670/ 115203 | consumed samples: 5547520 | consumed tokens: 11361320960 | elapsed time per iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 2.371800E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.776 | TFLOPs: 31.26 | 7: iteration 21680/ 115203 | consumed samples: 5550080 | consumed tokens: 11366563840 | elapsed time per iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 2.417267E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.867 | TFLOPs: 31.53 | 7: iteration 21690/ 115203 | consumed samples: 5552640 | consumed tokens: 11371806720 | elapsed time per iteration (s): 0.43 
| learning rate: 1.860E-04 | global batch size: 256 | lm loss: 2.385656E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.852 | TFLOPs: 31.32 | 7: iteration 21700/ 115203 | consumed samples: 5555200 | consumed tokens: 11377049600 | elapsed time per iteration (s): 0.43 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 2.376696E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.955 | TFLOPs: 31.48 | 7: iteration 21710/ 115203 | consumed samples: 5557760 | consumed tokens: 11382292480 | elapsed time per iteration (s): 0.42 | learning rate: 1.860E-04 | global batch size: 256 | lm loss: 2.394526E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.069 | TFLOPs: 31.75 | 7: iteration 21720/ 115203 | consumed samples: 5560320 | consumed tokens: 11387535360 | elapsed time per iteration (s): 0.43 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 2.403617E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.329 | TFLOPs: 31.24 | 7: iteration 21730/ 115203 | consumed samples: 5562880 | consumed tokens: 11392778240 | elapsed time per iteration (s): 0.43 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 2.396685E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.944 | TFLOPs: 31.01 | 7: iteration 21740/ 115203 | consumed samples: 5565440 | consumed tokens: 11398021120 | elapsed time per iteration (s): 0.43 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 3.190525E+00 | grad norm: 1.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.441 | TFLOPs: 31.35 | 7: iteration 21750/ 115203 | 
consumed samples: 5568000 | consumed tokens: 11403264000 | elapsed time per iteration (s): 0.43 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 2.572728E+00 | grad norm: 0.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.387 | TFLOPs: 31.50 | 7: iteration 21760/ 115203 | consumed samples: 5570560 | consumed tokens: 11408506880 | elapsed time per iteration (s): 0.44 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 2.493118E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.654 | TFLOPs: 30.78 | 7: iteration 21770/ 115203 | consumed samples: 5573120 | consumed tokens: 11413749760 | elapsed time per iteration (s): 0.42 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 2.444889E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.282 | TFLOPs: 32.07 | 7: iteration 21780/ 115203 | consumed samples: 5575680 | consumed tokens: 11418992640 | elapsed time per iteration (s): 0.42 | learning rate: 1.859E-04 | global batch size: 256 | lm loss: 2.420421E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.480 | TFLOPs: 32.19 | 7: iteration 21790/ 115203 | consumed samples: 5578240 | consumed tokens: 11424235520 | elapsed time per iteration (s): 0.43 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 2.432981E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.489 | TFLOPs: 31.30 | 7: iteration 21800/ 115203 | consumed samples: 5580800 | consumed tokens: 11429478400 | elapsed time per iteration (s): 0.43 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 2.427701E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 590.506 | TFLOPs: 30.98 | 7: iteration 21810/ 115203 | consumed samples: 5583360 | consumed tokens: 11434721280 | elapsed time per iteration (s): 0.42 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 2.412148E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.661 | TFLOPs: 31.88 | 7: iteration 21820/ 115203 | consumed samples: 5585920 | consumed tokens: 11439964160 | elapsed time per iteration (s): 0.43 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 2.420197E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.037 | TFLOPs: 31.54 | 7: iteration 21830/ 115203 | consumed samples: 5588480 | consumed tokens: 11445207040 | elapsed time per iteration (s): 0.42 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 2.428545E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.023 | TFLOPs: 31.74 | 7: iteration 21840/ 115203 | consumed samples: 5591040 | consumed tokens: 11450449920 | elapsed time per iteration (s): 0.43 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 2.424266E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.966 | TFLOPs: 31.48 | 7: iteration 21850/ 115203 | consumed samples: 5593600 | consumed tokens: 11455692800 | elapsed time per iteration (s): 0.43 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 2.440282E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.700 | TFLOPs: 31.52 | 7: iteration 21860/ 115203 | consumed samples: 5596160 | consumed tokens: 11460935680 | elapsed time per iteration (s): 0.43 | learning rate: 1.858E-04 | global batch size: 256 | lm loss: 
2.441488E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.265 | TFLOPs: 31.34 | 7: iteration 21870/ 115203 | consumed samples: 5598720 | consumed tokens: 11466178560 | elapsed time per iteration (s): 0.44 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 2.434545E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.448 | TFLOPs: 30.35 | 7: iteration 21880/ 115203 | consumed samples: 5601280 | consumed tokens: 11471421440 | elapsed time per iteration (s): 0.43 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 2.396509E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.472 | TFLOPs: 31.45 | 7: iteration 21890/ 115203 | consumed samples: 5603840 | consumed tokens: 11476664320 | elapsed time per iteration (s): 0.43 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 2.389494E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.537 | TFLOPs: 31.56 | 7: iteration 21900/ 115203 | consumed samples: 5606400 | consumed tokens: 11481907200 | elapsed time per iteration (s): 0.43 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 2.431142E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.535 | TFLOPs: 31.19 | 7: iteration 21910/ 115203 | consumed samples: 5608960 | consumed tokens: 11487150080 | elapsed time per iteration (s): 0.43 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 2.421772E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.910 | TFLOPs: 31.48 | 7: iteration 21920/ 115203 | consumed samples: 5611520 | consumed tokens: 11492392960 | 
elapsed time per iteration (s): 0.43 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 2.391740E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.194 | TFLOPs: 31.54 | 7: iteration 21930/ 115203 | consumed samples: 5614080 | consumed tokens: 11497635840 | elapsed time per iteration (s): 0.43 | learning rate: 1.857E-04 | global batch size: 256 | lm loss: 2.394573E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.589 | TFLOPs: 31.35 | 7: iteration 21940/ 115203 | consumed samples: 5616640 | consumed tokens: 11502878720 | elapsed time per iteration (s): 0.43 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 2.438599E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.171 | TFLOPs: 31.59 | 7: iteration 21950/ 115203 | consumed samples: 5619200 | consumed tokens: 11508121600 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 2.436153E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.703 | TFLOPs: 31.68 | 7: iteration 21960/ 115203 | consumed samples: 5621760 | consumed tokens: 11513364480 | elapsed time per iteration (s): 0.43 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 2.425810E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.858 | TFLOPs: 31.32 | 7: iteration 21970/ 115203 | consumed samples: 5624320 | consumed tokens: 11518607360 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 2.421577E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.191 | TFLOPs: 
31.96 | 7: iteration 21980/ 115203 | consumed samples: 5626880 | consumed tokens: 11523850240 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 2.399763E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.096 | TFLOPs: 31.85 | 7: iteration 21990/ 115203 | consumed samples: 5629440 | consumed tokens: 11529093120 | elapsed time per iteration (s): 0.42 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 2.444210E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.707 | TFLOPs: 31.78 | 0: [2022-11-28 15:35:32,980] [INFO] [logging.py:68:log_dist] [Rank 0] step=22000, skipped=0, lr=[0.00018556333335793902, 0.00018556333335793902, 0.00018556333335793902], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 22000/ 115203 | consumed samples: 5632000 | consumed tokens: 11534336000 | elapsed time per iteration (s): 0.43 | learning rate: 1.856E-04 | global batch size: 256 | lm loss: 2.396891E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.948 | TFLOPs: 31.32 | 0: steps: 22000 loss: 2.3815 iter time (s): 0.428 samples/sec: 598.298 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 22000 | lm loss value: 2.333496E+00 | lm loss PPL: 1.031394E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 22000 to checkpoints_221m 0: [2022-11-28 15:35:33,139] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step22000 is begin to save! 0: [2022-11-28 15:35:33,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_01-model_00-model_states.pt... 
0: [2022-11-28 15:35:33,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_01-model_00-model_states.pt. 0: [2022-11-28 15:35:33,245] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_03-model_00-model_states.pt... 0: [2022-11-28 15:35:33,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_03-model_00-model_states.pt. 0: [2022-11-28 15:35:33,267] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_04-model_00-model_states.pt... 0: [2022-11-28 15:35:33,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_04-model_00-model_states.pt. 0: [2022-11-28 15:35:33,290] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_05-model_00-model_states.pt... 0: [2022-11-28 15:35:33,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_05-model_00-model_states.pt. 0: [2022-11-28 15:35:33,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_06-model_00-model_states.pt... 0: [2022-11-28 15:35:33,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_06-model_00-model_states.pt. 0: [2022-11-28 15:35:33,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_07-model_00-model_states.pt... 0: [2022-11-28 15:35:33,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_07-model_00-model_states.pt. 0: [2022-11-28 15:35:33,359] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_08-model_00-model_states.pt... 
0: [2022-11-28 15:35:33,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_08-model_00-model_states.pt. 0: [2022-11-28 15:35:33,381] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_09-model_00-model_states.pt... 0: [2022-11-28 15:35:33,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_09-model_00-model_states.pt. 0: [2022-11-28 15:35:33,404] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_10-model_00-model_states.pt... 0: [2022-11-28 15:35:33,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_10-model_00-model_states.pt. 0: [2022-11-28 15:35:33,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_11-model_00-model_states.pt... 0: [2022-11-28 15:35:33,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_11-model_00-model_states.pt. 0: [2022-11-28 15:35:33,453] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_12-model_00-model_states.pt... 0: [2022-11-28 15:35:33,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_12-model_00-model_states.pt. 0: [2022-11-28 15:35:33,477] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_13-model_00-model_states.pt... 0: [2022-11-28 15:35:33,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_13-model_00-model_states.pt. 0: [2022-11-28 15:35:33,499] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_14-model_00-model_states.pt... 
0: [2022-11-28 15:35:33,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_14-model_00-model_states.pt. 0: [2022-11-28 15:35:33,523] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_15-model_00-model_states.pt... 0: [2022-11-28 15:35:33,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_15-model_00-model_states.pt. 0: [2022-11-28 15:35:33,546] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_16-model_00-model_states.pt... 0: [2022-11-28 15:35:33,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_16-model_00-model_states.pt. 0: [2022-11-28 15:35:33,570] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_17-model_00-model_states.pt... 0: [2022-11-28 15:35:33,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_17-model_00-model_states.pt. 0: [2022-11-28 15:35:33,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_18-model_00-model_states.pt... 0: [2022-11-28 15:35:33,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_18-model_00-model_states.pt. 0: [2022-11-28 15:35:33,616] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_19-model_00-model_states.pt... 0: [2022-11-28 15:35:33,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_19-model_00-model_states.pt. 0: [2022-11-28 15:35:33,640] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_20-model_00-model_states.pt... 
0: [2022-11-28 15:35:33,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_20-model_00-model_states.pt. 0: [2022-11-28 15:35:33,663] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/layer_22-model_00-model_states.pt... 0: [2022-11-28 15:35:33,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/layer_22-model_00-model_states.pt. 0: [2022-11-28 15:35:33,667] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step22000/mp_rank_00_model_states.pt 0: [2022-11-28 15:35:33,667] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/mp_rank_00_model_states.pt... 0: [2022-11-28 15:35:33,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/mp_rank_00_model_states.pt. 0: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
7: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:35:33,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step22000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:35:33,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:35:33,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 15:35:33,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2022-11-28 15:35:33,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:35:33,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 15:35:33,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2022-11-28 15:35:33,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:35:33,737] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 15:35:33,737] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
4: [2022-11-28 15:35:33,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:35:33,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 15:35:33,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2022-11-28 15:35:33,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:35:33,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:35:33,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:35:33,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2022-11-28 15:35:33,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 15:35:33,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
7: [2022-11-28 15:35:33,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:35:33,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2022-11-28 15:35:33,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:35:33,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2022-11-28 15:35:33,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 15:35:33,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2022-11-28 15:35:33,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:35:33,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:35:33,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 15:35:33,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2022-11-28 15:35:33,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
5: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:35:33,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:35:33,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 15:35:33,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2022-11-28 15:35:33,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:35:33,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 15:35:33,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 15:35:33,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
3: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:35:33,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:35:33,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 15:35:33,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2022-11-28 15:35:33,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:35:33,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 15:35:33,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2022-11-28 15:35:33,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:35:33,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2022-11-28 15:35:33,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
7: [2022-11-28 15:35:33,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2022-11-28 15:35:33,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 15:35:33,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2022-11-28 15:35:33,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:35:33,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:35:33,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 15:35:33,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2022-11-28 15:35:33,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 15:35:33,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2022-11-28 15:35:33,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:35:33,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-28 15:35:33,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2022-11-28 15:35:33,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:35:33,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2022-11-28 15:35:33,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 7: [2022-11-28 15:35:33,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2022-11-28 15:35:33,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2022-11-28 15:35:33,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2022-11-28 15:35:33,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:35:33,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 15:35:33,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2022-11-28 15:35:33,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:35:33,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 15:35:33,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2022-11-28 15:35:33,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:35:33,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 15:35:33,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2022-11-28 15:35:33,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:35:33,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 15:35:33,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2022-11-28 15:35:33,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:35:33,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 15:35:33,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2022-11-28 15:35:33,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 15:35:33,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 15:35:33,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2022-11-28 15:35:33,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:35:33,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:35:33,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 2: [2022-11-28 15:35:33,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:35:33,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2022-11-28 15:35:33,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2022-11-28 15:35:33,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2022-11-28 15:35:33,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 15:35:33,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:35:33,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
2: [2022-11-28 15:35:33,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 15:35:33,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 3: [2022-11-28 15:35:33,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:35:33,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 15:35:33,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2022-11-28 15:35:33,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:35:33,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:35:33,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 15:35:33,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 15:35:33,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2022-11-28 15:35:33,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 2: [2022-11-28 15:35:33,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 15:35:33,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 15:35:33,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 7: [2022-11-28 15:35:33,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:35:33,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 15:35:33,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2022-11-28 15:35:33,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:35:33,747] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 15:35:33,747] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 4: [2022-11-28 15:35:33,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:35:33,747] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2022-11-28 15:35:33,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:35:33,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
1: [2022-11-28 15:35:33,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 15:35:33,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2022-11-28 15:35:33,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:35:33,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 15:35:33,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2022-11-28 15:35:33,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:35:33,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 15:35:33,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2022-11-28 15:35:33,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:35:33,751] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 15:35:33,751] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2022-11-28 15:35:33,751] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 15:35:33,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 15:35:33,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2022-11-28 15:35:33,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:35:33,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 15:35:33,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 1: [2022-11-28 15:35:33,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:35:33,752] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 15:35:33,752] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2022-11-28 15:35:33,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:35:33,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:35:33,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:35:33,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:35:33,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 15:35:33,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 15:35:33,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 15:35:33,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2022-11-28 15:35:33,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 15:35:33,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2022-11-28 15:35:33,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 5: [2022-11-28 15:35:33,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2022-11-28 15:35:33,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:35:33,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 15:35:33,755] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 15:35:33,755] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 15:35:33,755] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2022-11-28 15:35:33,755] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2022-11-28 15:35:33,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:35:33,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:35:33,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:35:33,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:35:33,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-28 15:35:33,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 15:35:33,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 15:35:33,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 15:35:33,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 15:35:33,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:35:33,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 15:35:33,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2022-11-28 15:35:33,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2022-11-28 15:35:33,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2022-11-28 15:35:33,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 6: [2022-11-28 15:35:33,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
6: [2022-11-28 15:35:33,757] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 15:35:33,757] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2022-11-28 15:35:33,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:35:33,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:35:33,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:35:33,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 15:35:33,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 15:35:33,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2022-11-28 15:35:33,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 0: [2022-11-28 15:35:33,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step22000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 15:35:33,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step22000 is ready now! 
0: successfully saved checkpoint at iteration 22000 to checkpoints_221m 7: time (ms) | save-checkpoint: 661.02 7: iteration 22010/ 115203 | consumed samples: 5634560 | consumed tokens: 11539578880 | elapsed time per iteration (s): 0.51 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 2.409704E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 505.012 | TFLOPs: 26.50 | 7: iteration 22020/ 115203 | consumed samples: 5637120 | consumed tokens: 11544821760 | elapsed time per iteration (s): 0.43 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 2.408557E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.390 | TFLOPs: 31.34 | 7: iteration 22030/ 115203 | consumed samples: 5639680 | consumed tokens: 11550064640 | elapsed time per iteration (s): 0.44 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 2.415156E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.585 | TFLOPs: 30.46 | 7: iteration 22040/ 115203 | consumed samples: 5642240 | consumed tokens: 11555307520 | elapsed time per iteration (s): 0.45 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 2.372324E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.934 | TFLOPs: 30.17 | 7: iteration 22050/ 115203 | consumed samples: 5644800 | consumed tokens: 11560550400 | elapsed time per iteration (s): 0.43 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 2.396424E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.632 | TFLOPs: 30.94 | 7: iteration 22060/ 115203 | consumed samples: 5647360 | consumed tokens: 11565793280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.855E-04 | global batch size: 256 | lm loss: 2.405998E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.115 | TFLOPs: 32.06 | 7: iteration 22070/ 115203 | consumed samples: 5649920 | consumed tokens: 11571036160 | elapsed time per iteration (s): 0.43 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 2.386170E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.366 | TFLOPs: 31.55 | 7: iteration 22080/ 115203 | consumed samples: 5652480 | consumed tokens: 11576279040 | elapsed time per iteration (s): 0.42 | learning rate: 1.855E-04 | global batch size: 256 | lm loss: 2.418198E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.582 | TFLOPs: 31.77 | 7: iteration 22090/ 115203 | consumed samples: 5655040 | consumed tokens: 11581521920 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 2.421371E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.922 | TFLOPs: 31.74 | 7: iteration 22100/ 115203 | consumed samples: 5657600 | consumed tokens: 11586764800 | elapsed time per iteration (s): 0.43 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 2.416288E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.609 | TFLOPs: 31.30 | 7: iteration 22110/ 115203 | consumed samples: 5660160 | consumed tokens: 11592007680 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 2.420407E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.617 | TFLOPs: 31.99 | 7: iteration 22120/ 115203 | consumed samples: 
5662720 | consumed tokens: 11597250560 | elapsed time per iteration (s): 0.43 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 2.412039E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.789 | TFLOPs: 31.16 | 7: iteration 22130/ 115203 | consumed samples: 5665280 | consumed tokens: 11602493440 | elapsed time per iteration (s): 0.43 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 2.424799E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.327 | TFLOPs: 31.18 | 7: iteration 22140/ 115203 | consumed samples: 5667840 | consumed tokens: 11607736320 | elapsed time per iteration (s): 0.43 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 2.371165E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.445 | TFLOPs: 31.45 | 7: iteration 22150/ 115203 | consumed samples: 5670400 | consumed tokens: 11612979200 | elapsed time per iteration (s): 0.42 | learning rate: 1.854E-04 | global batch size: 256 | lm loss: 2.397316E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.500 | TFLOPs: 31.72 | 7: iteration 22160/ 115203 | consumed samples: 5672960 | consumed tokens: 11618222080 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 2.430902E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.730 | TFLOPs: 31.73 | 7: iteration 22170/ 115203 | consumed samples: 5675520 | consumed tokens: 11623464960 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 2.377630E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 603.012 | TFLOPs: 31.64 | 7: iteration 22180/ 115203 | consumed samples: 5678080 | consumed tokens: 11628707840 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 2.417755E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.017 | TFLOPs: 31.95 | 7: iteration 22190/ 115203 | consumed samples: 5680640 | consumed tokens: 11633950720 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 2.427978E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.006 | TFLOPs: 31.85 | 7: iteration 22200/ 115203 | consumed samples: 5683200 | consumed tokens: 11639193600 | elapsed time per iteration (s): 0.43 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 2.402355E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.915 | TFLOPs: 30.95 | 7: iteration 22210/ 115203 | consumed samples: 5685760 | consumed tokens: 11644436480 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 2.420192E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.089 | TFLOPs: 32.17 | 7: iteration 22220/ 115203 | consumed samples: 5688320 | consumed tokens: 11649679360 | elapsed time per iteration (s): 0.43 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 2.400865E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.754 | TFLOPs: 30.89 | 7: iteration 22230/ 115203 | consumed samples: 5690880 | consumed tokens: 11654922240 | elapsed time per iteration (s): 0.42 | learning rate: 1.853E-04 | global batch size: 256 | lm loss: 2.424829E+00 | 
grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.402 | TFLOPs: 31.61 | 7: iteration 22240/ 115203 | consumed samples: 5693440 | consumed tokens: 11660165120 | elapsed time per iteration (s): 0.43 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 2.425853E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.719 | TFLOPs: 31.41 | 7: iteration 22250/ 115203 | consumed samples: 5696000 | consumed tokens: 11665408000 | elapsed time per iteration (s): 0.43 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 2.365525E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.765 | TFLOPs: 31.47 | 7: iteration 22260/ 115203 | consumed samples: 5698560 | consumed tokens: 11670650880 | elapsed time per iteration (s): 0.43 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 2.409355E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.108 | TFLOPs: 31.38 | 7: iteration 22270/ 115203 | consumed samples: 5701120 | consumed tokens: 11675893760 | elapsed time per iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 2.427531E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.594 | TFLOPs: 31.77 | 7: iteration 22280/ 115203 | consumed samples: 5703680 | consumed tokens: 11681136640 | elapsed time per iteration (s): 0.43 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 2.424534E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.809 | TFLOPs: 31.37 | 7: iteration 22290/ 115203 | consumed samples: 5706240 | consumed tokens: 11686379520 | elapsed time per 
iteration (s): 0.44 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 2.451838E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.828 | TFLOPs: 30.58 | 7: iteration 22300/ 115203 | consumed samples: 5708800 | consumed tokens: 11691622400 | elapsed time per iteration (s): 0.42 | learning rate: 1.852E-04 | global batch size: 256 | lm loss: 2.389390E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.252 | TFLOPs: 31.97 | 7: iteration 22310/ 115203 | consumed samples: 5711360 | consumed tokens: 11696865280 | elapsed time per iteration (s): 0.43 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 2.402547E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.065 | TFLOPs: 31.38 | 7: iteration 22320/ 115203 | consumed samples: 5713920 | consumed tokens: 11702108160 | elapsed time per iteration (s): 0.43 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 2.383431E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.605 | TFLOPs: 31.57 | 7: iteration 22330/ 115203 | consumed samples: 5716480 | consumed tokens: 11707351040 | elapsed time per iteration (s): 0.44 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 2.411988E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.521 | TFLOPs: 30.30 | 7: iteration 22340/ 115203 | consumed samples: 5719040 | consumed tokens: 11712593920 | elapsed time per iteration (s): 0.43 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 2.390317E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.738 | TFLOPs: 30.89 | 7: 
iteration 22350/ 115203 | consumed samples: 5721600 | consumed tokens: 11717836800 | elapsed time per iteration (s): 0.43 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 2.362671E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.785 | TFLOPs: 31.52 | 7: iteration 22360/ 115203 | consumed samples: 5724160 | consumed tokens: 11723079680 | elapsed time per iteration (s): 0.42 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 2.408308E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.975 | TFLOPs: 31.69 | 7: iteration 22370/ 115203 | consumed samples: 5726720 | consumed tokens: 11728322560 | elapsed time per iteration (s): 0.43 | learning rate: 1.851E-04 | global batch size: 256 | lm loss: 2.399583E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.693 | TFLOPs: 31.41 | 7: iteration 22380/ 115203 | consumed samples: 5729280 | consumed tokens: 11733565440 | elapsed time per iteration (s): 0.44 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 2.391890E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.203 | TFLOPs: 30.44 | 7: iteration 22390/ 115203 | consumed samples: 5731840 | consumed tokens: 11738808320 | elapsed time per iteration (s): 0.44 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 2.352640E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.232 | TFLOPs: 30.81 | 7: iteration 22400/ 115203 | consumed samples: 5734400 | consumed tokens: 11744051200 | elapsed time per iteration (s): 0.43 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 2.396374E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 597.884 | TFLOPs: 31.37 | 7: iteration 22410/ 115203 | consumed samples: 5736960 | consumed tokens: 11749294080 | elapsed time per iteration (s): 0.43 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 2.396764E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.717 | TFLOPs: 31.52 | 7: iteration 22420/ 115203 | consumed samples: 5739520 | consumed tokens: 11754536960 | elapsed time per iteration (s): 0.43 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 2.373204E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.318 | TFLOPs: 31.60 | 7: iteration 22430/ 115203 | consumed samples: 5742080 | consumed tokens: 11759779840 | elapsed time per iteration (s): 0.43 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 2.370276E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.864 | TFLOPs: 31.58 | 7: iteration 22440/ 115203 | consumed samples: 5744640 | consumed tokens: 11765022720 | elapsed time per iteration (s): 0.43 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 2.402678E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.276 | TFLOPs: 31.44 | 7: iteration 22450/ 115203 | consumed samples: 5747200 | consumed tokens: 11770265600 | elapsed time per iteration (s): 0.43 | learning rate: 1.850E-04 | global batch size: 256 | lm loss: 2.418237E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.285 | TFLOPs: 31.44 | 7: iteration 22460/ 115203 | consumed samples: 5749760 | consumed tokens: 11775508480 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global 
batch size: 256 | lm loss: 2.417984E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.585 | TFLOPs: 31.88 | 7: iteration 22470/ 115203 | consumed samples: 5752320 | consumed tokens: 11780751360 | elapsed time per iteration (s): 0.43 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 2.418403E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.281 | TFLOPs: 31.55 | 7: iteration 22480/ 115203 | consumed samples: 5754880 | consumed tokens: 11785994240 | elapsed time per iteration (s): 0.43 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 2.390097E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.952 | TFLOPs: 31.22 | 7: iteration 22490/ 115203 | consumed samples: 5757440 | consumed tokens: 11791237120 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 2.380457E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.717 | TFLOPs: 31.68 | 7: iteration 22500/ 115203 | consumed samples: 5760000 | consumed tokens: 11796480000 | elapsed time per iteration (s): 0.42 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 2.404166E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.282 | TFLOPs: 31.97 | 7: iteration 22510/ 115203 | consumed samples: 5762560 | consumed tokens: 11801722880 | elapsed time per iteration (s): 0.43 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 2.416155E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.028 | TFLOPs: 31.38 | 7: iteration 22520/ 115203 | consumed samples: 5765120 | consumed 
tokens: 11806965760 | elapsed time per iteration (s): 0.44 | learning rate: 1.849E-04 | global batch size: 256 | lm loss: 2.413864E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.441 | TFLOPs: 30.72 | 7: iteration 22530/ 115203 | consumed samples: 5767680 | consumed tokens: 11812208640 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 2.419924E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.426 | TFLOPs: 31.66 | 7: iteration 22540/ 115203 | consumed samples: 5770240 | consumed tokens: 11817451520 | elapsed time per iteration (s): 0.43 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 2.402007E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.172 | TFLOPs: 31.33 | 7: iteration 22550/ 115203 | consumed samples: 5772800 | consumed tokens: 11822694400 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 2.438420E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.783 | TFLOPs: 31.78 | 7: iteration 22560/ 115203 | consumed samples: 5775360 | consumed tokens: 11827937280 | elapsed time per iteration (s): 0.43 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 2.392501E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.461 | TFLOPs: 30.93 | 7: iteration 22570/ 115203 | consumed samples: 5777920 | consumed tokens: 11833180160 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 2.394414E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 612.895 | TFLOPs: 32.16 | 7: iteration 22580/ 115203 | consumed samples: 5780480 | consumed tokens: 11838423040 | elapsed time per iteration (s): 0.43 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 2.378671E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.111 | TFLOPs: 31.54 | 7: iteration 22590/ 115203 | consumed samples: 5783040 | consumed tokens: 11843665920 | elapsed time per iteration (s): 0.42 | learning rate: 1.848E-04 | global batch size: 256 | lm loss: 2.399670E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.567 | TFLOPs: 31.67 | 7: iteration 22600/ 115203 | consumed samples: 5785600 | consumed tokens: 11848908800 | elapsed time per iteration (s): 0.43 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 2.406271E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.944 | TFLOPs: 31.48 | 7: iteration 22610/ 115203 | consumed samples: 5788160 | consumed tokens: 11854151680 | elapsed time per iteration (s): 0.44 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 2.385313E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.354 | TFLOPs: 30.29 | 7: iteration 22620/ 115203 | consumed samples: 5790720 | consumed tokens: 11859394560 | elapsed time per iteration (s): 0.43 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 2.394575E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.755 | TFLOPs: 31.42 | 7: iteration 22630/ 115203 | consumed samples: 5793280 | consumed tokens: 11864637440 | elapsed time per iteration (s): 0.43 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 2.425668E+00 | grad norm: 0.200 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.441 | TFLOPs: 31.29 | 7: iteration 22640/ 115203 | consumed samples: 5795840 | consumed tokens: 11869880320 | elapsed time per iteration (s): 0.43 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 2.388863E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.959 | TFLOPs: 31.53 | 7: iteration 22650/ 115203 | consumed samples: 5798400 | consumed tokens: 11875123200 | elapsed time per iteration (s): 0.42 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 2.399747E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.556 | TFLOPs: 31.67 | 7: iteration 22660/ 115203 | consumed samples: 5800960 | consumed tokens: 11880366080 | elapsed time per iteration (s): 0.42 | learning rate: 1.847E-04 | global batch size: 256 | lm loss: 2.396313E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.162 | TFLOPs: 31.70 | 7: iteration 22670/ 115203 | consumed samples: 5803520 | consumed tokens: 11885608960 | elapsed time per iteration (s): 0.43 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 2.389561E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.819 | TFLOPs: 31.47 | 7: iteration 22680/ 115203 | consumed samples: 5806080 | consumed tokens: 11890851840 | elapsed time per iteration (s): 0.43 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 2.412140E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.973 | TFLOPs: 31.32 | 7: iteration 22690/ 115203 | consumed samples: 5808640 | consumed tokens: 11896094720 | elapsed time per iteration (s): 0.42 
| learning rate: 1.846E-04 | global batch size: 256 | lm loss: 2.426900E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.966 | TFLOPs: 31.79 | 7: iteration 22700/ 115203 | consumed samples: 5811200 | consumed tokens: 11901337600 | elapsed time per iteration (s): 0.43 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 2.398915E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.440 | TFLOPs: 31.14 | 7: iteration 22710/ 115203 | consumed samples: 5813760 | consumed tokens: 11906580480 | elapsed time per iteration (s): 0.43 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 2.425903E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.526 | TFLOPs: 31.40 | 7: iteration 22720/ 115203 | consumed samples: 5816320 | consumed tokens: 11911823360 | elapsed time per iteration (s): 0.44 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 2.384512E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.203 | TFLOPs: 30.86 | 7: iteration 22730/ 115203 | consumed samples: 5818880 | consumed tokens: 11917066240 | elapsed time per iteration (s): 0.43 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 2.378803E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.349 | TFLOPs: 31.34 | 7: iteration 22740/ 115203 | consumed samples: 5821440 | consumed tokens: 11922309120 | elapsed time per iteration (s): 0.43 | learning rate: 1.846E-04 | global batch size: 256 | lm loss: 2.383545E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.004 | TFLOPs: 30.90 | 7: iteration 22750/ 115203 | 
consumed samples: 5824000 | consumed tokens: 11927552000 | elapsed time per iteration (s): 0.43 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 2.369659E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.561 | TFLOPs: 31.56 | 7: iteration 22760/ 115203 | consumed samples: 5826560 | consumed tokens: 11932794880 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 2.436311E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.956 | TFLOPs: 31.85 | 7: iteration 22770/ 115203 | consumed samples: 5829120 | consumed tokens: 11938037760 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 2.409802E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.113 | TFLOPs: 31.70 | 7: iteration 22780/ 115203 | consumed samples: 5831680 | consumed tokens: 11943280640 | elapsed time per iteration (s): 0.43 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 2.412268E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.759 | TFLOPs: 31.26 | 7: iteration 22790/ 115203 | consumed samples: 5834240 | consumed tokens: 11948523520 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 2.421196E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.895 | TFLOPs: 32.00 | 7: iteration 22800/ 115203 | consumed samples: 5836800 | consumed tokens: 11953766400 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 2.400127E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 610.482 | TFLOPs: 32.03 | 7: iteration 22810/ 115203 | consumed samples: 5839360 | consumed tokens: 11959009280 | elapsed time per iteration (s): 0.42 | learning rate: 1.845E-04 | global batch size: 256 | lm loss: 2.403491E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.833 | TFLOPs: 32.26 | 7: iteration 22820/ 115203 | consumed samples: 5841920 | consumed tokens: 11964252160 | elapsed time per iteration (s): 0.43 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 2.404255E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.576 | TFLOPs: 31.56 | 7: iteration 22830/ 115203 | consumed samples: 5844480 | consumed tokens: 11969495040 | elapsed time per iteration (s): 0.43 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 2.408096E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.799 | TFLOPs: 31.42 | 7: iteration 22840/ 115203 | consumed samples: 5847040 | consumed tokens: 11974737920 | elapsed time per iteration (s): 0.43 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 2.383150E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.520 | TFLOPs: 31.14 | 7: iteration 22850/ 115203 | consumed samples: 5849600 | consumed tokens: 11979980800 | elapsed time per iteration (s): 0.43 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 2.408870E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.188 | TFLOPs: 31.18 | 7: iteration 22860/ 115203 | consumed samples: 5852160 | consumed tokens: 11985223680 | elapsed time per iteration (s): 0.43 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 
2.379714E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.118 | TFLOPs: 31.49 | 7: iteration 22870/ 115203 | consumed samples: 5854720 | consumed tokens: 11990466560 | elapsed time per iteration (s): 0.43 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 2.412994E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.626 | TFLOPs: 31.51 | 7: iteration 22880/ 115203 | consumed samples: 5857280 | consumed tokens: 11995709440 | elapsed time per iteration (s): 0.43 | learning rate: 1.844E-04 | global batch size: 256 | lm loss: 2.422111E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.852 | TFLOPs: 31.26 | 7: iteration 22890/ 115203 | consumed samples: 5859840 | consumed tokens: 12000952320 | elapsed time per iteration (s): 0.43 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 2.394225E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.043 | TFLOPs: 31.38 | 7: iteration 22900/ 115203 | consumed samples: 5862400 | consumed tokens: 12006195200 | elapsed time per iteration (s): 0.43 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 2.421404E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.564 | TFLOPs: 31.35 | 7: iteration 22910/ 115203 | consumed samples: 5864960 | consumed tokens: 12011438080 | elapsed time per iteration (s): 0.44 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 2.393573E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.721 | TFLOPs: 30.84 | 7: iteration 22920/ 115203 | consumed samples: 5867520 | consumed tokens: 12016680960 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 2.405980E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.887 | TFLOPs: 31.74 | 7: iteration 22930/ 115203 | consumed samples: 5870080 | consumed tokens: 12021923840 | elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 2.399471E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.503 | TFLOPs: 31.82 | 7: iteration 22940/ 115203 | consumed samples: 5872640 | consumed tokens: 12027166720 | elapsed time per iteration (s): 0.44 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 2.360180E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.365 | TFLOPs: 30.50 | 7: iteration 22950/ 115203 | consumed samples: 5875200 | consumed tokens: 12032409600 | elapsed time per iteration (s): 0.42 | learning rate: 1.843E-04 | global batch size: 256 | lm loss: 2.389968E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.336 | TFLOPs: 31.81 | 7: iteration 22960/ 115203 | consumed samples: 5877760 | consumed tokens: 12037652480 | elapsed time per iteration (s): 0.43 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 2.412068E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.885 | TFLOPs: 31.53 | 7: iteration 22970/ 115203 | consumed samples: 5880320 | consumed tokens: 12042895360 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 2.374106E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.227 | TFLOPs: 
31.76 | 7: iteration 22980/ 115203 | consumed samples: 5882880 | consumed tokens: 12048138240 | elapsed time per iteration (s): 0.44 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 2.381788E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.360 | TFLOPs: 30.87 | 7: iteration 22990/ 115203 | consumed samples: 5885440 | consumed tokens: 12053381120 | elapsed time per iteration (s): 0.43 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 2.390240E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.496 | TFLOPs: 31.56 | 7: iteration 23000/ 115203 | consumed samples: 5888000 | consumed tokens: 12058624000 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 2.395126E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.823 | TFLOPs: 31.63 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 23000 | lm loss value: 2.330302E+00 | lm loss PPL: 1.028104E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 23000 to checkpoints_221m 0: [2022-11-28 15:42:41,498] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step23000 is begin to save! 0: [2022-11-28 15:42:41,502] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_01-model_00-model_states.pt... 0: [2022-11-28 15:42:41,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_01-model_00-model_states.pt. 0: [2022-11-28 15:42:41,616] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_03-model_00-model_states.pt... 
0: [2022-11-28 15:42:41,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_03-model_00-model_states.pt. 0: [2022-11-28 15:42:41,638] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_04-model_00-model_states.pt... 0: [2022-11-28 15:42:41,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_04-model_00-model_states.pt. 0: [2022-11-28 15:42:41,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_05-model_00-model_states.pt... 0: [2022-11-28 15:42:41,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_05-model_00-model_states.pt. 0: [2022-11-28 15:42:41,689] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_06-model_00-model_states.pt... 0: [2022-11-28 15:42:41,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_06-model_00-model_states.pt. 0: [2022-11-28 15:42:41,714] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_07-model_00-model_states.pt... 0: [2022-11-28 15:42:41,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_07-model_00-model_states.pt. 0: [2022-11-28 15:42:41,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_08-model_00-model_states.pt... 0: [2022-11-28 15:42:41,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_08-model_00-model_states.pt. 0: [2022-11-28 15:42:41,766] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_09-model_00-model_states.pt... 
0: [2022-11-28 15:42:41,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_09-model_00-model_states.pt. 0: [2022-11-28 15:42:41,790] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_10-model_00-model_states.pt... 0: [2022-11-28 15:42:41,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_10-model_00-model_states.pt. 0: [2022-11-28 15:42:41,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_11-model_00-model_states.pt... 0: [2022-11-28 15:42:41,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_11-model_00-model_states.pt. 0: [2022-11-28 15:42:41,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_12-model_00-model_states.pt... 0: [2022-11-28 15:42:41,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_12-model_00-model_states.pt. 0: [2022-11-28 15:42:41,867] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_13-model_00-model_states.pt... 0: [2022-11-28 15:42:41,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_13-model_00-model_states.pt. 0: [2022-11-28 15:42:41,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_14-model_00-model_states.pt... 0: [2022-11-28 15:42:41,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_14-model_00-model_states.pt. 0: [2022-11-28 15:42:41,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_15-model_00-model_states.pt... 
0: [2022-11-28 15:42:41,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_15-model_00-model_states.pt. 0: [2022-11-28 15:42:41,944] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_16-model_00-model_states.pt... 0: [2022-11-28 15:42:41,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_16-model_00-model_states.pt. 0: [2022-11-28 15:42:41,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_17-model_00-model_states.pt... 0: [2022-11-28 15:42:41,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_17-model_00-model_states.pt. 0: [2022-11-28 15:42:41,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_18-model_00-model_states.pt... 0: [2022-11-28 15:42:42,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_18-model_00-model_states.pt. 0: [2022-11-28 15:42:42,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_19-model_00-model_states.pt... 0: [2022-11-28 15:42:42,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_19-model_00-model_states.pt. 0: [2022-11-28 15:42:42,042] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_20-model_00-model_states.pt... 0: [2022-11-28 15:42:42,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_20-model_00-model_states.pt. 0: [2022-11-28 15:42:42,069] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/layer_22-model_00-model_states.pt... 
0: [2022-11-28 15:42:42,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/layer_22-model_00-model_states.pt. 0: [2022-11-28 15:42:42,074] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step23000/mp_rank_00_model_states.pt 0: [2022-11-28 15:42:42,074] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/mp_rank_00_model_states.pt... 0: [2022-11-28 15:42:42,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/mp_rank_00_model_states.pt. 0: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
0: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
2: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
2: [2022-11-28 15:42:42,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step23000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:42:42,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:42:42,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 15:42:42,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 2: [2022-11-28 15:42:42,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:42:42,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 15:42:42,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 0: [2022-11-28 15:42:42,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:42:42,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 15:42:42,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 4: [2022-11-28 15:42:42,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 15:42:42,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 15:42:42,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 2: [2022-11-28 15:42:42,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:42:42,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 15:42:42,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 7: [2022-11-28 15:42:42,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:42:42,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:42:42,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 15:42:42,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 7: [2022-11-28 15:42:42,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 15:42:42,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 2: [2022-11-28 15:42:42,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-28 15:42:42,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 15:42:42,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 2: [2022-11-28 15:42:42,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:42:42,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:42:42,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2022-11-28 15:42:42,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:42:42,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:42:42,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 15:42:42,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 7: [2022-11-28 15:42:42,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 15:42:42,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 2: [2022-11-28 15:42:42,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 
7: [2022-11-28 15:42:42,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 7: [2022-11-28 15:42:42,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 4: [2022-11-28 15:42:42,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:42:42,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 15:42:42,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 4: [2022-11-28 15:42:42,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:42:42,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 6: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 15:42:42,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 15:42:42,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 6: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 4: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:42:42,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 4: [2022-11-28 15:42:42,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:42:42,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 15:42:42,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 
2: [2022-11-28 15:42:42,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 4: [2022-11-28 15:42:42,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:42:42,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 15:42:42,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 6: [2022-11-28 15:42:42,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:42:42,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 15:42:42,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 6: [2022-11-28 15:42:42,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:42:42,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 15:42:42,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 6: [2022-11-28 15:42:42,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 15:42:42,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 15:42:42,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 6: [2022-11-28 15:42:42,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:42:42,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 15:42:42,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2022-11-28 15:42:42,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:42:42,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 15:42:42,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2022-11-28 15:42:42,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:42:42,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 15:42:42,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2022-11-28 15:42:42,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-28 15:42:42,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 15:42:42,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2022-11-28 15:42:42,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:42:42,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 15:42:42,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 7: [2022-11-28 15:42:42,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:42:42,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 15:42:42,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:42:42,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 7: [2022-11-28 15:42:42,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 15:42:42,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 7: [2022-11-28 15:42:42,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-28 15:42:42,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 15:42:42,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 2: [2022-11-28 15:42:42,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:42:42,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2022-11-28 15:42:42,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:42:42,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 7: [2022-11-28 15:42:42,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 15:42:42,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 6: [2022-11-28 15:42:42,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:42:42,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 15:42:42,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 15:42:42,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 15:42:42,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 6: [2022-11-28 15:42:42,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2022-11-28 15:42:42,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:42:42,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 15:42:42,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2022-11-28 15:42:42,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:42:42,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 15:42:42,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2022-11-28 15:42:42,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2022-11-28 15:42:42,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 15:42:42,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 1: [2022-11-28 15:42:42,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:42:42,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 15:42:42,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 4: [2022-11-28 15:42:42,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:42:42,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 15:42:42,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 4: [2022-11-28 15:42:42,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:42:42,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 15:42:42,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 5: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
3: [2022-11-28 15:42:42,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:42:42,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 15:42:42,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 3: [2022-11-28 15:42:42,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 5: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:42:42,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:42:42,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 15:42:42,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 3: [2022-11-28 15:42:42,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 5: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
3: [2022-11-28 15:42:42,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:42:42,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 15:42:42,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 15:42:42,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 3: [2022-11-28 15:42:42,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 5: [2022-11-28 15:42:42,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:42:42,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:42:42,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 15:42:42,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 15:42:42,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 3: [2022-11-28 15:42:42,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 5: [2022-11-28 15:42:42,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
3: [2022-11-28 15:42:42,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:42:42,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:42:42,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 15:42:42,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:42:42,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 3: [2022-11-28 15:42:42,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:42:42,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:42:42,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 15:42:42,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 15:42:42,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:42:42,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
3: [2022-11-28 15:42:42,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 15:42:42,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 3: [2022-11-28 15:42:42,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 3: [2022-11-28 15:42:42,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 15:42:42,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 15:42:42,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 3: [2022-11-28 15:42:42,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 5: [2022-11-28 15:42:42,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 15:42:42,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 15:42:42,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 5: [2022-11-28 15:42:42,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 5: [2022-11-28 15:42:42,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 0: [2022-11-28 15:42:42,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 15:42:42,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:42:42,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:42:42,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 15:42:42,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 15:42:42,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:42:42,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 0: [2022-11-28 15:42:42,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 0: [2022-11-28 15:42:42,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 15:42:42,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 0: [2022-11-28 15:42:42,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:42:42,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 15:42:42,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 15:42:42,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 15:42:42,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 0: [2022-11-28 15:42:42,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:42:42,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 0: [2022-11-28 15:42:42,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 15:42:42,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 0: [2022-11-28 15:42:42,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step23000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 15:42:42,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step23000 is ready now! 
0: successfully saved checkpoint at iteration 23000 to checkpoints_221m 7: time (ms) | save-checkpoint: 730.12 7: iteration 23010/ 115203 | consumed samples: 5890560 | consumed tokens: 12063866880 | elapsed time per iteration (s): 0.52 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 2.406478E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 495.286 | TFLOPs: 25.99 | 7: iteration 23020/ 115203 | consumed samples: 5893120 | consumed tokens: 12069109760 | elapsed time per iteration (s): 0.42 | learning rate: 1.842E-04 | global batch size: 256 | lm loss: 2.389619E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.632 | TFLOPs: 31.72 | 7: iteration 23030/ 115203 | consumed samples: 5895680 | consumed tokens: 12074352640 | elapsed time per iteration (s): 0.43 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 2.389307E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.273 | TFLOPs: 31.23 | 7: iteration 23040/ 115203 | consumed samples: 5898240 | consumed tokens: 12079595520 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 2.392727E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.700 | TFLOPs: 31.83 | 7: iteration 23050/ 115203 | consumed samples: 5900800 | consumed tokens: 12084838400 | elapsed time per iteration (s): 0.43 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 2.368759E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.238 | TFLOPs: 31.18 | 7: iteration 23060/ 115203 | consumed samples: 5903360 | consumed tokens: 12090081280 | elapsed time per iteration (s): 0.43 | learning rate: 
1.841E-04 | global batch size: 256 | lm loss: 2.419995E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.454 | TFLOPs: 30.93 | 7: iteration 23070/ 115203 | consumed samples: 5905920 | consumed tokens: 12095324160 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 2.390689E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.079 | TFLOPs: 31.80 | 7: iteration 23080/ 115203 | consumed samples: 5908480 | consumed tokens: 12100567040 | elapsed time per iteration (s): 0.42 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 2.419405E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.970 | TFLOPs: 32.06 | 7: iteration 23090/ 115203 | consumed samples: 5911040 | consumed tokens: 12105809920 | elapsed time per iteration (s): 0.44 | learning rate: 1.841E-04 | global batch size: 256 | lm loss: 2.373614E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.084 | TFLOPs: 30.70 | 7: iteration 23100/ 115203 | consumed samples: 5913600 | consumed tokens: 12111052800 | elapsed time per iteration (s): 0.44 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 2.388329E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.151 | TFLOPs: 30.65 | 7: iteration 23110/ 115203 | consumed samples: 5916160 | consumed tokens: 12116295680 | elapsed time per iteration (s): 0.42 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 2.382409E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.951 | TFLOPs: 31.85 | 7: iteration 23120/ 115203 | consumed samples: 
5918720 | consumed tokens: 12121538560 | elapsed time per iteration (s): 0.42 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 2.370647E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.930 | TFLOPs: 31.90 | 7: iteration 23130/ 115203 | consumed samples: 5921280 | consumed tokens: 12126781440 | elapsed time per iteration (s): 0.43 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 2.440184E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.162 | TFLOPs: 31.59 | 7: iteration 23140/ 115203 | consumed samples: 5923840 | consumed tokens: 12132024320 | elapsed time per iteration (s): 0.42 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 2.387604E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.158 | TFLOPs: 31.80 | 7: iteration 23150/ 115203 | consumed samples: 5926400 | consumed tokens: 12137267200 | elapsed time per iteration (s): 0.43 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 2.371748E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.986 | TFLOPs: 31.59 | 7: iteration 23160/ 115203 | consumed samples: 5928960 | consumed tokens: 12142510080 | elapsed time per iteration (s): 0.42 | learning rate: 1.840E-04 | global batch size: 256 | lm loss: 2.359102E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.098 | TFLOPs: 31.75 | 7: iteration 23170/ 115203 | consumed samples: 5931520 | consumed tokens: 12147752960 | elapsed time per iteration (s): 0.42 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 2.416513E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 605.379 | TFLOPs: 31.76 | 7: iteration 23180/ 115203 | consumed samples: 5934080 | consumed tokens: 12152995840 | elapsed time per iteration (s): 0.43 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 2.400528E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.903 | TFLOPs: 31.32 | 7: iteration 23190/ 115203 | consumed samples: 5936640 | consumed tokens: 12158238720 | elapsed time per iteration (s): 0.42 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 2.421314E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.937 | TFLOPs: 31.84 | 7: iteration 23200/ 115203 | consumed samples: 5939200 | consumed tokens: 12163481600 | elapsed time per iteration (s): 0.42 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 2.401037E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.515 | TFLOPs: 31.67 | 7: iteration 23210/ 115203 | consumed samples: 5941760 | consumed tokens: 12168724480 | elapsed time per iteration (s): 0.43 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 2.380086E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.636 | TFLOPs: 31.36 | 7: iteration 23220/ 115203 | consumed samples: 5944320 | consumed tokens: 12173967360 | elapsed time per iteration (s): 0.42 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 2.396279E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.715 | TFLOPs: 31.83 | 7: iteration 23230/ 115203 | consumed samples: 5946880 | consumed tokens: 12179210240 | elapsed time per iteration (s): 0.44 | learning rate: 1.839E-04 | global batch size: 256 | lm loss: 2.387663E+00 | 
grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.182 | TFLOPs: 30.49 | 7: iteration 23240/ 115203 | consumed samples: 5949440 | consumed tokens: 12184453120 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 2.392736E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.971 | TFLOPs: 31.74 | 7: iteration 23250/ 115203 | consumed samples: 5952000 | consumed tokens: 12189696000 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 2.418623E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.659 | TFLOPs: 31.99 | 7: iteration 23260/ 115203 | consumed samples: 5954560 | consumed tokens: 12194938880 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 2.385442E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.996 | TFLOPs: 32.22 | 7: iteration 23270/ 115203 | consumed samples: 5957120 | consumed tokens: 12200181760 | elapsed time per iteration (s): 0.43 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 2.414463E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.553 | TFLOPs: 31.09 | 7: iteration 23280/ 115203 | consumed samples: 5959680 | consumed tokens: 12205424640 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 2.394028E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.926 | TFLOPs: 32.16 | 7: iteration 23290/ 115203 | consumed samples: 5962240 | consumed tokens: 12210667520 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 2.390484E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.609 | TFLOPs: 31.62 | 7: iteration 23300/ 115203 | consumed samples: 5964800 | consumed tokens: 12215910400 | elapsed time per iteration (s): 0.44 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 2.415748E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.214 | TFLOPs: 30.50 | 7: iteration 23310/ 115203 | consumed samples: 5967360 | consumed tokens: 12221153280 | elapsed time per iteration (s): 0.42 | learning rate: 1.838E-04 | global batch size: 256 | lm loss: 2.397216E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.341 | TFLOPs: 31.97 | 7: iteration 23320/ 115203 | consumed samples: 5969920 | consumed tokens: 12226396160 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 2.410118E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.621 | TFLOPs: 32.14 | 7: iteration 23330/ 115203 | consumed samples: 5972480 | consumed tokens: 12231639040 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 2.381504E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.705 | TFLOPs: 31.99 | 7: iteration 23340/ 115203 | consumed samples: 5975040 | consumed tokens: 12236881920 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 2.407833E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.365 | TFLOPs: 32.02 | 7: 
iteration 23350/ 115203 | consumed samples: 5977600 | consumed tokens: 12242124800 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 2.402679E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.847 | TFLOPs: 31.79 | 7: iteration 23360/ 115203 | consumed samples: 5980160 | consumed tokens: 12247367680 | elapsed time per iteration (s): 0.42 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 2.407625E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.529 | TFLOPs: 32.09 | 7: iteration 23370/ 115203 | consumed samples: 5982720 | consumed tokens: 12252610560 | elapsed time per iteration (s): 0.43 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 2.388836E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.249 | TFLOPs: 31.44 | 7: iteration 23380/ 115203 | consumed samples: 5985280 | consumed tokens: 12257853440 | elapsed time per iteration (s): 0.43 | learning rate: 1.837E-04 | global batch size: 256 | lm loss: 2.411827E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.997 | TFLOPs: 31.32 | 7: iteration 23390/ 115203 | consumed samples: 5987840 | consumed tokens: 12263096320 | elapsed time per iteration (s): 0.43 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 2.394379E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.993 | TFLOPs: 30.90 | 7: iteration 23400/ 115203 | consumed samples: 5990400 | consumed tokens: 12268339200 | elapsed time per iteration (s): 0.42 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 2.364456E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 612.143 | TFLOPs: 32.12 | 7: iteration 23410/ 115203 | consumed samples: 5992960 | consumed tokens: 12273582080 | elapsed time per iteration (s): 0.43 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 2.390783E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.689 | TFLOPs: 31.57 | 7: iteration 23420/ 115203 | consumed samples: 5995520 | consumed tokens: 12278824960 | elapsed time per iteration (s): 0.42 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 2.403111E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.537 | TFLOPs: 31.82 | 7: iteration 23430/ 115203 | consumed samples: 5998080 | consumed tokens: 12284067840 | elapsed time per iteration (s): 0.43 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 2.379884E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.714 | TFLOPs: 31.26 | 7: iteration 23440/ 115203 | consumed samples: 6000640 | consumed tokens: 12289310720 | elapsed time per iteration (s): 0.43 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 2.403885E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.571 | TFLOPs: 31.41 | 7: iteration 23450/ 115203 | consumed samples: 6003200 | consumed tokens: 12294553600 | elapsed time per iteration (s): 0.43 | learning rate: 1.836E-04 | global batch size: 256 | lm loss: 2.381575E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.822 | TFLOPs: 31.21 | 7: iteration 23460/ 115203 | consumed samples: 6005760 | consumed tokens: 12299796480 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global 
batch size: 256 | lm loss: 2.380699E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.792 | TFLOPs: 31.78 | 7: iteration 23470/ 115203 | consumed samples: 6008320 | consumed tokens: 12305039360 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 2.414470E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.236 | TFLOPs: 31.76 | 7: iteration 23480/ 115203 | consumed samples: 6010880 | consumed tokens: 12310282240 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 2.412597E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.479 | TFLOPs: 31.87 | 7: iteration 23490/ 115203 | consumed samples: 6013440 | consumed tokens: 12315525120 | elapsed time per iteration (s): 0.42 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 2.403550E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.347 | TFLOPs: 31.81 | 7: iteration 23500/ 115203 | consumed samples: 6016000 | consumed tokens: 12320768000 | elapsed time per iteration (s): 0.43 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 2.406273E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.810 | TFLOPs: 31.05 | 7: iteration 23510/ 115203 | consumed samples: 6018560 | consumed tokens: 12326010880 | elapsed time per iteration (s): 0.43 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 2.366915E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.245 | TFLOPs: 31.55 | 7: iteration 23520/ 115203 | consumed samples: 6021120 | consumed 
tokens: 12331253760 | elapsed time per iteration (s): 0.43 | learning rate: 1.835E-04 | global batch size: 256 | lm loss: 2.417105E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.088 | TFLOPs: 31.43 | 7: iteration 23530/ 115203 | consumed samples: 6023680 | consumed tokens: 12336496640 | elapsed time per iteration (s): 0.42 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 2.402615E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.622 | TFLOPs: 31.72 | 7: iteration 23540/ 115203 | consumed samples: 6026240 | consumed tokens: 12341739520 | elapsed time per iteration (s): 0.42 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 2.481902E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.205 | TFLOPs: 31.65 | 7: iteration 23550/ 115203 | consumed samples: 6028800 | consumed tokens: 12346982400 | elapsed time per iteration (s): 0.43 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 2.433707E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.955 | TFLOPs: 31.48 | 7: iteration 23560/ 115203 | consumed samples: 6031360 | consumed tokens: 12352225280 | elapsed time per iteration (s): 0.42 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 2.422745E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.316 | TFLOPs: 31.92 | 7: iteration 23570/ 115203 | consumed samples: 6033920 | consumed tokens: 12357468160 | elapsed time per iteration (s): 0.43 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 2.391499E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 591.816 | TFLOPs: 31.05 | 7: iteration 23580/ 115203 | consumed samples: 6036480 | consumed tokens: 12362711040 | elapsed time per iteration (s): 0.44 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 2.385286E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.532 | TFLOPs: 30.20 | 7: iteration 23590/ 115203 | consumed samples: 6039040 | consumed tokens: 12367953920 | elapsed time per iteration (s): 0.43 | learning rate: 1.834E-04 | global batch size: 256 | lm loss: 2.367030E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.322 | TFLOPs: 31.60 | 7: iteration 23600/ 115203 | consumed samples: 6041600 | consumed tokens: 12373196800 | elapsed time per iteration (s): 0.43 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 2.396423E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.060 | TFLOPs: 31.59 | 7: iteration 23610/ 115203 | consumed samples: 6044160 | consumed tokens: 12378439680 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 2.407860E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.466 | TFLOPs: 31.77 | 7: iteration 23620/ 115203 | consumed samples: 6046720 | consumed tokens: 12383682560 | elapsed time per iteration (s): 0.43 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 2.423860E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.468 | TFLOPs: 31.14 | 7: iteration 23630/ 115203 | consumed samples: 6049280 | consumed tokens: 12388925440 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 2.378967E+00 | grad norm: 0.205 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.571 | TFLOPs: 31.77 | 7: iteration 23640/ 115203 | consumed samples: 6051840 | consumed tokens: 12394168320 | elapsed time per iteration (s): 0.43 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 2.391606E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.854 | TFLOPs: 31.37 | 7: iteration 23650/ 115203 | consumed samples: 6054400 | consumed tokens: 12399411200 | elapsed time per iteration (s): 0.42 | learning rate: 1.833E-04 | global batch size: 256 | lm loss: 2.357351E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.798 | TFLOPs: 31.94 | 7: iteration 23660/ 115203 | consumed samples: 6056960 | consumed tokens: 12404654080 | elapsed time per iteration (s): 0.42 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 2.395028E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.695 | TFLOPs: 31.83 | 7: iteration 23670/ 115203 | consumed samples: 6059520 | consumed tokens: 12409896960 | elapsed time per iteration (s): 0.43 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 2.393509E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.132 | TFLOPs: 31.59 | 7: iteration 23680/ 115203 | consumed samples: 6062080 | consumed tokens: 12415139840 | elapsed time per iteration (s): 6.39 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 2.386708E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 40.088 | TFLOPs: 2.10 | 7: iteration 23690/ 115203 | consumed samples: 6064640 | consumed tokens: 12420382720 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.832E-04 | global batch size: 256 | lm loss: 2.402757E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.035 | TFLOPs: 31.80 | 7: iteration 23700/ 115203 | consumed samples: 6067200 | consumed tokens: 12425625600 | elapsed time per iteration (s): 0.43 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 2.403888E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.481 | TFLOPs: 30.93 | 7: iteration 23710/ 115203 | consumed samples: 6069760 | consumed tokens: 12430868480 | elapsed time per iteration (s): 0.43 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 2.393368E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.140 | TFLOPs: 31.38 | 7: iteration 23720/ 115203 | consumed samples: 6072320 | consumed tokens: 12436111360 | elapsed time per iteration (s): 0.42 | learning rate: 1.832E-04 | global batch size: 256 | lm loss: 2.387974E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.940 | TFLOPs: 31.69 | 7: iteration 23730/ 115203 | consumed samples: 6074880 | consumed tokens: 12441354240 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 2.414022E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.200 | TFLOPs: 31.70 | 7: iteration 23740/ 115203 | consumed samples: 6077440 | consumed tokens: 12446597120 | elapsed time per iteration (s): 0.43 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 2.357099E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.883 | TFLOPs: 31.06 | 7: iteration 23750/ 115203 | 
consumed samples: 6080000 | consumed tokens: 12451840000 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 2.366745E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.231 | TFLOPs: 32.02 | 7: iteration 23760/ 115203 | consumed samples: 6082560 | consumed tokens: 12457082880 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 2.385051E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.445 | TFLOPs: 31.92 | 7: iteration 23770/ 115203 | consumed samples: 6085120 | consumed tokens: 12462325760 | elapsed time per iteration (s): 0.43 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 2.410146E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.275 | TFLOPs: 31.29 | 7: iteration 23780/ 115203 | consumed samples: 6087680 | consumed tokens: 12467568640 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 2.415070E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.500 | TFLOPs: 31.98 | 7: iteration 23790/ 115203 | consumed samples: 6090240 | consumed tokens: 12472811520 | elapsed time per iteration (s): 0.42 | learning rate: 1.831E-04 | global batch size: 256 | lm loss: 2.420804E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.752 | TFLOPs: 31.63 | 7: iteration 23800/ 115203 | consumed samples: 6092800 | consumed tokens: 12478054400 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 2.395949E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 605.615 | TFLOPs: 31.78 | 7: iteration 23810/ 115203 | consumed samples: 6095360 | consumed tokens: 12483297280 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 2.378228E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.919 | TFLOPs: 31.84 | 7: iteration 23820/ 115203 | consumed samples: 6097920 | consumed tokens: 12488540160 | elapsed time per iteration (s): 0.43 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 2.382808E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.673 | TFLOPs: 31.36 | 7: iteration 23830/ 115203 | consumed samples: 6100480 | consumed tokens: 12493783040 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 2.394538E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.721 | TFLOPs: 31.73 | 7: iteration 23840/ 115203 | consumed samples: 6103040 | consumed tokens: 12499025920 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 2.360998E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.048 | TFLOPs: 31.64 | 7: iteration 23850/ 115203 | consumed samples: 6105600 | consumed tokens: 12504268800 | elapsed time per iteration (s): 0.42 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 2.397225E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.592 | TFLOPs: 31.72 | 7: iteration 23860/ 115203 | consumed samples: 6108160 | consumed tokens: 12509511680 | elapsed time per iteration (s): 0.43 | learning rate: 1.830E-04 | global batch size: 256 | lm loss: 
2.367068E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.415 | TFLOPs: 31.24 | 7: iteration 23870/ 115203 | consumed samples: 6110720 | consumed tokens: 12514754560 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 2.391862E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.580 | TFLOPs: 31.88 | 7: iteration 23880/ 115203 | consumed samples: 6113280 | consumed tokens: 12519997440 | elapsed time per iteration (s): 0.43 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 2.421007E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.690 | TFLOPs: 31.36 | 7: iteration 23890/ 115203 | consumed samples: 6115840 | consumed tokens: 12525240320 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 2.396767E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.688 | TFLOPs: 31.88 | 7: iteration 23900/ 115203 | consumed samples: 6118400 | consumed tokens: 12530483200 | elapsed time per iteration (s): 0.44 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 2.432087E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.571 | TFLOPs: 30.83 | 7: iteration 23910/ 115203 | consumed samples: 6120960 | consumed tokens: 12535726080 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 2.387793E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.056 | TFLOPs: 32.01 | 7: iteration 23920/ 115203 | consumed samples: 6123520 | consumed tokens: 12540968960 | 
elapsed time per iteration (s): 0.43 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 2.373856E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.236 | TFLOPs: 31.49 | 7: iteration 23930/ 115203 | consumed samples: 6126080 | consumed tokens: 12546211840 | elapsed time per iteration (s): 0.42 | learning rate: 1.829E-04 | global batch size: 256 | lm loss: 2.410503E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.563 | TFLOPs: 31.67 | 7: iteration 23940/ 115203 | consumed samples: 6128640 | consumed tokens: 12551454720 | elapsed time per iteration (s): 0.43 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 2.389372E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.688 | TFLOPs: 31.57 | 7: iteration 23950/ 115203 | consumed samples: 6131200 | consumed tokens: 12556697600 | elapsed time per iteration (s): 0.43 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 2.388624E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.233 | TFLOPs: 31.39 | 7: iteration 23960/ 115203 | consumed samples: 6133760 | consumed tokens: 12561940480 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 2.368079E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.305 | TFLOPs: 31.97 | 7: iteration 23970/ 115203 | consumed samples: 6136320 | consumed tokens: 12567183360 | elapsed time per iteration (s): 0.43 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 2.402343E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.527 | TFLOPs: 
31.51 | 7: iteration 23980/ 115203 | consumed samples: 6138880 | consumed tokens: 12572426240 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 2.402805E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.249 | TFLOPs: 32.02 | 7: iteration 23990/ 115203 | consumed samples: 6141440 | consumed tokens: 12577669120 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 2.404857E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.555 | TFLOPs: 31.82 | 0: [2022-11-28 15:50:47,207] [INFO] [logging.py:68:log_dist] [Rank 0] step=24000, skipped=0, lr=[0.00018275670559336077, 0.00018275670559336077, 0.00018275670559336077], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 24000/ 115203 | consumed samples: 6144000 | consumed tokens: 12582912000 | elapsed time per iteration (s): 0.42 | learning rate: 1.828E-04 | global batch size: 256 | lm loss: 2.386854E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.522 | TFLOPs: 32.09 | 0: steps: 24000 loss: 2.4702 iter time (s): 0.454 samples/sec: 563.895 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 24000 | lm loss value: 2.322715E+00 | lm loss PPL: 1.020334E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 24000 to checkpoints_221m 0: [2022-11-28 15:50:47,367] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step24000 is begin to save! 0: [2022-11-28 15:50:47,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_01-model_00-model_states.pt... 
0: [2022-11-28 15:50:47,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_01-model_00-model_states.pt. 0: [2022-11-28 15:50:47,488] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_03-model_00-model_states.pt... 0: [2022-11-28 15:50:47,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_03-model_00-model_states.pt. 0: [2022-11-28 15:50:47,512] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_04-model_00-model_states.pt... 0: [2022-11-28 15:50:47,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_04-model_00-model_states.pt. 0: [2022-11-28 15:50:47,536] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_05-model_00-model_states.pt... 0: [2022-11-28 15:50:47,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_05-model_00-model_states.pt. 0: [2022-11-28 15:50:47,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_06-model_00-model_states.pt... 0: [2022-11-28 15:50:47,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_06-model_00-model_states.pt. 0: [2022-11-28 15:50:47,584] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_07-model_00-model_states.pt... 0: [2022-11-28 15:50:47,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_07-model_00-model_states.pt. 0: [2022-11-28 15:50:47,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_08-model_00-model_states.pt... 
0: [2022-11-28 15:50:47,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_08-model_00-model_states.pt. 0: [2022-11-28 15:50:47,634] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_09-model_00-model_states.pt... 0: [2022-11-28 15:50:47,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_09-model_00-model_states.pt. 0: [2022-11-28 15:50:47,659] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_10-model_00-model_states.pt... 0: [2022-11-28 15:50:47,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_10-model_00-model_states.pt. 0: [2022-11-28 15:50:47,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_11-model_00-model_states.pt... 0: [2022-11-28 15:50:47,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_11-model_00-model_states.pt. 0: [2022-11-28 15:50:47,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_12-model_00-model_states.pt... 0: [2022-11-28 15:50:47,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_12-model_00-model_states.pt. 0: [2022-11-28 15:50:47,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_13-model_00-model_states.pt... 0: [2022-11-28 15:50:47,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_13-model_00-model_states.pt. 0: [2022-11-28 15:50:47,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_14-model_00-model_states.pt... 
0: [2022-11-28 15:50:47,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_14-model_00-model_states.pt. 0: [2022-11-28 15:50:47,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_15-model_00-model_states.pt... 0: [2022-11-28 15:50:47,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_15-model_00-model_states.pt. 0: [2022-11-28 15:50:47,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_16-model_00-model_states.pt... 0: [2022-11-28 15:50:47,830] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_16-model_00-model_states.pt. 0: [2022-11-28 15:50:47,830] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_17-model_00-model_states.pt... 0: [2022-11-28 15:50:47,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_17-model_00-model_states.pt. 0: [2022-11-28 15:50:47,852] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_18-model_00-model_states.pt... 0: [2022-11-28 15:50:47,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_18-model_00-model_states.pt. 0: [2022-11-28 15:50:47,876] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_19-model_00-model_states.pt... 0: [2022-11-28 15:50:47,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_19-model_00-model_states.pt. 0: [2022-11-28 15:50:47,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_20-model_00-model_states.pt... 
0: [2022-11-28 15:50:47,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_20-model_00-model_states.pt. 0: [2022-11-28 15:50:47,924] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/layer_22-model_00-model_states.pt... 0: [2022-11-28 15:50:47,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/layer_22-model_00-model_states.pt. 0: [2022-11-28 15:50:47,940] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step24000/mp_rank_00_model_states.pt 0: [2022-11-28 15:50:47,940] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/mp_rank_00_model_states.pt... 0: [2022-11-28 15:50:47,950] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/mp_rank_00_model_states.pt. 0: [2022-11-28 15:50:48,950] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
6: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
6: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
7: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
5: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:50:48,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
5: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
1: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:50:48,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step24000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:50:48,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:50:48,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:50:48,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 15:50:48,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2022-11-28 15:50:48,996] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 15:50:49,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:50:48,996] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2022-11-28 15:50:48,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
3: [2022-11-28 15:50:49,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 15:50:48,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 15:50:49,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2022-11-28 15:50:48,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2022-11-28 15:50:49,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:50:49,001] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 15:50:49,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2022-11-28 15:50:49,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:50:49,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 15:50:49,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2022-11-28 15:50:49,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:50:49,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 15:50:49,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 15:50:49,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 15:50:49,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2022-11-28 15:50:49,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2022-11-28 15:50:49,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:50:49,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 15:50:49,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2022-11-28 15:50:49,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:50:49,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:50:49,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 15:50:49,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 15:50:49,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
3: [2022-11-28 15:50:49,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2022-11-28 15:50:49,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:50:49,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 15:50:49,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2022-11-28 15:50:49,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:50:49,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 15:50:49,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2022-11-28 15:50:49,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:50:49,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 15:50:49,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2022-11-28 15:50:49,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 15:50:49,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 15:50:49,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2022-11-28 15:50:49,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:50:49,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 15:50:49,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2022-11-28 15:50:49,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:50:49,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 15:50:49,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2022-11-28 15:50:49,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:50:49,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 15:50:49,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2022-11-28 15:50:49,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 15:50:49,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 15:50:49,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2022-11-28 15:50:49,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:50:49,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 15:50:49,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2022-11-28 15:50:49,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:50:49,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 15:50:49,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2022-11-28 15:50:49,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:50:49,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 15:50:49,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:50:49,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
4: [2022-11-28 15:50:49,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 15:50:49,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2022-11-28 15:50:49,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:50:49,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 4: [2022-11-28 15:50:49,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:50:49,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2022-11-28 15:50:49,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 15:50:49,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2022-11-28 15:50:49,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:50:49,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 15:50:49,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2022-11-28 15:50:49,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-28 15:50:49,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 0: [2022-11-28 15:50:49,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:50:49,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2022-11-28 15:50:49,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 15:50:49,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2022-11-28 15:50:49,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:50:49,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 15:50:49,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2022-11-28 15:50:49,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:50:49,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 15:50:49,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2022-11-28 15:50:49,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 15:50:49,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 15:50:49,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2022-11-28 15:50:49,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:50:49,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 15:50:49,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2022-11-28 15:50:49,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:50:49,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 15:50:49,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2022-11-28 15:50:49,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:50:49,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:50:49,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 15:50:49,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2022-11-28 15:50:49,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 15:50:49,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 6: [2022-11-28 15:50:49,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2022-11-28 15:50:49,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2022-11-28 15:50:49,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2022-11-28 15:50:49,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:50:49,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 15:50:49,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 2: [2022-11-28 15:50:49,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:50:49,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 15:50:49,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
2: [2022-11-28 15:50:49,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:50:49,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:50:49,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2022-11-28 15:50:49,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2022-11-28 15:50:49,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 4: [2022-11-28 15:50:49,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2022-11-28 15:50:49,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:50:49,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 15:50:49,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2022-11-28 15:50:49,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:50:49,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:50:49,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 15:50:49,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 15:50:49,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2022-11-28 15:50:49,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2022-11-28 15:50:49,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:50:49,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:50:49,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 15:50:49,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 15:50:49,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 3: [2022-11-28 15:50:49,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2022-11-28 15:50:49,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:50:49,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 15:50:49,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2022-11-28 15:50:49,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:50:49,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 15:50:49,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2022-11-28 15:50:49,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:50:49,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 15:50:49,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2022-11-28 15:50:49,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:50:49,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2022-11-28 15:50:49,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:50:49,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
5: [2022-11-28 15:50:49,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 15:50:49,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 5: [2022-11-28 15:50:49,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:50:49,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 15:50:49,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 1: [2022-11-28 15:50:49,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:50:49,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 15:50:49,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2022-11-28 15:50:49,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:50:49,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 15:50:49,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2022-11-28 15:50:49,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 15:50:49,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 15:50:49,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2022-11-28 15:50:49,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:50:49,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 15:50:49,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2022-11-28 15:50:49,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:50:49,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 15:50:49,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:50:49,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:50:49,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:50:49,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
6: [2022-11-28 15:50:49,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 15:50:49,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 15:50:49,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 15:50:49,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2022-11-28 15:50:49,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 6: [2022-11-28 15:50:49,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2022-11-28 15:50:49,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:50:49,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:50:49,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 15:50:49,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 15:50:49,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 7: [2022-11-28 15:50:49,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
1: [2022-11-28 15:50:49,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:50:49,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 15:50:49,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2022-11-28 15:50:49,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:50:49,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 15:50:49,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 0: [2022-11-28 15:50:49,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:50:49,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step24000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 15:50:49,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step24000 is ready now! 
0: successfully saved checkpoint at iteration 24000 to checkpoints_221m 7: time (ms) | save-checkpoint: 1773.20 7: iteration 24010/ 115203 | consumed samples: 6146560 | consumed tokens: 12588154880 | elapsed time per iteration (s): 0.62 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 2.372773E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 415.971 | TFLOPs: 21.83 | 7: iteration 24020/ 115203 | consumed samples: 6149120 | consumed tokens: 12593397760 | elapsed time per iteration (s): 0.43 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 2.387907E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.330 | TFLOPs: 31.60 | 7: iteration 24030/ 115203 | consumed samples: 6151680 | consumed tokens: 12598640640 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 2.410627E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.130 | TFLOPs: 31.96 | 7: iteration 24040/ 115203 | consumed samples: 6154240 | consumed tokens: 12603883520 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 2.384982E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.058 | TFLOPs: 31.64 | 7: iteration 24050/ 115203 | consumed samples: 6156800 | consumed tokens: 12609126400 | elapsed time per iteration (s): 0.42 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 2.401736E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.426 | TFLOPs: 31.61 | 7: iteration 24060/ 115203 | consumed samples: 6159360 | consumed tokens: 12614369280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.827E-04 | global batch size: 256 | lm loss: 2.399644E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.817 | TFLOPs: 31.79 | 7: iteration 24070/ 115203 | consumed samples: 6161920 | consumed tokens: 12619612160 | elapsed time per iteration (s): 0.43 | learning rate: 1.827E-04 | global batch size: 256 | lm loss: 2.382026E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.365 | TFLOPs: 31.13 | 7: iteration 24080/ 115203 | consumed samples: 6164480 | consumed tokens: 12624855040 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 2.409063E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.310 | TFLOPs: 31.76 | 7: iteration 24090/ 115203 | consumed samples: 6167040 | consumed tokens: 12630097920 | elapsed time per iteration (s): 0.43 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 2.386736E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.867 | TFLOPs: 31.47 | 7: iteration 24100/ 115203 | consumed samples: 6169600 | consumed tokens: 12635340800 | elapsed time per iteration (s): 0.43 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 2.365995E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.562 | TFLOPs: 31.25 | 7: iteration 24110/ 115203 | consumed samples: 6172160 | consumed tokens: 12640583680 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 2.402801E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.505 | TFLOPs: 31.82 | 7: iteration 24120/ 115203 | consumed samples: 
6174720 | consumed tokens: 12645826560 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 2.398365E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.686 | TFLOPs: 31.78 | 7: iteration 24130/ 115203 | consumed samples: 6177280 | consumed tokens: 12651069440 | elapsed time per iteration (s): 0.43 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 2.397952E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.528 | TFLOPs: 31.51 | 7: iteration 24140/ 115203 | consumed samples: 6179840 | consumed tokens: 12656312320 | elapsed time per iteration (s): 0.42 | learning rate: 1.826E-04 | global batch size: 256 | lm loss: 2.362072E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.571 | TFLOPs: 32.14 | 7: iteration 24150/ 115203 | consumed samples: 6182400 | consumed tokens: 12661555200 | elapsed time per iteration (s): 0.43 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 2.395057E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.636 | TFLOPs: 30.88 | 7: iteration 24160/ 115203 | consumed samples: 6184960 | consumed tokens: 12666798080 | elapsed time per iteration (s): 0.43 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 2.401056E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.032 | TFLOPs: 31.59 | 7: iteration 24170/ 115203 | consumed samples: 6187520 | consumed tokens: 12672040960 | elapsed time per iteration (s): 0.42 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 2.385491E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 611.685 | TFLOPs: 32.09 | 7: iteration 24180/ 115203 | consumed samples: 6190080 | consumed tokens: 12677283840 | elapsed time per iteration (s): 0.42 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 2.398988E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.444 | TFLOPs: 31.82 | 7: iteration 24190/ 115203 | consumed samples: 6192640 | consumed tokens: 12682526720 | elapsed time per iteration (s): 0.43 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 2.384719E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.260 | TFLOPs: 31.55 | 7: iteration 24200/ 115203 | consumed samples: 6195200 | consumed tokens: 12687769600 | elapsed time per iteration (s): 0.43 | learning rate: 1.825E-04 | global batch size: 256 | lm loss: 2.408835E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.510 | TFLOPs: 31.30 | 7: iteration 24210/ 115203 | consumed samples: 6197760 | consumed tokens: 12693012480 | elapsed time per iteration (s): 0.43 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 2.389750E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.338 | TFLOPs: 31.18 | 7: iteration 24220/ 115203 | consumed samples: 6200320 | consumed tokens: 12698255360 | elapsed time per iteration (s): 0.44 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 2.363421E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.079 | TFLOPs: 30.80 | 7: iteration 24230/ 115203 | consumed samples: 6202880 | consumed tokens: 12703498240 | elapsed time per iteration (s): 0.45 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 2.379224E+00 | 
grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.588 | TFLOPs: 29.83 | 7: iteration 24240/ 115203 | consumed samples: 6205440 | consumed tokens: 12708741120 | elapsed time per iteration (s): 0.42 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 2.392052E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.779 | TFLOPs: 31.63 | 7: iteration 24250/ 115203 | consumed samples: 6208000 | consumed tokens: 12713984000 | elapsed time per iteration (s): 0.42 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 2.395450E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.006 | TFLOPs: 31.85 | 7: iteration 24260/ 115203 | consumed samples: 6210560 | consumed tokens: 12719226880 | elapsed time per iteration (s): 0.43 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 2.388178E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.164 | TFLOPs: 31.02 | 7: iteration 24270/ 115203 | consumed samples: 6213120 | consumed tokens: 12724469760 | elapsed time per iteration (s): 0.43 | learning rate: 1.824E-04 | global batch size: 256 | lm loss: 2.409017E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.513 | TFLOPs: 31.51 | 7: iteration 24280/ 115203 | consumed samples: 6215680 | consumed tokens: 12729712640 | elapsed time per iteration (s): 0.42 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 2.395372E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.809 | TFLOPs: 32.05 | 7: iteration 24290/ 115203 | consumed samples: 6218240 | consumed tokens: 12734955520 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 2.362009E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.141 | TFLOPs: 31.70 | 7: iteration 24300/ 115203 | consumed samples: 6220800 | consumed tokens: 12740198400 | elapsed time per iteration (s): 0.42 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 2.381678E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.930 | TFLOPs: 31.63 | 7: iteration 24310/ 115203 | consumed samples: 6223360 | consumed tokens: 12745441280 | elapsed time per iteration (s): 0.43 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 2.378949E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.245 | TFLOPs: 31.39 | 7: iteration 24320/ 115203 | consumed samples: 6225920 | consumed tokens: 12750684160 | elapsed time per iteration (s): 0.42 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 2.390711E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.031 | TFLOPs: 32.11 | 7: iteration 24330/ 115203 | consumed samples: 6228480 | consumed tokens: 12755927040 | elapsed time per iteration (s): 0.42 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 2.389752E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.085 | TFLOPs: 31.96 | 7: iteration 24340/ 115203 | consumed samples: 6231040 | consumed tokens: 12761169920 | elapsed time per iteration (s): 0.42 | learning rate: 1.823E-04 | global batch size: 256 | lm loss: 2.400201E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.752 | TFLOPs: 31.63 | 7: 
iteration 24350/ 115203 | consumed samples: 6233600 | consumed tokens: 12766412800 | elapsed time per iteration (s): 0.42 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 2.394343E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.628 | TFLOPs: 31.83 | 7: iteration 24360/ 115203 | consumed samples: 6236160 | consumed tokens: 12771655680 | elapsed time per iteration (s): 0.42 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 2.386028E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.901 | TFLOPs: 31.74 | 7: iteration 24370/ 115203 | consumed samples: 6238720 | consumed tokens: 12776898560 | elapsed time per iteration (s): 0.42 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 2.398268E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.782 | TFLOPs: 31.78 | 7: iteration 24380/ 115203 | consumed samples: 6241280 | consumed tokens: 12782141440 | elapsed time per iteration (s): 0.42 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 2.357822E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.660 | TFLOPs: 31.78 | 7: iteration 24390/ 115203 | consumed samples: 6243840 | consumed tokens: 12787384320 | elapsed time per iteration (s): 0.42 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 2.366562E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.159 | TFLOPs: 31.75 | 7: iteration 24400/ 115203 | consumed samples: 6246400 | consumed tokens: 12792627200 | elapsed time per iteration (s): 0.43 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 2.406764E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 598.929 | TFLOPs: 31.42 | 7: iteration 24410/ 115203 | consumed samples: 6248960 | consumed tokens: 12797870080 | elapsed time per iteration (s): 0.42 | learning rate: 1.822E-04 | global batch size: 256 | lm loss: 2.396553E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.181 | TFLOPs: 32.07 | 7: iteration 24420/ 115203 | consumed samples: 6251520 | consumed tokens: 12803112960 | elapsed time per iteration (s): 0.43 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 2.408560E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.724 | TFLOPs: 31.20 | 7: iteration 24430/ 115203 | consumed samples: 6254080 | consumed tokens: 12808355840 | elapsed time per iteration (s): 0.43 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 2.408386E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.369 | TFLOPs: 31.13 | 7: iteration 24440/ 115203 | consumed samples: 6256640 | consumed tokens: 12813598720 | elapsed time per iteration (s): 0.42 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 2.394300E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.896 | TFLOPs: 31.79 | 7: iteration 24450/ 115203 | consumed samples: 6259200 | consumed tokens: 12818841600 | elapsed time per iteration (s): 0.42 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 2.374709E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.358 | TFLOPs: 32.02 | 7: iteration 24460/ 115203 | consumed samples: 6261760 | consumed tokens: 12824084480 | elapsed time per iteration (s): 0.43 | learning rate: 1.821E-04 | global 
batch size: 256 | lm loss: 2.411490E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.920 | TFLOPs: 31.48 | 7: iteration 24470/ 115203 | consumed samples: 6264320 | consumed tokens: 12829327360 | elapsed time per iteration (s): 0.42 | learning rate: 1.821E-04 | global batch size: 256 | lm loss: 2.385662E+00 | grad norm: 0.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.291 | TFLOPs: 32.07 | 7: iteration 24480/ 115203 | consumed samples: 6266880 | consumed tokens: 12834570240 | elapsed time per iteration (s): 0.42 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 2.402584E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.869 | TFLOPs: 31.89 | 7: iteration 24490/ 115203 | consumed samples: 6269440 | consumed tokens: 12839813120 | elapsed time per iteration (s): 0.43 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 2.397051E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.487 | TFLOPs: 31.19 | 7: iteration 24500/ 115203 | consumed samples: 6272000 | consumed tokens: 12845056000 | elapsed time per iteration (s): 0.42 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 2.402929E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.951 | TFLOPs: 31.95 | 7: iteration 24510/ 115203 | consumed samples: 6274560 | consumed tokens: 12850298880 | elapsed time per iteration (s): 0.42 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 2.415680E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.367 | TFLOPs: 31.76 | 7: iteration 24520/ 115203 | consumed samples: 6277120 | consumed 
tokens: 12855541760 | elapsed time per iteration (s): 0.42 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 2.435168E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.468 | TFLOPs: 32.14 | 7: iteration 24530/ 115203 | consumed samples: 6279680 | consumed tokens: 12860784640 | elapsed time per iteration (s): 0.43 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 2.392468E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.255 | TFLOPs: 31.55 | 7: iteration 24540/ 115203 | consumed samples: 6282240 | consumed tokens: 12866027520 | elapsed time per iteration (s): 0.42 | learning rate: 1.820E-04 | global batch size: 256 | lm loss: 2.415657E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.053 | TFLOPs: 32.06 | 7: iteration 24550/ 115203 | consumed samples: 6284800 | consumed tokens: 12871270400 | elapsed time per iteration (s): 0.44 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 2.394189E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.013 | TFLOPs: 30.80 | 7: iteration 24560/ 115203 | consumed samples: 6287360 | consumed tokens: 12876513280 | elapsed time per iteration (s): 0.42 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 2.401967E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.946 | TFLOPs: 31.69 | 7: iteration 24570/ 115203 | consumed samples: 6289920 | consumed tokens: 12881756160 | elapsed time per iteration (s): 0.44 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 2.426642E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 576.130 | TFLOPs: 30.23 | 7: iteration 24580/ 115203 | consumed samples: 6292480 | consumed tokens: 12886999040 | elapsed time per iteration (s): 0.42 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 2.412768E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.909 | TFLOPs: 31.95 | 7: iteration 24590/ 115203 | consumed samples: 6295040 | consumed tokens: 12892241920 | elapsed time per iteration (s): 0.43 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 2.346949E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.226 | TFLOPs: 31.60 | 7: iteration 24600/ 115203 | consumed samples: 6297600 | consumed tokens: 12897484800 | elapsed time per iteration (s): 0.42 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 2.388666E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.587 | TFLOPs: 31.72 | 7: iteration 24610/ 115203 | consumed samples: 6300160 | consumed tokens: 12902727680 | elapsed time per iteration (s): 0.42 | learning rate: 1.819E-04 | global batch size: 256 | lm loss: 2.395814E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.288 | TFLOPs: 31.86 | 7: iteration 24620/ 115203 | consumed samples: 6302720 | consumed tokens: 12907970560 | elapsed time per iteration (s): 0.43 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 2.431183E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.818 | TFLOPs: 31.31 | 7: iteration 24630/ 115203 | consumed samples: 6305280 | consumed tokens: 12913213440 | elapsed time per iteration (s): 0.43 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 2.388174E+00 | grad norm: 0.203 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.000 | TFLOPs: 31.59 | 7: iteration 24640/ 115203 | consumed samples: 6307840 | consumed tokens: 12918456320 | elapsed time per iteration (s): 0.42 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 2.416523E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.284 | TFLOPs: 31.76 | 7: iteration 24650/ 115203 | consumed samples: 6310400 | consumed tokens: 12923699200 | elapsed time per iteration (s): 0.43 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 2.406396E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.109 | TFLOPs: 31.38 | 7: iteration 24660/ 115203 | consumed samples: 6312960 | consumed tokens: 12928942080 | elapsed time per iteration (s): 0.43 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 2.401381E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.473 | TFLOPs: 31.30 | 7: iteration 24670/ 115203 | consumed samples: 6315520 | consumed tokens: 12934184960 | elapsed time per iteration (s): 0.42 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 2.403272E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.080 | TFLOPs: 31.75 | 7: iteration 24680/ 115203 | consumed samples: 6318080 | consumed tokens: 12939427840 | elapsed time per iteration (s): 0.42 | learning rate: 1.818E-04 | global batch size: 256 | lm loss: 2.385019E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.426 | TFLOPs: 31.87 | 7: iteration 24690/ 115203 | consumed samples: 6320640 | consumed tokens: 12944670720 | elapsed time per iteration (s): 0.43 
| learning rate: 1.817E-04 | global batch size: 256 | lm loss: 2.424828E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.369 | TFLOPs: 31.50 | 7: iteration 24700/ 115203 | consumed samples: 6323200 | consumed tokens: 12949913600 | elapsed time per iteration (s): 0.42 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 2.416929E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.644 | TFLOPs: 32.04 | 7: iteration 24710/ 115203 | consumed samples: 6325760 | consumed tokens: 12955156480 | elapsed time per iteration (s): 0.43 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 2.338689E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.492 | TFLOPs: 31.24 | 7: iteration 24720/ 115203 | consumed samples: 6328320 | consumed tokens: 12960399360 | elapsed time per iteration (s): 0.42 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 2.391187E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.298 | TFLOPs: 31.86 | 7: iteration 24730/ 115203 | consumed samples: 6330880 | consumed tokens: 12965642240 | elapsed time per iteration (s): 0.42 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 2.369774E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.147 | TFLOPs: 31.70 | 7: iteration 24740/ 115203 | consumed samples: 6333440 | consumed tokens: 12970885120 | elapsed time per iteration (s): 0.42 | learning rate: 1.817E-04 | global batch size: 256 | lm loss: 2.387126E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.156 | TFLOPs: 31.91 | 7: iteration 24750/ 115203 | 
consumed samples: 6336000 | consumed tokens: 12976128000 | elapsed time per iteration (s): 0.42 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 2.370204E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.229 | TFLOPs: 31.97 | 7: iteration 24760/ 115203 | consumed samples: 6338560 | consumed tokens: 12981370880 | elapsed time per iteration (s): 0.42 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 2.370476E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.524 | TFLOPs: 31.93 | 7: iteration 24770/ 115203 | consumed samples: 6341120 | consumed tokens: 12986613760 | elapsed time per iteration (s): 0.42 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 2.400502E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.052 | TFLOPs: 31.90 | 7: iteration 24780/ 115203 | consumed samples: 6343680 | consumed tokens: 12991856640 | elapsed time per iteration (s): 0.43 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 2.392779E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.338 | TFLOPs: 31.45 | 7: iteration 24790/ 115203 | consumed samples: 6346240 | consumed tokens: 12997099520 | elapsed time per iteration (s): 0.43 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 2.393779E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.415 | TFLOPs: 31.29 | 7: iteration 24800/ 115203 | consumed samples: 6348800 | consumed tokens: 13002342400 | elapsed time per iteration (s): 0.42 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 2.387516E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 606.675 | TFLOPs: 31.83 | 7: iteration 24810/ 115203 | consumed samples: 6351360 | consumed tokens: 13007585280 | elapsed time per iteration (s): 0.44 | learning rate: 1.816E-04 | global batch size: 256 | lm loss: 2.411705E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.239 | TFLOPs: 30.86 | 7: iteration 24820/ 115203 | consumed samples: 6353920 | consumed tokens: 13012828160 | elapsed time per iteration (s): 0.43 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 2.400086E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.304 | TFLOPs: 31.08 | 7: iteration 24830/ 115203 | consumed samples: 6356480 | consumed tokens: 13018071040 | elapsed time per iteration (s): 0.43 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 2.400593E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.812 | TFLOPs: 31.58 | 7: iteration 24840/ 115203 | consumed samples: 6359040 | consumed tokens: 13023313920 | elapsed time per iteration (s): 0.42 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 2.371767E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.959 | TFLOPs: 32.06 | 7: iteration 24850/ 115203 | consumed samples: 6361600 | consumed tokens: 13028556800 | elapsed time per iteration (s): 0.43 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 2.399258E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.250 | TFLOPs: 31.23 | 7: iteration 24860/ 115203 | consumed samples: 6364160 | consumed tokens: 13033799680 | elapsed time per iteration (s): 0.42 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 
2.346895E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.153 | TFLOPs: 31.80 | 7: iteration 24870/ 115203 | consumed samples: 6366720 | consumed tokens: 13039042560 | elapsed time per iteration (s): 0.42 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 2.393345E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.399 | TFLOPs: 31.82 | 7: iteration 24880/ 115203 | consumed samples: 6369280 | consumed tokens: 13044285440 | elapsed time per iteration (s): 0.42 | learning rate: 1.815E-04 | global batch size: 256 | lm loss: 2.377461E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.102 | TFLOPs: 32.22 | 7: iteration 24890/ 115203 | consumed samples: 6371840 | consumed tokens: 13049528320 | elapsed time per iteration (s): 0.42 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 2.388974E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.590 | TFLOPs: 31.67 | 7: iteration 24900/ 115203 | consumed samples: 6374400 | consumed tokens: 13054771200 | elapsed time per iteration (s): 0.44 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 2.366764E+00 | grad norm: 0.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.579 | TFLOPs: 30.78 | 7: iteration 24910/ 115203 | consumed samples: 6376960 | consumed tokens: 13060014080 | elapsed time per iteration (s): 0.42 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 2.368921E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.676 | TFLOPs: 31.99 | 7: iteration 24920/ 115203 | consumed samples: 6379520 | consumed tokens: 13065256960 | 
elapsed time per iteration (s): 0.44 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 2.386192E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.288 | TFLOPs: 30.50 | 7: iteration 24930/ 115203 | consumed samples: 6382080 | consumed tokens: 13070499840 | elapsed time per iteration (s): 0.42 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 2.386494E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.709 | TFLOPs: 31.94 | 7: iteration 24940/ 115203 | consumed samples: 6384640 | consumed tokens: 13075742720 | elapsed time per iteration (s): 0.42 | learning rate: 1.814E-04 | global batch size: 256 | lm loss: 2.379398E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.066 | TFLOPs: 31.96 | 7: iteration 24950/ 115203 | consumed samples: 6387200 | consumed tokens: 13080985600 | elapsed time per iteration (s): 0.43 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 2.389151E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.008 | TFLOPs: 31.53 | 7: iteration 24960/ 115203 | consumed samples: 6389760 | consumed tokens: 13086228480 | elapsed time per iteration (s): 0.43 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 2.375971E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.707 | TFLOPs: 31.57 | 7: iteration 24970/ 115203 | consumed samples: 6392320 | consumed tokens: 13091471360 | elapsed time per iteration (s): 0.42 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 2.365804E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.559 | TFLOPs: 
32.19 | 7: iteration 24980/ 115203 | consumed samples: 6394880 | consumed tokens: 13096714240 | elapsed time per iteration (s): 0.44 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 2.369147E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.822 | TFLOPs: 30.32 | 7: iteration 24990/ 115203 | consumed samples: 6397440 | consumed tokens: 13101957120 | elapsed time per iteration (s): 0.42 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 2.387907E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.587 | TFLOPs: 31.62 | 7: iteration 25000/ 115203 | consumed samples: 6400000 | consumed tokens: 13107200000 | elapsed time per iteration (s): 0.42 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 2.385886E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.835 | TFLOPs: 31.68 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 25000 | lm loss value: 2.327152E+00 | lm loss PPL: 1.024871E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 25000 to checkpoints_221m 0: [2022-11-28 15:57:54,542] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step25000 is begin to save! 0: [2022-11-28 15:57:54,545] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_01-model_00-model_states.pt... 0: [2022-11-28 15:57:54,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_01-model_00-model_states.pt. 0: [2022-11-28 15:57:54,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_03-model_00-model_states.pt... 
0: [2022-11-28 15:57:54,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_03-model_00-model_states.pt. 0: [2022-11-28 15:57:54,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_04-model_00-model_states.pt... 0: [2022-11-28 15:57:54,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_04-model_00-model_states.pt. 0: [2022-11-28 15:57:54,703] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_05-model_00-model_states.pt... 0: [2022-11-28 15:57:54,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_05-model_00-model_states.pt. 0: [2022-11-28 15:57:54,728] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_06-model_00-model_states.pt... 0: [2022-11-28 15:57:54,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_06-model_00-model_states.pt. 0: [2022-11-28 15:57:54,751] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_07-model_00-model_states.pt... 0: [2022-11-28 15:57:54,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_07-model_00-model_states.pt. 0: [2022-11-28 15:57:54,774] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_08-model_00-model_states.pt... 0: [2022-11-28 15:57:54,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_08-model_00-model_states.pt. 0: [2022-11-28 15:57:54,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_09-model_00-model_states.pt... 
0: [2022-11-28 15:57:54,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_09-model_00-model_states.pt. 0: [2022-11-28 15:57:54,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_10-model_00-model_states.pt... 0: [2022-11-28 15:57:54,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_10-model_00-model_states.pt. 0: [2022-11-28 15:57:54,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_11-model_00-model_states.pt... 0: [2022-11-28 15:57:54,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_11-model_00-model_states.pt. 0: [2022-11-28 15:57:54,866] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_12-model_00-model_states.pt... 0: [2022-11-28 15:57:54,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_12-model_00-model_states.pt. 0: [2022-11-28 15:57:54,889] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_13-model_00-model_states.pt... 0: [2022-11-28 15:57:54,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_13-model_00-model_states.pt. 0: [2022-11-28 15:57:54,912] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_14-model_00-model_states.pt... 0: [2022-11-28 15:57:54,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_14-model_00-model_states.pt. 0: [2022-11-28 15:57:54,935] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_15-model_00-model_states.pt... 
0: [2022-11-28 15:57:54,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_15-model_00-model_states.pt. 0: [2022-11-28 15:57:54,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_16-model_00-model_states.pt... 0: [2022-11-28 15:57:54,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_16-model_00-model_states.pt. 0: [2022-11-28 15:57:54,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_17-model_00-model_states.pt... 0: [2022-11-28 15:57:55,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_17-model_00-model_states.pt. 0: [2022-11-28 15:57:55,005] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_18-model_00-model_states.pt... 0: [2022-11-28 15:57:55,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_18-model_00-model_states.pt. 0: [2022-11-28 15:57:55,028] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_19-model_00-model_states.pt... 0: [2022-11-28 15:57:55,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_19-model_00-model_states.pt. 0: [2022-11-28 15:57:55,052] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_20-model_00-model_states.pt... 0: [2022-11-28 15:57:55,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_20-model_00-model_states.pt. 0: [2022-11-28 15:57:55,076] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/layer_22-model_00-model_states.pt... 
0: [2022-11-28 15:57:55,080] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/layer_22-model_00-model_states.pt. 0: [2022-11-28 15:57:55,081] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step25000/mp_rank_00_model_states.pt 0: [2022-11-28 15:57:55,081] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/mp_rank_00_model_states.pt... 0: [2022-11-28 15:57:55,083] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/mp_rank_00_model_states.pt. 0: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
1: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
7: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
5: [2022-11-28 15:57:55,102] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step25000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 7: [2022-11-28 15:57:55,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:57:55,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 15:57:55,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2022-11-28 15:57:55,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:57:55,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 15:57:55,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2022-11-28 15:57:55,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:57:55,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:57:55,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:57:55,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 15:57:55,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 
5: [2022-11-28 15:57:55,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 15:57:55,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 15:57:55,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2022-11-28 15:57:55,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2022-11-28 15:57:55,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:57:55,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:57:55,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 15:57:55,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 15:57:55,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2022-11-28 15:57:55,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2022-11-28 15:57:55,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2022-11-28 15:57:55,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 15:57:55,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2022-11-28 15:57:55,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:57:55,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 15:57:55,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2022-11-28 15:57:55,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:57:55,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:57:55,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 15:57:55,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 15:57:55,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:57:55,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 15:57:55,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 
2: [2022-11-28 15:57:55,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2022-11-28 15:57:55,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2022-11-28 15:57:55,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:57:55,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 15:57:55,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2022-11-28 15:57:55,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:57:55,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 15:57:55,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2022-11-28 15:57:55,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:57:55,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:57:55,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 15:57:55,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 15:57:55,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 15:57:55,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2022-11-28 15:57:55,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2022-11-28 15:57:55,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:57:55,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:57:55,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2022-11-28 15:57:55,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 15:57:55,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2022-11-28 15:57:55,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2022-11-28 15:57:55,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 15:57:55,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 15:57:55,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2022-11-28 15:57:55,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:57:55,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 15:57:55,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2022-11-28 15:57:55,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:57:55,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 15:57:55,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2022-11-28 15:57:55,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:57:55,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 15:57:55,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2022-11-28 15:57:55,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 15:57:55,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 15:57:55,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2022-11-28 15:57:55,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:57:55,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 15:57:55,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2022-11-28 15:57:55,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:57:55,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 15:57:55,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2022-11-28 15:57:55,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:57:55,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 15:57:55,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2022-11-28 15:57:55,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 15:57:55,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 15:57:55,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2022-11-28 15:57:55,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:57:55,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:57:55,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 6: [2022-11-28 15:57:55,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2022-11-28 15:57:55,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2022-11-28 15:57:55,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2022-11-28 15:57:55,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:57:55,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 15:57:55,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2022-11-28 15:57:55,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
1: [2022-11-28 15:57:55,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:57:55,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2022-11-28 15:57:55,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2022-11-28 15:57:55,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2022-11-28 15:57:55,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 2: [2022-11-28 15:57:55,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 15:57:55,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 15:57:55,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2022-11-28 15:57:55,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:57:55,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-28 15:57:55,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 15:57:55,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 15:57:55,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2022-11-28 15:57:55,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:57:55,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 15:57:55,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2022-11-28 15:57:55,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 7: [2022-11-28 15:57:55,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2022-11-28 15:57:55,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:57:55,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 7: [2022-11-28 15:57:55,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 5: [2022-11-28 15:57:55,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 7: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2022-11-28 15:57:55,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 
3: [2022-11-28 15:57:55,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 15:57:55,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:57:55,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2022-11-28 15:57:55,157] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 15:57:55,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:57:55,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 3: [2022-11-28 15:57:55,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 15:57:55,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:57:55,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2022-11-28 15:57:55,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 15:57:55,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:57:55,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 
3: [2022-11-28 15:57:55,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 15:57:55,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:57:55,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2022-11-28 15:57:55,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 15:57:55,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 15:57:55,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 15:57:55,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 15:57:55,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 15:57:55,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2022-11-28 15:57:55,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2022-11-28 15:57:55,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 5: [2022-11-28 15:57:55,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 15:57:55,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 15:57:55,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 15:57:55,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 15:57:55,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2022-11-28 15:57:55,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2022-11-28 15:57:55,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:57:55,170] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 15:57:55,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2022-11-28 15:57:55,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-28 15:57:55,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:57:55,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:57:55,171] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 15:57:55,171] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 15:57:55,171] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 15:57:55,171] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2022-11-28 15:57:55,171] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2022-11-28 15:57:55,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 15:57:55,171] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 4: [2022-11-28 15:57:55,171] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 15:57:55,171] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2022-11-28 15:57:55,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 15:57:55,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 15:57:55,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 6: [2022-11-28 15:57:55,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 15:57:55,176] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 15:57:55,176] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2022-11-28 15:57:55,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:57:55,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 15:57:55,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:57:55,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2022-11-28 15:57:55,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 15:57:55,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2022-11-28 15:57:55,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 15:57:55,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 15:57:55,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 1: [2022-11-28 15:57:55,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 15:57:55,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 15:57:55,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 0: [2022-11-28 15:57:55,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step25000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 15:57:55,226] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step25000 is ready now! 
0: successfully saved checkpoint at iteration 25000 to checkpoints_221m 7: time (ms) | save-checkpoint: 689.25 7: iteration 25010/ 115203 | consumed samples: 6402560 | consumed tokens: 13112442880 | elapsed time per iteration (s): 0.51 | learning rate: 1.813E-04 | global batch size: 256 | lm loss: 2.376587E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 504.969 | TFLOPs: 26.49 | 7: iteration 25020/ 115203 | consumed samples: 6405120 | consumed tokens: 13117685760 | elapsed time per iteration (s): 0.42 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 2.379899E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.710 | TFLOPs: 31.94 | 7: iteration 25030/ 115203 | consumed samples: 6407680 | consumed tokens: 13122928640 | elapsed time per iteration (s): 0.42 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 2.383237E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.293 | TFLOPs: 31.86 | 7: iteration 25040/ 115203 | consumed samples: 6410240 | consumed tokens: 13128171520 | elapsed time per iteration (s): 0.42 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 2.398246E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.258 | TFLOPs: 31.81 | 7: iteration 25050/ 115203 | consumed samples: 6412800 | consumed tokens: 13133414400 | elapsed time per iteration (s): 0.42 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 2.406012E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.576 | TFLOPs: 31.72 | 7: iteration 25060/ 115203 | consumed samples: 6415360 | consumed tokens: 13138657280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.812E-04 | global batch size: 256 | lm loss: 2.391328E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.749 | TFLOPs: 31.84 | 7: iteration 25070/ 115203 | consumed samples: 6417920 | consumed tokens: 13143900160 | elapsed time per iteration (s): 0.44 | learning rate: 1.812E-04 | global batch size: 256 | lm loss: 2.380713E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.233 | TFLOPs: 30.39 | 7: iteration 25080/ 115203 | consumed samples: 6420480 | consumed tokens: 13149143040 | elapsed time per iteration (s): 0.43 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 2.402656E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.805 | TFLOPs: 31.26 | 7: iteration 25090/ 115203 | consumed samples: 6423040 | consumed tokens: 13154385920 | elapsed time per iteration (s): 0.42 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 2.388983E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.275 | TFLOPs: 31.86 | 7: iteration 25100/ 115203 | consumed samples: 6425600 | consumed tokens: 13159628800 | elapsed time per iteration (s): 0.42 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 2.393978E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.825 | TFLOPs: 32.10 | 7: iteration 25110/ 115203 | consumed samples: 6428160 | consumed tokens: 13164871680 | elapsed time per iteration (s): 0.42 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 2.383101E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.471 | TFLOPs: 31.87 | 7: iteration 25120/ 115203 | consumed samples: 
6430720 | consumed tokens: 13170114560 | elapsed time per iteration (s): 0.42 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 2.371004E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.280 | TFLOPs: 32.07 | 7: iteration 25130/ 115203 | consumed samples: 6433280 | consumed tokens: 13175357440 | elapsed time per iteration (s): 0.43 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 2.386200E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.389 | TFLOPs: 31.50 | 7: iteration 25140/ 115203 | consumed samples: 6435840 | consumed tokens: 13180600320 | elapsed time per iteration (s): 0.42 | learning rate: 1.811E-04 | global batch size: 256 | lm loss: 2.381414E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.111 | TFLOPs: 32.01 | 7: iteration 25150/ 115203 | consumed samples: 6438400 | consumed tokens: 13185843200 | elapsed time per iteration (s): 0.43 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 2.402114E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.716 | TFLOPs: 30.99 | 7: iteration 25160/ 115203 | consumed samples: 6440960 | consumed tokens: 13191086080 | elapsed time per iteration (s): 0.43 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 2.406340E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.671 | TFLOPs: 31.31 | 7: iteration 25170/ 115203 | consumed samples: 6443520 | consumed tokens: 13196328960 | elapsed time per iteration (s): 0.42 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 2.391156E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 613.614 | TFLOPs: 32.20 | 7: iteration 25180/ 115203 | consumed samples: 6446080 | consumed tokens: 13201571840 | elapsed time per iteration (s): 0.42 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 2.377896E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.411 | TFLOPs: 31.71 | 7: iteration 25190/ 115203 | consumed samples: 6448640 | consumed tokens: 13206814720 | elapsed time per iteration (s): 0.42 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 2.403313E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.644 | TFLOPs: 31.88 | 7: iteration 25200/ 115203 | consumed samples: 6451200 | consumed tokens: 13212057600 | elapsed time per iteration (s): 0.43 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 2.390748E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.154 | TFLOPs: 31.54 | 7: iteration 25210/ 115203 | consumed samples: 6453760 | consumed tokens: 13217300480 | elapsed time per iteration (s): 0.42 | learning rate: 1.810E-04 | global batch size: 256 | lm loss: 2.400096E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.479 | TFLOPs: 31.77 | 7: iteration 25220/ 115203 | consumed samples: 6456320 | consumed tokens: 13222543360 | elapsed time per iteration (s): 0.42 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 2.403484E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.118 | TFLOPs: 32.01 | 7: iteration 25230/ 115203 | consumed samples: 6458880 | consumed tokens: 13227786240 | elapsed time per iteration (s): 0.42 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 2.374178E+00 | 
grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.720 | TFLOPs: 32.04 | 7: iteration 25240/ 115203 | consumed samples: 6461440 | consumed tokens: 13233029120 | elapsed time per iteration (s): 0.42 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 2.391381E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.459 | TFLOPs: 31.92 | 7: iteration 25250/ 115203 | consumed samples: 6464000 | consumed tokens: 13238272000 | elapsed time per iteration (s): 0.42 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 2.378550E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.687 | TFLOPs: 32.15 | 7: iteration 25260/ 115203 | consumed samples: 6466560 | consumed tokens: 13243514880 | elapsed time per iteration (s): 0.42 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 2.383780E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.328 | TFLOPs: 31.76 | 7: iteration 25270/ 115203 | consumed samples: 6469120 | consumed tokens: 13248757760 | elapsed time per iteration (s): 0.44 | learning rate: 1.809E-04 | global batch size: 256 | lm loss: 2.406486E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.160 | TFLOPs: 30.81 | 7: iteration 25280/ 115203 | consumed samples: 6471680 | consumed tokens: 13254000640 | elapsed time per iteration (s): 0.42 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 2.412825E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.675 | TFLOPs: 31.83 | 7: iteration 25290/ 115203 | consumed samples: 6474240 | consumed tokens: 13259243520 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 2.333079E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.226 | TFLOPs: 31.13 | 7: iteration 25300/ 115203 | consumed samples: 6476800 | consumed tokens: 13264486400 | elapsed time per iteration (s): 0.42 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 2.397139E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.981 | TFLOPs: 31.79 | 7: iteration 25310/ 115203 | consumed samples: 6479360 | consumed tokens: 13269729280 | elapsed time per iteration (s): 0.42 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 2.378322E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.862 | TFLOPs: 31.74 | 7: iteration 25320/ 115203 | consumed samples: 6481920 | consumed tokens: 13274972160 | elapsed time per iteration (s): 0.43 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 2.384511E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.652 | TFLOPs: 31.46 | 7: iteration 25330/ 115203 | consumed samples: 6484480 | consumed tokens: 13280215040 | elapsed time per iteration (s): 0.42 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 2.385882E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.031 | TFLOPs: 32.27 | 7: iteration 25340/ 115203 | consumed samples: 6487040 | consumed tokens: 13285457920 | elapsed time per iteration (s): 0.43 | learning rate: 1.808E-04 | global batch size: 256 | lm loss: 2.390473E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.422 | TFLOPs: 31.45 | 7: 
iteration 25350/ 115203 | consumed samples: 6489600 | consumed tokens: 13290700800 | elapsed time per iteration (s): 0.42 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 2.378487E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.142 | TFLOPs: 31.65 | 7: iteration 25360/ 115203 | consumed samples: 6492160 | consumed tokens: 13295943680 | elapsed time per iteration (s): 0.42 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 2.379347E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.859 | TFLOPs: 31.68 | 7: iteration 25370/ 115203 | consumed samples: 6494720 | consumed tokens: 13301186560 | elapsed time per iteration (s): 0.43 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 2.369738E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.662 | TFLOPs: 31.52 | 7: iteration 25380/ 115203 | consumed samples: 6497280 | consumed tokens: 13306429440 | elapsed time per iteration (s): 0.42 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 2.402806E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.438 | TFLOPs: 32.08 | 7: iteration 25390/ 115203 | consumed samples: 6499840 | consumed tokens: 13311672320 | elapsed time per iteration (s): 0.42 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 2.422079E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.211 | TFLOPs: 32.07 | 7: iteration 25400/ 115203 | consumed samples: 6502400 | consumed tokens: 13316915200 | elapsed time per iteration (s): 0.42 | learning rate: 1.807E-04 | global batch size: 256 | lm loss: 2.397674E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 606.829 | TFLOPs: 31.84 | 7: iteration 25410/ 115203 | consumed samples: 6504960 | consumed tokens: 13322158080 | elapsed time per iteration (s): 0.42 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 2.358007E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.670 | TFLOPs: 31.78 | 7: iteration 25420/ 115203 | consumed samples: 6507520 | consumed tokens: 13327400960 | elapsed time per iteration (s): 0.42 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 2.368274E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.840 | TFLOPs: 31.89 | 7: iteration 25430/ 115203 | consumed samples: 6510080 | consumed tokens: 13332643840 | elapsed time per iteration (s): 0.42 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 2.407903E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.024 | TFLOPs: 32.22 | 7: iteration 25440/ 115203 | consumed samples: 6512640 | consumed tokens: 13337886720 | elapsed time per iteration (s): 0.43 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 2.376896E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.406 | TFLOPs: 31.19 | 7: iteration 25450/ 115203 | consumed samples: 6515200 | consumed tokens: 13343129600 | elapsed time per iteration (s): 0.42 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 2.380593E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.629 | TFLOPs: 32.14 | 7: iteration 25460/ 115203 | consumed samples: 6517760 | consumed tokens: 13348372480 | elapsed time per iteration (s): 0.42 | learning rate: 1.806E-04 | global 
batch size: 256 | lm loss: 2.409554E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.125 | TFLOPs: 31.96 | 7: iteration 25470/ 115203 | consumed samples: 6520320 | consumed tokens: 13353615360 | elapsed time per iteration (s): 0.43 | learning rate: 1.806E-04 | global batch size: 256 | lm loss: 2.394180E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.563 | TFLOPs: 30.88 | 7: iteration 25480/ 115203 | consumed samples: 6522880 | consumed tokens: 13358858240 | elapsed time per iteration (s): 0.42 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 2.395148E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.425 | TFLOPs: 31.92 | 7: iteration 25490/ 115203 | consumed samples: 6525440 | consumed tokens: 13364101120 | elapsed time per iteration (s): 0.42 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 2.363258E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.288 | TFLOPs: 31.92 | 7: iteration 25500/ 115203 | consumed samples: 6528000 | consumed tokens: 13369344000 | elapsed time per iteration (s): 0.42 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 2.385469E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.962 | TFLOPs: 31.79 | 7: iteration 25510/ 115203 | consumed samples: 6530560 | consumed tokens: 13374586880 | elapsed time per iteration (s): 0.43 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 2.355403E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.453 | TFLOPs: 31.35 | 7: iteration 25520/ 115203 | consumed samples: 6533120 | consumed 
tokens: 13379829760 | elapsed time per iteration (s): 0.43 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 2.373942E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.475 | TFLOPs: 31.51 | 7: iteration 25530/ 115203 | consumed samples: 6535680 | consumed tokens: 13385072640 | elapsed time per iteration (s): 0.42 | learning rate: 1.805E-04 | global batch size: 256 | lm loss: 2.384522E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.827 | TFLOPs: 32.05 | 7: iteration 25540/ 115203 | consumed samples: 6538240 | consumed tokens: 13390315520 | elapsed time per iteration (s): 0.42 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 2.381363E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.555 | TFLOPs: 31.82 | 7: iteration 25550/ 115203 | consumed samples: 6540800 | consumed tokens: 13395558400 | elapsed time per iteration (s): 0.42 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 2.381549E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.251 | TFLOPs: 31.97 | 7: iteration 25560/ 115203 | consumed samples: 6543360 | consumed tokens: 13400801280 | elapsed time per iteration (s): 0.44 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 2.394136E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.807 | TFLOPs: 30.58 | 7: iteration 25570/ 115203 | consumed samples: 6545920 | consumed tokens: 13406044160 | elapsed time per iteration (s): 0.42 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 2.374737E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 606.118 | TFLOPs: 31.80 | 7: iteration 25580/ 115203 | consumed samples: 6548480 | consumed tokens: 13411287040 | elapsed time per iteration (s): 0.42 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 2.363195E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.573 | TFLOPs: 32.19 | 7: iteration 25590/ 115203 | consumed samples: 6551040 | consumed tokens: 13416529920 | elapsed time per iteration (s): 0.43 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 2.370050E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.085 | TFLOPs: 31.54 | 7: iteration 25600/ 115203 | consumed samples: 6553600 | consumed tokens: 13421772800 | elapsed time per iteration (s): 0.42 | learning rate: 1.804E-04 | global batch size: 256 | lm loss: 2.368467E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.985 | TFLOPs: 31.85 | 7: iteration 25610/ 115203 | consumed samples: 6556160 | consumed tokens: 13427015680 | elapsed time per iteration (s): 0.43 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 2.384343E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.372 | TFLOPs: 31.03 | 7: iteration 25620/ 115203 | consumed samples: 6558720 | consumed tokens: 13432258560 | elapsed time per iteration (s): 0.42 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 2.412086E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.324 | TFLOPs: 32.13 | 7: iteration 25630/ 115203 | consumed samples: 6561280 | consumed tokens: 13437501440 | elapsed time per iteration (s): 0.45 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 2.408182E+00 | grad norm: 0.199 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.075 | TFLOPs: 29.75 | 7: iteration 25640/ 115203 | consumed samples: 6563840 | consumed tokens: 13442744320 | elapsed time per iteration (s): 0.42 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 2.386208E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.013 | TFLOPs: 31.90 | 7: iteration 25650/ 115203 | consumed samples: 6566400 | consumed tokens: 13447987200 | elapsed time per iteration (s): 0.43 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 2.400483E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.932 | TFLOPs: 31.43 | 7: iteration 25660/ 115203 | consumed samples: 6568960 | consumed tokens: 13453230080 | elapsed time per iteration (s): 0.42 | learning rate: 1.803E-04 | global batch size: 256 | lm loss: 2.403132E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.559 | TFLOPs: 32.24 | 7: iteration 25670/ 115203 | consumed samples: 6571520 | consumed tokens: 13458472960 | elapsed time per iteration (s): 0.42 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 2.394956E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.357 | TFLOPs: 32.08 | 7: iteration 25680/ 115203 | consumed samples: 6574080 | consumed tokens: 13463715840 | elapsed time per iteration (s): 0.42 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 2.377799E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.459 | TFLOPs: 31.66 | 7: iteration 25690/ 115203 | consumed samples: 6576640 | consumed tokens: 13468958720 | elapsed time per iteration (s): 0.43 
| learning rate: 1.802E-04 | global batch size: 256 | lm loss: 2.386852E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.702 | TFLOPs: 31.41 | 7: iteration 25700/ 115203 | consumed samples: 6579200 | consumed tokens: 13474201600 | elapsed time per iteration (s): 0.42 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 2.406698E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.350 | TFLOPs: 31.87 | 7: iteration 25710/ 115203 | consumed samples: 6581760 | consumed tokens: 13479444480 | elapsed time per iteration (s): 0.42 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 2.358633E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.392 | TFLOPs: 31.87 | 7: iteration 25720/ 115203 | consumed samples: 6584320 | consumed tokens: 13484687360 | elapsed time per iteration (s): 0.43 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 2.384871E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.303 | TFLOPs: 31.55 | 7: iteration 25730/ 115203 | consumed samples: 6586880 | consumed tokens: 13489930240 | elapsed time per iteration (s): 0.42 | learning rate: 1.802E-04 | global batch size: 256 | lm loss: 2.383191E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.097 | TFLOPs: 32.01 | 7: iteration 25740/ 115203 | consumed samples: 6589440 | consumed tokens: 13495173120 | elapsed time per iteration (s): 0.42 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 2.389142E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.204 | TFLOPs: 31.75 | 7: iteration 25750/ 115203 | 
consumed samples: 6592000 | consumed tokens: 13500416000 | elapsed time per iteration (s): 0.43 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 2.390255E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.941 | TFLOPs: 31.16 | 7: iteration 25760/ 115203 | consumed samples: 6594560 | consumed tokens: 13505658880 | elapsed time per iteration (s): 0.42 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 2.414220E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.549 | TFLOPs: 32.14 | 7: iteration 25770/ 115203 | consumed samples: 6597120 | consumed tokens: 13510901760 | elapsed time per iteration (s): 0.43 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 2.387504E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.564 | TFLOPs: 31.46 | 7: iteration 25780/ 115203 | consumed samples: 6599680 | consumed tokens: 13516144640 | elapsed time per iteration (s): 0.44 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 2.375876E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.999 | TFLOPs: 30.22 | 7: iteration 25790/ 115203 | consumed samples: 6602240 | consumed tokens: 13521387520 | elapsed time per iteration (s): 0.42 | learning rate: 1.801E-04 | global batch size: 256 | lm loss: 2.410460E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.218 | TFLOPs: 31.75 | 7: iteration 25800/ 115203 | consumed samples: 6604800 | consumed tokens: 13526630400 | elapsed time per iteration (s): 0.43 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 2.381694E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 592.694 | TFLOPs: 31.10 | 7: iteration 25810/ 115203 | consumed samples: 6607360 | consumed tokens: 13531873280 | elapsed time per iteration (s): 0.42 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 2.406766E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.349 | TFLOPs: 32.02 | 7: iteration 25820/ 115203 | consumed samples: 6609920 | consumed tokens: 13537116160 | elapsed time per iteration (s): 0.42 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 2.402865E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.470 | TFLOPs: 31.72 | 7: iteration 25830/ 115203 | consumed samples: 6612480 | consumed tokens: 13542359040 | elapsed time per iteration (s): 0.42 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 2.388204E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.175 | TFLOPs: 31.86 | 7: iteration 25840/ 115203 | consumed samples: 6615040 | consumed tokens: 13547601920 | elapsed time per iteration (s): 0.43 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 2.383942E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.283 | TFLOPs: 31.44 | 7: iteration 25850/ 115203 | consumed samples: 6617600 | consumed tokens: 13552844800 | elapsed time per iteration (s): 0.42 | learning rate: 1.800E-04 | global batch size: 256 | lm loss: 2.405432E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.107 | TFLOPs: 31.64 | 7: iteration 25860/ 115203 | consumed samples: 6620160 | consumed tokens: 13558087680 | elapsed time per iteration (s): 0.42 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 
2.400268E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.667 | TFLOPs: 32.04 | 7: iteration 25870/ 115203 | consumed samples: 6622720 | consumed tokens: 13563330560 | elapsed time per iteration (s): 0.42 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 2.384625E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.898 | TFLOPs: 31.90 | 7: iteration 25880/ 115203 | consumed samples: 6625280 | consumed tokens: 13568573440 | elapsed time per iteration (s): 0.42 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 2.361965E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.197 | TFLOPs: 32.07 | 7: iteration 25890/ 115203 | consumed samples: 6627840 | consumed tokens: 13573816320 | elapsed time per iteration (s): 0.42 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 2.366930E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.986 | TFLOPs: 31.85 | 7: iteration 25900/ 115203 | consumed samples: 6630400 | consumed tokens: 13579059200 | elapsed time per iteration (s): 0.42 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 2.393723E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.596 | TFLOPs: 31.98 | 7: iteration 25910/ 115203 | consumed samples: 6632960 | consumed tokens: 13584302080 | elapsed time per iteration (s): 0.42 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 2.393567E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.183 | TFLOPs: 31.75 | 7: iteration 25920/ 115203 | consumed samples: 6635520 | consumed tokens: 13589544960 | 
elapsed time per iteration (s): 0.43 | learning rate: 1.799E-04 | global batch size: 256 | lm loss: 2.377130E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.982 | TFLOPs: 31.43 | 7: iteration 25930/ 115203 | consumed samples: 6638080 | consumed tokens: 13594787840 | elapsed time per iteration (s): 0.42 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 2.393129E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.111 | TFLOPs: 31.80 | 7: iteration 25940/ 115203 | consumed samples: 6640640 | consumed tokens: 13600030720 | elapsed time per iteration (s): 0.42 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 2.373353E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.260 | TFLOPs: 32.18 | 7: iteration 25950/ 115203 | consumed samples: 6643200 | consumed tokens: 13605273600 | elapsed time per iteration (s): 0.42 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 2.395652E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.218 | TFLOPs: 31.70 | 7: iteration 25960/ 115203 | consumed samples: 6645760 | consumed tokens: 13610516480 | elapsed time per iteration (s): 0.42 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 2.376513E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.821 | TFLOPs: 32.05 | 7: iteration 25970/ 115203 | consumed samples: 6648320 | consumed tokens: 13615759360 | elapsed time per iteration (s): 0.42 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 2.394361E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.080 | TFLOPs: 
31.70 |
7: iteration 25980/ 115203 | consumed samples: 6650880 | consumed tokens: 13621002240 | elapsed time per iteration (s): 0.44 | learning rate: 1.798E-04 | global batch size: 256 | lm loss: 2.366188E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.152 | TFLOPs: 30.49 |
7: iteration 25990/ 115203 | consumed samples: 6653440 | consumed tokens: 13626245120 | elapsed time per iteration (s): 0.42 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 2.399719E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.528 | TFLOPs: 31.98 |
0: [2022-11-28 16:04:59,245] [INFO] [logging.py:68:log_dist] [Rank 0] step=26000, skipped=0, lr=[0.00017972931879823854, 0.00017972931879823854, 0.00017972931879823854], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 26000/ 115203 | consumed samples: 6656000 | consumed tokens: 13631488000 | elapsed time per iteration (s): 0.43 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 2.406682E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.108 | TFLOPs: 31.38 |
0: steps: 26000 loss: 2.4382 iter time (s): 0.423 samples/sec: 605.466
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 26000 | lm loss value: 2.280793E+00 | lm loss PPL: 9.784441E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 26000 to checkpoints_221m
0: [2022-11-28 16:04:59,403] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step26000 is begin to save!
0: [2022-11-28 16:04:59,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_01-model_00-model_states.pt...
0: [2022-11-28 16:04:59,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_01-model_00-model_states.pt. 0: [2022-11-28 16:04:59,509] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_03-model_00-model_states.pt... 0: [2022-11-28 16:04:59,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_03-model_00-model_states.pt. 0: [2022-11-28 16:04:59,531] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_04-model_00-model_states.pt... 0: [2022-11-28 16:04:59,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_04-model_00-model_states.pt. 0: [2022-11-28 16:04:59,555] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_05-model_00-model_states.pt... 0: [2022-11-28 16:04:59,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_05-model_00-model_states.pt. 0: [2022-11-28 16:04:59,577] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_06-model_00-model_states.pt... 0: [2022-11-28 16:04:59,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_06-model_00-model_states.pt. 0: [2022-11-28 16:04:59,601] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_07-model_00-model_states.pt... 0: [2022-11-28 16:04:59,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_07-model_00-model_states.pt. 0: [2022-11-28 16:04:59,625] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_08-model_00-model_states.pt... 
0: [2022-11-28 16:04:59,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_08-model_00-model_states.pt. 0: [2022-11-28 16:04:59,648] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_09-model_00-model_states.pt... 0: [2022-11-28 16:04:59,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_09-model_00-model_states.pt. 0: [2022-11-28 16:04:59,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_10-model_00-model_states.pt... 0: [2022-11-28 16:04:59,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_10-model_00-model_states.pt. 0: [2022-11-28 16:04:59,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_11-model_00-model_states.pt... 0: [2022-11-28 16:04:59,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_11-model_00-model_states.pt. 0: [2022-11-28 16:04:59,720] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_12-model_00-model_states.pt... 0: [2022-11-28 16:04:59,747] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_12-model_00-model_states.pt. 0: [2022-11-28 16:04:59,747] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_13-model_00-model_states.pt... 0: [2022-11-28 16:04:59,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_13-model_00-model_states.pt. 0: [2022-11-28 16:04:59,768] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_14-model_00-model_states.pt... 
0: [2022-11-28 16:04:59,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_14-model_00-model_states.pt. 0: [2022-11-28 16:04:59,790] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_15-model_00-model_states.pt... 0: [2022-11-28 16:04:59,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_15-model_00-model_states.pt. 0: [2022-11-28 16:04:59,813] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_16-model_00-model_states.pt... 0: [2022-11-28 16:04:59,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_16-model_00-model_states.pt. 0: [2022-11-28 16:04:59,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_17-model_00-model_states.pt... 0: [2022-11-28 16:04:59,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_17-model_00-model_states.pt. 0: [2022-11-28 16:04:59,861] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_18-model_00-model_states.pt... 0: [2022-11-28 16:04:59,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_18-model_00-model_states.pt. 0: [2022-11-28 16:04:59,884] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_19-model_00-model_states.pt... 0: [2022-11-28 16:04:59,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_19-model_00-model_states.pt. 0: [2022-11-28 16:04:59,908] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_20-model_00-model_states.pt... 
0: [2022-11-28 16:04:59,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_20-model_00-model_states.pt. 0: [2022-11-28 16:04:59,932] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/layer_22-model_00-model_states.pt... 0: [2022-11-28 16:04:59,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/layer_22-model_00-model_states.pt. 0: [2022-11-28 16:04:59,936] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step26000/mp_rank_00_model_states.pt 0: [2022-11-28 16:04:59,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/mp_rank_00_model_states.pt... 0: [2022-11-28 16:04:59,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/mp_rank_00_model_states.pt. 0: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
7: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
6: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:04:59,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step26000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:05:00,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:05:00,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 16:05:00,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2022-11-28 16:05:00,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:05:00,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 16:05:00,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2022-11-28 16:05:00,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:05:00,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 16:05:00,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
4: [2022-11-28 16:05:00,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:05:00,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 16:05:00,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 1: [2022-11-28 16:05:00,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:05:00,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 16:05:00,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2022-11-28 16:05:00,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:05:00,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 16:05:00,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:05:00,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2022-11-28 16:05:00,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 16:05:00,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
1: [2022-11-28 16:05:00,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:05:00,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:05:00,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 1: [2022-11-28 16:05:00,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 2: [2022-11-28 16:05:00,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2022-11-28 16:05:00,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:05:00,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2022-11-28 16:05:00,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 16:05:00,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2022-11-28 16:05:00,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:05:00,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 16:05:00,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
7: [2022-11-28 16:05:00,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:05:00,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:05:00,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 16:05:00,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 1: [2022-11-28 16:05:00,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:05:00,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2022-11-28 16:05:00,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 1: [2022-11-28 16:05:00,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 16:05:00,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 1: [2022-11-28 16:05:00,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:05:00,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 16:05:00,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
6: [2022-11-28 16:05:00,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:05:00,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 16:05:00,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 1: [2022-11-28 16:05:00,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:05:00,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 16:05:00,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2022-11-28 16:05:00,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:05:00,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 16:05:00,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2022-11-28 16:05:00,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:05:00,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 16:05:00,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
7: [2022-11-28 16:05:00,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:05:00,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 16:05:00,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 5: [2022-11-28 16:05:00,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:05:00,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 16:05:00,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 5: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:05:00,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:05:00,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
1: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:05:00,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:05:00,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2022-11-28 16:05:00,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 2: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
1: [2022-11-28 16:05:00,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 5: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2022-11-28 16:05:00,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 1: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2022-11-28 16:05:00,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 16:05:00,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2022-11-28 16:05:00,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2022-11-28 16:05:00,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:05:00,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 16:05:00,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 4: [2022-11-28 16:05:00,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 16:05:00,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 16:05:00,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2022-11-28 16:05:00,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:05:00,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 16:05:00,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2022-11-28 16:05:00,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:05:00,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 16:05:00,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2022-11-28 16:05:00,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:05:00,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 16:05:00,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:05:00,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2022-11-28 16:05:00,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:05:00,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:05:00,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
2: [2022-11-28 16:05:00,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:05:00,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 16:05:00,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 2: [2022-11-28 16:05:00,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:05:00,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 16:05:00,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 5: [2022-11-28 16:05:00,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
5: [2022-11-28 16:05:00,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 16:05:00,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 16:05:00,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 16:05:00,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2022-11-28 16:05:00,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2022-11-28 16:05:00,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:05:00,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:05:00,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 16:05:00,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2022-11-28 16:05:00,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 16:05:00,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
3: [2022-11-28 16:05:00,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:05:00,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:05:00,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:05:00,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:05:00,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 16:05:00,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 16:05:00,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 16:05:00,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 16:05:00,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2022-11-28 16:05:00,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2022-11-28 16:05:00,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 3: [2022-11-28 16:05:00,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
7: [2022-11-28 16:05:00,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:05:00,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 16:05:00,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2022-11-28 16:05:00,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:05:00,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 16:05:00,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2022-11-28 16:05:00,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:05:00,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 16:05:00,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 7: [2022-11-28 16:05:00,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:05:00,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 16:05:00,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
6: [2022-11-28 16:05:00,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:05:00,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 16:05:00,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2022-11-28 16:05:00,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:05:00,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 16:05:00,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2022-11-28 16:05:00,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:05:00,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 16:05:00,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 6: [2022-11-28 16:05:00,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:05:00,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 16:05:00,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
5: [2022-11-28 16:05:00,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:05:00,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:05:00,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:05:00,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 16:05:00,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 16:05:00,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 16:05:00,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 5: [2022-11-28 16:05:00,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 5: [2022-11-28 16:05:00,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2022-11-28 16:05:00,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 16:05:00,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2022-11-28 16:05:00,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 16:05:00,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:05:00,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:05:00,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 16:05:00,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 16:05:00,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step26000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 16:05:00,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2022-11-28 16:05:00,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 0: [2022-11-28 16:05:00,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step26000 is ready now! 
0: successfully saved checkpoint at iteration 26000 to checkpoints_221m 7: time (ms) | save-checkpoint: 672.05 7: iteration 26010/ 115203 | consumed samples: 6658560 | consumed tokens: 13636730880 | elapsed time per iteration (s): 0.51 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 2.367702E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 502.328 | TFLOPs: 26.36 | 7: iteration 26020/ 115203 | consumed samples: 6661120 | consumed tokens: 13641973760 | elapsed time per iteration (s): 0.42 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 2.386695E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.117 | TFLOPs: 31.75 | 7: iteration 26030/ 115203 | consumed samples: 6663680 | consumed tokens: 13647216640 | elapsed time per iteration (s): 0.43 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 2.414669E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.773 | TFLOPs: 31.31 | 7: iteration 26040/ 115203 | consumed samples: 6666240 | consumed tokens: 13652459520 | elapsed time per iteration (s): 0.42 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 2.369884E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.049 | TFLOPs: 32.27 | 7: iteration 26050/ 115203 | consumed samples: 6668800 | consumed tokens: 13657702400 | elapsed time per iteration (s): 0.43 | learning rate: 1.797E-04 | global batch size: 256 | lm loss: 2.382771E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.569 | TFLOPs: 31.51 | 7: iteration 26060/ 115203 | consumed samples: 6671360 | consumed tokens: 13662945280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.796E-04 | global batch size: 256 | lm loss: 2.417198E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.723 | TFLOPs: 31.68 | 7: iteration 26070/ 115203 | consumed samples: 6673920 | consumed tokens: 13668188160 | elapsed time per iteration (s): 0.42 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 2.377768E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.274 | TFLOPs: 31.92 | 7: iteration 26080/ 115203 | consumed samples: 6676480 | consumed tokens: 13673431040 | elapsed time per iteration (s): 0.43 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 2.387446E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.508 | TFLOPs: 31.46 | 7: iteration 26090/ 115203 | consumed samples: 6679040 | consumed tokens: 13678673920 | elapsed time per iteration (s): 0.42 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 2.365629E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.813 | TFLOPs: 32.26 | 7: iteration 26100/ 115203 | consumed samples: 6681600 | consumed tokens: 13683916800 | elapsed time per iteration (s): 0.42 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 2.390375E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.053 | TFLOPs: 32.22 | 7: iteration 26110/ 115203 | consumed samples: 6684160 | consumed tokens: 13689159680 | elapsed time per iteration (s): 0.42 | learning rate: 1.796E-04 | global batch size: 256 | lm loss: 2.393925E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.708 | TFLOPs: 31.99 | 7: iteration 26120/ 115203 | consumed samples: 
6686720 | consumed tokens: 13694402560 | elapsed time per iteration (s): 0.42 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 2.379894E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.934 | TFLOPs: 31.90 | 7: iteration 26130/ 115203 | consumed samples: 6689280 | consumed tokens: 13699645440 | elapsed time per iteration (s): 0.42 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 2.379022E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.485 | TFLOPs: 31.61 | 7: iteration 26140/ 115203 | consumed samples: 6691840 | consumed tokens: 13704888320 | elapsed time per iteration (s): 0.44 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 2.375893E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.256 | TFLOPs: 30.76 | 7: iteration 26150/ 115203 | consumed samples: 6694400 | consumed tokens: 13710131200 | elapsed time per iteration (s): 0.42 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 2.367019E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.472 | TFLOPs: 31.61 | 7: iteration 26160/ 115203 | consumed samples: 6696960 | consumed tokens: 13715374080 | elapsed time per iteration (s): 0.42 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 2.346282E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.001 | TFLOPs: 32.11 | 7: iteration 26170/ 115203 | consumed samples: 6699520 | consumed tokens: 13720616960 | elapsed time per iteration (s): 0.43 | learning rate: 1.795E-04 | global batch size: 256 | lm loss: 2.378786E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 598.384 | TFLOPs: 31.40 | 7: iteration 26180/ 115203 | consumed samples: 6702080 | consumed tokens: 13725859840 | elapsed time per iteration (s): 0.42 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 2.378022E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.159 | TFLOPs: 31.96 | 7: iteration 26190/ 115203 | consumed samples: 6704640 | consumed tokens: 13731102720 | elapsed time per iteration (s): 0.43 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 2.380425E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.406 | TFLOPs: 31.08 | 7: iteration 26200/ 115203 | consumed samples: 6707200 | consumed tokens: 13736345600 | elapsed time per iteration (s): 0.43 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 2.391224E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.555 | TFLOPs: 31.51 | 7: iteration 26210/ 115203 | consumed samples: 6709760 | consumed tokens: 13741588480 | elapsed time per iteration (s): 0.43 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 2.390075E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.828 | TFLOPs: 31.52 | 7: iteration 26220/ 115203 | consumed samples: 6712320 | consumed tokens: 13746831360 | elapsed time per iteration (s): 0.42 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 2.409592E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.028 | TFLOPs: 31.64 | 7: iteration 26230/ 115203 | consumed samples: 6714880 | consumed tokens: 13752074240 | elapsed time per iteration (s): 0.42 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 2.370611E+00 | 
grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.172 | TFLOPs: 31.70 | 7: iteration 26240/ 115203 | consumed samples: 6717440 | consumed tokens: 13757317120 | elapsed time per iteration (s): 0.42 | learning rate: 1.794E-04 | global batch size: 256 | lm loss: 2.368929E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.155 | TFLOPs: 31.75 | 7: iteration 26250/ 115203 | consumed samples: 6720000 | consumed tokens: 13762560000 | elapsed time per iteration (s): 0.42 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 2.368542E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.640 | TFLOPs: 31.78 | 7: iteration 26260/ 115203 | consumed samples: 6722560 | consumed tokens: 13767802880 | elapsed time per iteration (s): 0.42 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 2.369486E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.658 | TFLOPs: 32.25 | 7: iteration 26270/ 115203 | consumed samples: 6725120 | consumed tokens: 13773045760 | elapsed time per iteration (s): 0.42 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 2.371466E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.520 | TFLOPs: 31.98 | 7: iteration 26280/ 115203 | consumed samples: 6727680 | consumed tokens: 13778288640 | elapsed time per iteration (s): 0.42 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 2.380501E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.074 | TFLOPs: 31.64 | 7: iteration 26290/ 115203 | consumed samples: 6730240 | consumed tokens: 13783531520 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 2.400507E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.678 | TFLOPs: 31.62 | 7: iteration 26300/ 115203 | consumed samples: 6732800 | consumed tokens: 13788774400 | elapsed time per iteration (s): 0.42 | learning rate: 1.793E-04 | global batch size: 256 | lm loss: 2.383053E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.825 | TFLOPs: 31.89 | 7: iteration 26310/ 115203 | consumed samples: 6735360 | consumed tokens: 13794017280 | elapsed time per iteration (s): 0.42 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 2.388962E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.515 | TFLOPs: 31.98 | 7: iteration 26320/ 115203 | consumed samples: 6737920 | consumed tokens: 13799260160 | elapsed time per iteration (s): 0.43 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 2.390860E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.404 | TFLOPs: 31.45 | 7: iteration 26330/ 115203 | consumed samples: 6740480 | consumed tokens: 13804503040 | elapsed time per iteration (s): 0.42 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 2.417583E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.478 | TFLOPs: 31.72 | 7: iteration 26340/ 115203 | consumed samples: 6743040 | consumed tokens: 13809745920 | elapsed time per iteration (s): 0.43 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 2.401575E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.926 | TFLOPs: 31.00 | 7: 
iteration 26350/ 115203 | consumed samples: 6745600 | consumed tokens: 13814988800 | elapsed time per iteration (s): 0.42 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 2.383005E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.954 | TFLOPs: 31.64 | 7: iteration 26360/ 115203 | consumed samples: 6748160 | consumed tokens: 13820231680 | elapsed time per iteration (s): 0.43 | learning rate: 1.792E-04 | global batch size: 256 | lm loss: 2.348078E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.127 | TFLOPs: 31.17 | 7: iteration 26370/ 115203 | consumed samples: 6750720 | consumed tokens: 13825474560 | elapsed time per iteration (s): 0.42 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 2.390470E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.206 | TFLOPs: 32.07 | 7: iteration 26380/ 115203 | consumed samples: 6753280 | consumed tokens: 13830717440 | elapsed time per iteration (s): 0.43 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 2.398239E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.581 | TFLOPs: 31.14 | 7: iteration 26390/ 115203 | consumed samples: 6755840 | consumed tokens: 13835960320 | elapsed time per iteration (s): 0.43 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 2.410176E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.266 | TFLOPs: 31.49 | 7: iteration 26400/ 115203 | consumed samples: 6758400 | consumed tokens: 13841203200 | elapsed time per iteration (s): 0.42 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 2.370710E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 602.358 | TFLOPs: 31.60 | 7: iteration 26410/ 115203 | consumed samples: 6760960 | consumed tokens: 13846446080 | elapsed time per iteration (s): 0.42 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 2.407806E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.749 | TFLOPs: 32.05 | 7: iteration 26420/ 115203 | consumed samples: 6763520 | consumed tokens: 13851688960 | elapsed time per iteration (s): 0.42 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 2.367958E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.117 | TFLOPs: 31.64 | 7: iteration 26430/ 115203 | consumed samples: 6766080 | consumed tokens: 13856931840 | elapsed time per iteration (s): 0.43 | learning rate: 1.791E-04 | global batch size: 256 | lm loss: 2.334939E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.806 | TFLOPs: 31.37 | 7: iteration 26440/ 115203 | consumed samples: 6768640 | consumed tokens: 13862174720 | elapsed time per iteration (s): 0.42 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 2.439292E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.690 | TFLOPs: 31.94 | 7: iteration 26450/ 115203 | consumed samples: 6771200 | consumed tokens: 13867417600 | elapsed time per iteration (s): 0.42 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 2.372315E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.908 | TFLOPs: 31.95 | 7: iteration 26460/ 115203 | consumed samples: 6773760 | consumed tokens: 13872660480 | elapsed time per iteration (s): 0.43 | learning rate: 1.790E-04 | global 
batch size: 256 | lm loss: 2.384660E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.272 | TFLOPs: 31.39 | 7: iteration 26470/ 115203 | consumed samples: 6776320 | consumed tokens: 13877903360 | elapsed time per iteration (s): 0.43 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 2.366146E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.521 | TFLOPs: 31.56 | 7: iteration 26480/ 115203 | consumed samples: 6778880 | consumed tokens: 13883146240 | elapsed time per iteration (s): 0.42 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 2.372983E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.358 | TFLOPs: 32.02 | 7: iteration 26490/ 115203 | consumed samples: 6781440 | consumed tokens: 13888389120 | elapsed time per iteration (s): 0.43 | learning rate: 1.790E-04 | global batch size: 256 | lm loss: 2.366255E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.223 | TFLOPs: 30.97 | 7: iteration 26500/ 115203 | consumed samples: 6784000 | consumed tokens: 13893632000 | elapsed time per iteration (s): 0.42 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 2.383544E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.168 | TFLOPs: 31.96 | 7: iteration 26510/ 115203 | consumed samples: 6786560 | consumed tokens: 13898874880 | elapsed time per iteration (s): 0.42 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 2.390686E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.962 | TFLOPs: 32.11 | 7: iteration 26520/ 115203 | consumed samples: 6789120 | consumed 
tokens: 13904117760 | elapsed time per iteration (s): 0.52 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 2.385969E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 497.017 | TFLOPs: 26.08 | 7: iteration 26530/ 115203 | consumed samples: 6791680 | consumed tokens: 13909360640 | elapsed time per iteration (s): 0.42 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 2.384121E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.213 | TFLOPs: 31.81 | 7: iteration 26540/ 115203 | consumed samples: 6794240 | consumed tokens: 13914603520 | elapsed time per iteration (s): 0.43 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 2.365587E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.835 | TFLOPs: 31.16 | 7: iteration 26550/ 115203 | consumed samples: 6796800 | consumed tokens: 13919846400 | elapsed time per iteration (s): 0.56 | learning rate: 1.789E-04 | global batch size: 256 | lm loss: 2.400163E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 453.107 | TFLOPs: 23.77 | 7: iteration 26560/ 115203 | consumed samples: 6799360 | consumed tokens: 13925089280 | elapsed time per iteration (s): 0.42 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 2.380975E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.078 | TFLOPs: 31.80 | 7: iteration 26570/ 115203 | consumed samples: 6801920 | consumed tokens: 13930332160 | elapsed time per iteration (s): 0.84 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 2.345480E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 303.800 | TFLOPs: 15.94 | 7: iteration 26580/ 115203 | consumed samples: 6804480 | consumed tokens: 13935575040 | elapsed time per iteration (s): 0.54 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 2.389522E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 471.155 | TFLOPs: 24.72 | 7: iteration 26590/ 115203 | consumed samples: 6807040 | consumed tokens: 13940817920 | elapsed time per iteration (s): 0.42 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 2.363952E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.317 | TFLOPs: 31.66 | 7: iteration 26600/ 115203 | consumed samples: 6809600 | consumed tokens: 13946060800 | elapsed time per iteration (s): 0.46 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 2.381900E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.997 | TFLOPs: 29.12 | 7: iteration 26610/ 115203 | consumed samples: 6812160 | consumed tokens: 13951303680 | elapsed time per iteration (s): 0.42 | learning rate: 1.788E-04 | global batch size: 256 | lm loss: 2.377584E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.940 | TFLOPs: 31.90 | 7: iteration 26620/ 115203 | consumed samples: 6814720 | consumed tokens: 13956546560 | elapsed time per iteration (s): 0.43 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 2.382969E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.941 | TFLOPs: 31.58 | 7: iteration 26630/ 115203 | consumed samples: 6817280 | consumed tokens: 13961789440 | elapsed time per iteration (s): 0.43 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 2.380074E+00 | grad norm: 0.205 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.959 | TFLOPs: 31.37 | 7: iteration 26640/ 115203 | consumed samples: 6819840 | consumed tokens: 13967032320 | elapsed time per iteration (s): 0.44 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 2.392166E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.259 | TFLOPs: 30.76 | 7: iteration 26650/ 115203 | consumed samples: 6822400 | consumed tokens: 13972275200 | elapsed time per iteration (s): 0.43 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 2.367269E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.278 | TFLOPs: 30.97 | 7: iteration 26660/ 115203 | consumed samples: 6824960 | consumed tokens: 13977518080 | elapsed time per iteration (s): 0.42 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 2.359709E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.280 | TFLOPs: 32.07 | 7: iteration 26670/ 115203 | consumed samples: 6827520 | consumed tokens: 13982760960 | elapsed time per iteration (s): 0.44 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 2.399139E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.544 | TFLOPs: 30.30 | 7: iteration 26680/ 115203 | consumed samples: 6830080 | consumed tokens: 13988003840 | elapsed time per iteration (s): 0.43 | learning rate: 1.787E-04 | global batch size: 256 | lm loss: 2.412231E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.260 | TFLOPs: 31.07 | 7: iteration 26690/ 115203 | consumed samples: 6832640 | consumed tokens: 13993246720 | elapsed time per iteration (s): 0.43 
| learning rate: 1.786E-04 | global batch size: 256 | lm loss: 2.353950E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.330 | TFLOPs: 31.39 | 7: iteration 26700/ 115203 | consumed samples: 6835200 | consumed tokens: 13998489600 | elapsed time per iteration (s): 0.43 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 2.393772E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.835 | TFLOPs: 30.90 | 7: iteration 26710/ 115203 | consumed samples: 6837760 | consumed tokens: 14003732480 | elapsed time per iteration (s): 0.42 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 2.364969E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.324 | TFLOPs: 31.97 | 7: iteration 26720/ 115203 | consumed samples: 6840320 | consumed tokens: 14008975360 | elapsed time per iteration (s): 0.42 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 2.387707E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.692 | TFLOPs: 31.83 | 7: iteration 26730/ 115203 | consumed samples: 6842880 | consumed tokens: 14014218240 | elapsed time per iteration (s): 0.42 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 2.379671E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.854 | TFLOPs: 31.68 | 7: iteration 26740/ 115203 | consumed samples: 6845440 | consumed tokens: 14019461120 | elapsed time per iteration (s): 0.43 | learning rate: 1.786E-04 | global batch size: 256 | lm loss: 2.368595E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.746 | TFLOPs: 31.42 | 7: iteration 26750/ 115203 | 
consumed samples: 6848000 | consumed tokens: 14024704000 | elapsed time per iteration (s): 0.43 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 2.389231E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.325 | TFLOPs: 31.55 | 7: iteration 26760/ 115203 | consumed samples: 6850560 | consumed tokens: 14029946880 | elapsed time per iteration (s): 0.42 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 2.372544E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.384 | TFLOPs: 31.82 | 7: iteration 26770/ 115203 | consumed samples: 6853120 | consumed tokens: 14035189760 | elapsed time per iteration (s): 0.42 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 2.398472E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.570 | TFLOPs: 31.77 | 7: iteration 26780/ 115203 | consumed samples: 6855680 | consumed tokens: 14040432640 | elapsed time per iteration (s): 0.43 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 2.391735E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.845 | TFLOPs: 31.05 | 7: iteration 26790/ 115203 | consumed samples: 6858240 | consumed tokens: 14045675520 | elapsed time per iteration (s): 0.43 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 2.357285E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.791 | TFLOPs: 31.26 | 7: iteration 26800/ 115203 | consumed samples: 6860800 | consumed tokens: 14050918400 | elapsed time per iteration (s): 0.43 | learning rate: 1.785E-04 | global batch size: 256 | lm loss: 2.380782E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 597.569 | TFLOPs: 31.35 | 7: iteration 26810/ 115203 | consumed samples: 6863360 | consumed tokens: 14056161280 | elapsed time per iteration (s): 0.43 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 2.352029E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.179 | TFLOPs: 31.28 | 7: iteration 26820/ 115203 | consumed samples: 6865920 | consumed tokens: 14061404160 | elapsed time per iteration (s): 0.44 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 2.372821E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.251 | TFLOPs: 30.65 | 7: iteration 26830/ 115203 | consumed samples: 6868480 | consumed tokens: 14066647040 | elapsed time per iteration (s): 0.43 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 2.342795E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.295 | TFLOPs: 31.08 | 7: iteration 26840/ 115203 | consumed samples: 6871040 | consumed tokens: 14071889920 | elapsed time per iteration (s): 0.43 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 2.366814E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.966 | TFLOPs: 30.95 | 7: iteration 26850/ 115203 | consumed samples: 6873600 | consumed tokens: 14077132800 | elapsed time per iteration (s): 0.43 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 2.383247E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.470 | TFLOPs: 30.98 | 7: iteration 26860/ 115203 | consumed samples: 6876160 | consumed tokens: 14082375680 | elapsed time per iteration (s): 0.43 | learning rate: 1.784E-04 | global batch size: 256 | lm loss: 
2.368109E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.593 | TFLOPs: 31.14 | 7: iteration 26870/ 115203 | consumed samples: 6878720 | consumed tokens: 14087618560 | elapsed time per iteration (s): 0.42 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 2.373734E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.983 | TFLOPs: 31.69 | 7: iteration 26880/ 115203 | consumed samples: 6881280 | consumed tokens: 14092861440 | elapsed time per iteration (s): 0.43 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 2.376204E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.061 | TFLOPs: 31.27 | 7: iteration 26890/ 115203 | consumed samples: 6883840 | consumed tokens: 14098104320 | elapsed time per iteration (s): 0.43 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 2.365457E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.523 | TFLOPs: 30.98 | 7: iteration 26900/ 115203 | consumed samples: 6886400 | consumed tokens: 14103347200 | elapsed time per iteration (s): 0.43 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 2.393313E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.846 | TFLOPs: 31.21 | 7: iteration 26910/ 115203 | consumed samples: 6888960 | consumed tokens: 14108590080 | elapsed time per iteration (s): 0.43 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 2.347777E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.758 | TFLOPs: 31.42 | 7: iteration 26920/ 115203 | consumed samples: 6891520 | consumed tokens: 14113832960 | 
elapsed time per iteration (s): 0.43 | learning rate: 1.783E-04 | global batch size: 256 | lm loss: 2.385441E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.250 | TFLOPs: 31.07 | 7: iteration 26930/ 115203 | consumed samples: 6894080 | consumed tokens: 14119075840 | elapsed time per iteration (s): 0.42 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 2.383392E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.516 | TFLOPs: 31.98 | 7: iteration 26940/ 115203 | consumed samples: 6896640 | consumed tokens: 14124318720 | elapsed time per iteration (s): 0.43 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 2.371171E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.584 | TFLOPs: 31.30 | 7: iteration 26950/ 115203 | consumed samples: 6899200 | consumed tokens: 14129561600 | elapsed time per iteration (s): 0.42 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 2.378038E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.441 | TFLOPs: 31.61 | 7: iteration 26960/ 115203 | consumed samples: 6901760 | consumed tokens: 14134804480 | elapsed time per iteration (s): 0.43 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 2.368410E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.086 | TFLOPs: 31.12 | 7: iteration 26970/ 115203 | consumed samples: 6904320 | consumed tokens: 14140047360 | elapsed time per iteration (s): 0.43 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 2.358488E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.551 | TFLOPs: 
31.41 | 7: iteration 26980/ 115203 | consumed samples: 6906880 | consumed tokens: 14145290240 | elapsed time per iteration (s): 0.43 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 2.380432E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.858 | TFLOPs: 31.37 | 7: iteration 26990/ 115203 | consumed samples: 6909440 | consumed tokens: 14150533120 | elapsed time per iteration (s): 0.42 | learning rate: 1.782E-04 | global batch size: 256 | lm loss: 2.397132E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.591 | TFLOPs: 31.62 | 7: iteration 27000/ 115203 | consumed samples: 6912000 | consumed tokens: 14155776000 | elapsed time per iteration (s): 0.44 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 2.381591E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.152 | TFLOPs: 30.60 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 27000 | lm loss value: 2.332742E+00 | lm loss PPL: 1.030616E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 27000 to checkpoints_221m 0: [2022-11-28 16:12:14,359] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step27000 is begin to save! 0: [2022-11-28 16:12:14,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_01-model_00-model_states.pt... 0: [2022-11-28 16:12:14,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_01-model_00-model_states.pt. 0: [2022-11-28 16:12:14,517] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_03-model_00-model_states.pt... 
0: [2022-11-28 16:12:14,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_03-model_00-model_states.pt. 0: [2022-11-28 16:12:14,541] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_04-model_00-model_states.pt... 0: [2022-11-28 16:12:14,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_04-model_00-model_states.pt. 0: [2022-11-28 16:12:14,566] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_05-model_00-model_states.pt... 0: [2022-11-28 16:12:14,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_05-model_00-model_states.pt. 0: [2022-11-28 16:12:14,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_06-model_00-model_states.pt... 0: [2022-11-28 16:12:14,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_06-model_00-model_states.pt. 0: [2022-11-28 16:12:14,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_07-model_00-model_states.pt... 0: [2022-11-28 16:12:14,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_07-model_00-model_states.pt. 0: [2022-11-28 16:12:14,642] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_08-model_00-model_states.pt... 0: [2022-11-28 16:12:14,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_08-model_00-model_states.pt. 0: [2022-11-28 16:12:14,666] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_09-model_00-model_states.pt... 
0: [2022-11-28 16:12:14,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_09-model_00-model_states.pt. 0: [2022-11-28 16:12:14,692] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_10-model_00-model_states.pt... 0: [2022-11-28 16:12:14,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_10-model_00-model_states.pt. 0: [2022-11-28 16:12:14,718] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_11-model_00-model_states.pt... 0: [2022-11-28 16:12:14,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_11-model_00-model_states.pt. 0: [2022-11-28 16:12:14,743] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_12-model_00-model_states.pt... 0: [2022-11-28 16:12:14,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_12-model_00-model_states.pt. 0: [2022-11-28 16:12:14,767] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_13-model_00-model_states.pt... 0: [2022-11-28 16:12:14,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_13-model_00-model_states.pt. 0: [2022-11-28 16:12:14,793] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_14-model_00-model_states.pt... 0: [2022-11-28 16:12:14,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_14-model_00-model_states.pt. 0: [2022-11-28 16:12:14,816] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_15-model_00-model_states.pt... 
0: [2022-11-28 16:12:14,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_15-model_00-model_states.pt. 0: [2022-11-28 16:12:14,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_16-model_00-model_states.pt... 0: [2022-11-28 16:12:14,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_16-model_00-model_states.pt. 0: [2022-11-28 16:12:14,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_17-model_00-model_states.pt... 0: [2022-11-28 16:12:14,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_17-model_00-model_states.pt. 0: [2022-11-28 16:12:14,894] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_18-model_00-model_states.pt... 0: [2022-11-28 16:12:14,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_18-model_00-model_states.pt. 0: [2022-11-28 16:12:14,919] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_19-model_00-model_states.pt... 0: [2022-11-28 16:12:14,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_19-model_00-model_states.pt. 0: [2022-11-28 16:12:14,945] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_20-model_00-model_states.pt... 0: [2022-11-28 16:12:14,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_20-model_00-model_states.pt. 0: [2022-11-28 16:12:14,970] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/layer_22-model_00-model_states.pt... 
0: [2022-11-28 16:12:14,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/layer_22-model_00-model_states.pt. 0: [2022-11-28 16:12:14,974] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step27000/mp_rank_00_model_states.pt 0: [2022-11-28 16:12:14,974] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/mp_rank_00_model_states.pt... 0: [2022-11-28 16:12:14,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/mp_rank_00_model_states.pt. 0: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
5: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
6: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
4: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
0: [2022-11-28 16:12:14,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step27000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:12:15,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:12:15,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:12:15,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 16:12:15,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2022-11-28 16:12:15,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:12:15,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 16:12:15,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2022-11-28 16:12:15,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:12:15,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 16:12:15,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2022-11-28 16:12:15,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:12:15,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 6: [2022-11-28 16:12:15,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:12:15,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2022-11-28 16:12:15,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2022-11-28 16:12:15,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2022-11-28 16:12:15,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:12:15,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 16:12:15,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2022-11-28 16:12:15,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:12:15,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:12:15,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:12:15,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 16:12:15,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 16:12:15,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 16:12:15,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 16:12:15,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2022-11-28 16:12:15,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2022-11-28 16:12:15,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 16:12:15,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2022-11-28 16:12:15,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:12:15,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2022-11-28 16:12:15,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 16:12:15,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2022-11-28 16:12:15,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2022-11-28 16:12:15,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:12:15,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 16:12:15,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 16:12:15,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2022-11-28 16:12:15,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2022-11-28 16:12:15,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:12:15,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 16:12:15,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2022-11-28 16:12:15,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:12:15,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 16:12:15,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2022-11-28 16:12:15,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2022-11-28 16:12:15,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:12:15,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:12:15,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 16:12:15,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 16:12:15,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 16:12:15,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2022-11-28 16:12:15,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 2: [2022-11-28 16:12:15,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2022-11-28 16:12:15,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:12:15,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:12:15,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:12:15,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
4: [2022-11-28 16:12:15,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 2: [2022-11-28 16:12:15,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2022-11-28 16:12:15,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 16:12:15,055] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2022-11-28 16:12:15,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2022-11-28 16:12:15,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2022-11-28 16:12:15,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2022-11-28 16:12:15,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:12:15,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 
0: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:12:15,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 4: [2022-11-28 16:12:15,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 4: [2022-11-28 16:12:15,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:12:15,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:12:15,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2022-11-28 16:12:15,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2022-11-28 16:12:15,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 1: [2022-11-28 16:12:15,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 3: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:12:15,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 
3: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:12:15,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 6: [2022-11-28 16:12:15,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 0: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:12:15,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2022-11-28 16:12:15,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2022-11-28 16:12:15,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 16:12:15,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2022-11-28 16:12:15,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:12:15,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2022-11-28 16:12:15,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
6: [2022-11-28 16:12:15,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2022-11-28 16:12:15,059] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 16:12:15,059] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 16:12:15,059] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 16:12:15,059] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 16:12:15,059] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 6: [2022-11-28 16:12:15,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:12:15,058] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2022-11-28 16:12:15,059] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 16:12:15,059] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 1: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2022-11-28 16:12:15,058] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 5: [2022-11-28 16:12:15,059] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2022-11-28 16:12:15,059] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 1: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2022-11-28 16:12:15,059] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2022-11-28 16:12:15,059] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2022-11-28 16:12:15,059] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2022-11-28 16:12:15,059] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2022-11-28 16:12:15,059] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 
1: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:12:15,058] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 16:12:15,058] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2022-11-28 16:12:15,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2022-11-28 16:12:15,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:12:15,064] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 16:12:15,064] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2022-11-28 16:12:15,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:12:15,065] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 16:12:15,065] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2022-11-28 16:12:15,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:12:15,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-28 16:12:15,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:12:15,073] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 16:12:15,073] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 16:12:15,073] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 16:12:15,073] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2022-11-28 16:12:15,073] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 6: [2022-11-28 16:12:15,073] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 5: [2022-11-28 16:12:15,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:12:15,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:12:15,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:12:15,066] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 16:12:15,066] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
5: [2022-11-28 16:12:15,066] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2022-11-28 16:12:15,066] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 16:12:15,066] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 16:12:15,066] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 16:12:15,066] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2022-11-28 16:12:15,066] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 3: [2022-11-28 16:12:15,066] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:12:15,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:12:15,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2022-11-28 16:12:15,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 16:12:15,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 16:12:15,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 16:12:15,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 
7: [2022-11-28 16:12:15,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2022-11-28 16:12:15,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 7: [2022-11-28 16:12:15,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:12:15,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 16:12:15,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 0: [2022-11-28 16:12:15,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step27000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 16:12:15,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step27000 is ready now! 
0: successfully saved checkpoint at iteration 27000 to checkpoints_221m
7: time (ms) | save-checkpoint: 802.44
7: iteration 27010/ 115203 | consumed samples: 6914560 | consumed tokens: 14161018880 | elapsed time per iteration (s): 0.52 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 2.384433E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 488.300 | TFLOPs: 25.62 |
7: iteration 27020/ 115203 | consumed samples: 6917120 | consumed tokens: 14166261760 | elapsed time per iteration (s): 0.43 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 2.351369E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.576 | TFLOPs: 31.20 |
7: iteration 27030/ 115203 | consumed samples: 6919680 | consumed tokens: 14171504640 | elapsed time per iteration (s): 0.43 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 2.357752E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.325 | TFLOPs: 31.03 |
7: iteration 27040/ 115203 | consumed samples: 6922240 | consumed tokens: 14176747520 | elapsed time per iteration (s): 0.45 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 2.362523E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.676 | TFLOPs: 30.10 |
7: iteration 27050/ 115203 | consumed samples: 6924800 | consumed tokens: 14181990400 | elapsed time per iteration (s): 0.43 | learning rate: 1.781E-04 | global batch size: 256 | lm loss: 2.412514E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.382 | TFLOPs: 30.92 |
7: iteration 27060/ 115203 | consumed samples: 6927360 | consumed tokens: 14187233280 | elapsed time per iteration (s): 0.43 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 2.396283E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.818 | TFLOPs: 30.95 |
7: iteration 27070/ 115203 | consumed samples: 6929920 | consumed tokens: 14192476160 | elapsed time per iteration (s): 0.44 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 2.373804E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.029 | TFLOPs: 30.43 |
7: iteration 27080/ 115203 | consumed samples: 6932480 | consumed tokens: 14197719040 | elapsed time per iteration (s): 0.42 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 2.364974E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.421 | TFLOPs: 31.98 |
7: iteration 27090/ 115203 | consumed samples: 6935040 | consumed tokens: 14202961920 | elapsed time per iteration (s): 0.43 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 2.378013E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.231 | TFLOPs: 31.44 |
7: iteration 27100/ 115203 | consumed samples: 6937600 | consumed tokens: 14208204800 | elapsed time per iteration (s): 0.43 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 2.363337E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.301 | TFLOPs: 31.18 |
7: iteration 27110/ 115203 | consumed samples: 6940160 | consumed tokens: 14213447680 | elapsed time per iteration (s): 0.43 | learning rate: 1.780E-04 | global batch size: 256 | lm loss: 2.395095E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.006 | TFLOPs: 31.38 |
7: iteration 27120/ 115203 | consumed samples: 6942720 | consumed tokens: 14218690560 | elapsed time per iteration (s): 0.43 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 2.365139E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.523 | TFLOPs: 31.30 |
7: iteration 27130/ 115203 | consumed samples: 6945280 | consumed tokens: 14223933440 | elapsed time per iteration (s): 0.44 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 2.388643E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.368 | TFLOPs: 30.71 |
7: iteration 27140/ 115203 | consumed samples: 6947840 | consumed tokens: 14229176320 | elapsed time per iteration (s): 0.43 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 2.342171E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.630 | TFLOPs: 30.99 |
7: iteration 27150/ 115203 | consumed samples: 6950400 | consumed tokens: 14234419200 | elapsed time per iteration (s): 0.42 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 2.371886E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.132 | TFLOPs: 31.80 |
7: iteration 27160/ 115203 | consumed samples: 6952960 | consumed tokens: 14239662080 | elapsed time per iteration (s): 0.42 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 2.399321E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.690 | TFLOPs: 31.73 |
7: iteration 27170/ 115203 | consumed samples: 6955520 | consumed tokens: 14244904960 | elapsed time per iteration (s): 0.43 | learning rate: 1.779E-04 | global batch size: 256 | lm loss: 2.370350E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.696 | TFLOPs: 31.52 |
7: iteration 27180/ 115203 | consumed samples: 6958080 | consumed tokens: 14250147840 | elapsed time per iteration (s): 0.44 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 2.369947E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.256 | TFLOPs: 30.86 |
7: iteration 27190/ 115203 | consumed samples: 6960640 | consumed tokens: 14255390720 | elapsed time per iteration (s): 0.43 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 2.382098E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.923 | TFLOPs: 31.11 |
7: iteration 27200/ 115203 | consumed samples: 6963200 | consumed tokens: 14260633600 | elapsed time per iteration (s): 0.42 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 2.381719E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.804 | TFLOPs: 31.68 |
7: iteration 27210/ 115203 | consumed samples: 6965760 | consumed tokens: 14265876480 | elapsed time per iteration (s): 0.42 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 2.380834E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.659 | TFLOPs: 31.73 |
7: iteration 27220/ 115203 | consumed samples: 6968320 | consumed tokens: 14271119360 | elapsed time per iteration (s): 0.42 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 2.377714E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.058 | TFLOPs: 31.75 |
7: iteration 27230/ 115203 | consumed samples: 6970880 | consumed tokens: 14276362240 | elapsed time per iteration (s): 0.44 | learning rate: 1.778E-04 | global batch size: 256 | lm loss: 2.380760E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.805 | TFLOPs: 30.79 |
7: iteration 27240/ 115203 | consumed samples: 6973440 | consumed tokens: 14281605120 | elapsed time per iteration (s): 0.43 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 2.354198E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.780 | TFLOPs: 30.94 |
7: iteration 27250/ 115203 | consumed samples: 6976000 | consumed tokens: 14286848000 | elapsed time per iteration (s): 0.42 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 2.372373E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.320 | TFLOPs: 31.76 |
7: iteration 27260/ 115203 | consumed samples: 6978560 | consumed tokens: 14292090880 | elapsed time per iteration (s): 0.44 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 2.362136E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.735 | TFLOPs: 30.47 |
7: iteration 27270/ 115203 | consumed samples: 6981120 | consumed tokens: 14297333760 | elapsed time per iteration (s): 0.44 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 2.356633E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.254 | TFLOPs: 30.71 |
7: iteration 27280/ 115203 | consumed samples: 6983680 | consumed tokens: 14302576640 | elapsed time per iteration (s): 0.44 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 2.378434E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.456 | TFLOPs: 30.56 |
7: iteration 27290/ 115203 | consumed samples: 6986240 | consumed tokens: 14307819520 | elapsed time per iteration (s): 0.43 | learning rate: 1.777E-04 | global batch size: 256 | lm loss: 2.381388E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.945 | TFLOPs: 31.11 |
7: iteration 27300/ 115203 | consumed samples: 6988800 | consumed tokens: 14313062400 | elapsed time per iteration (s): 0.44 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 2.385253E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.985 | TFLOPs: 30.80 |
7: iteration 27310/ 115203 | consumed samples: 6991360 | consumed tokens: 14318305280 | elapsed time per iteration (s): 0.42 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 2.384864E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.677 | TFLOPs: 31.73 |
7: iteration 27320/ 115203 | consumed samples: 6993920 | consumed tokens: 14323548160 | elapsed time per iteration (s): 0.43 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 2.384988E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.797 | TFLOPs: 31.16 |
7: iteration 27330/ 115203 | consumed samples: 6996480 | consumed tokens: 14328791040 | elapsed time per iteration (s): 0.43 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 2.374746E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.301 | TFLOPs: 31.55 |
7: iteration 27340/ 115203 | consumed samples: 6999040 | consumed tokens: 14334033920 | elapsed time per iteration (s): 0.42 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 2.370798E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.817 | TFLOPs: 32.05 |
7: iteration 27350/ 115203 | consumed samples: 7001600 | consumed tokens: 14339276800 | elapsed time per iteration (s): 0.45 | learning rate: 1.776E-04 | global batch size: 256 | lm loss: 2.367034E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.233 | TFLOPs: 29.71 |
7: iteration 27360/ 115203 | consumed samples: 7004160 | consumed tokens: 14344519680 | elapsed time per iteration (s): 0.43 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 2.377961E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.862 | TFLOPs: 31.05 |
7: iteration 27370/ 115203 | consumed samples: 7006720 | consumed tokens: 14349762560 | elapsed time per iteration (s): 0.42 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 2.421731E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.855 | TFLOPs: 32.00 |
7: iteration 27380/ 115203 | consumed samples: 7009280 | consumed tokens: 14355005440 | elapsed time per iteration (s): 0.44 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 2.354930E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.263 | TFLOPs: 30.66 |
7: iteration 27390/ 115203 | consumed samples: 7011840 | consumed tokens: 14360248320 | elapsed time per iteration (s): 0.43 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 2.380097E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.623 | TFLOPs: 31.30 |
7: iteration 27400/ 115203 | consumed samples: 7014400 | consumed tokens: 14365491200 | elapsed time per iteration (s): 0.42 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 2.358853E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped
iterations: 0 | number of nan iterations: 0 | samples per second: 605.119 | TFLOPs: 31.75 | 7: iteration 27410/ 115203 | consumed samples: 7016960 | consumed tokens: 14370734080 | elapsed time per iteration (s): 0.43 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 2.369214E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.470 | TFLOPs: 31.51 | 7: iteration 27420/ 115203 | consumed samples: 7019520 | consumed tokens: 14375976960 | elapsed time per iteration (s): 0.43 | learning rate: 1.775E-04 | global batch size: 256 | lm loss: 2.389955E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.037 | TFLOPs: 31.54 | 7: iteration 27430/ 115203 | consumed samples: 7022080 | consumed tokens: 14381219840 | elapsed time per iteration (s): 0.43 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 2.386413E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.393 | TFLOPs: 31.45 | 7: iteration 27440/ 115203 | consumed samples: 7024640 | consumed tokens: 14386462720 | elapsed time per iteration (s): 0.44 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 2.382568E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.047 | TFLOPs: 30.80 | 7: iteration 27450/ 115203 | consumed samples: 7027200 | consumed tokens: 14391705600 | elapsed time per iteration (s): 0.43 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 2.371863E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.175 | TFLOPs: 31.44 | 7: iteration 27460/ 115203 | consumed samples: 7029760 | consumed tokens: 14396948480 | elapsed time per iteration (s): 0.42 | learning rate: 1.774E-04 | global 
batch size: 256 | lm loss: 2.398952E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.070 | TFLOPs: 31.69 | 7: iteration 27470/ 115203 | consumed samples: 7032320 | consumed tokens: 14402191360 | elapsed time per iteration (s): 0.42 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 2.371018E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.530 | TFLOPs: 31.72 | 7: iteration 27480/ 115203 | consumed samples: 7034880 | consumed tokens: 14407434240 | elapsed time per iteration (s): 0.43 | learning rate: 1.774E-04 | global batch size: 256 | lm loss: 2.372549E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.656 | TFLOPs: 30.99 | 7: iteration 27490/ 115203 | consumed samples: 7037440 | consumed tokens: 14412677120 | elapsed time per iteration (s): 0.42 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 2.389129E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.112 | TFLOPs: 31.64 | 7: iteration 27500/ 115203 | consumed samples: 7040000 | consumed tokens: 14417920000 | elapsed time per iteration (s): 0.43 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 2.392193E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.779 | TFLOPs: 31.42 | 7: iteration 27510/ 115203 | consumed samples: 7042560 | consumed tokens: 14423162880 | elapsed time per iteration (s): 0.45 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 2.396380E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.994 | TFLOPs: 29.75 | 7: iteration 27520/ 115203 | consumed samples: 7045120 | consumed 
tokens: 14428405760 | elapsed time per iteration (s): 0.45 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 2.358662E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.038 | TFLOPs: 29.65 | 7: iteration 27530/ 115203 | consumed samples: 7047680 | consumed tokens: 14433648640 | elapsed time per iteration (s): 0.43 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 2.383336E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.913 | TFLOPs: 31.42 | 7: iteration 27540/ 115203 | consumed samples: 7050240 | consumed tokens: 14438891520 | elapsed time per iteration (s): 0.43 | learning rate: 1.773E-04 | global batch size: 256 | lm loss: 2.359838E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.160 | TFLOPs: 31.12 | 7: iteration 27550/ 115203 | consumed samples: 7052800 | consumed tokens: 14444134400 | elapsed time per iteration (s): 0.44 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 2.363345E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.387 | TFLOPs: 30.35 | 7: iteration 27560/ 115203 | consumed samples: 7055360 | consumed tokens: 14449377280 | elapsed time per iteration (s): 0.43 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 2.382697E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.665 | TFLOPs: 30.99 | 7: iteration 27570/ 115203 | consumed samples: 7057920 | consumed tokens: 14454620160 | elapsed time per iteration (s): 0.43 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 2.364371E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 598.133 | TFLOPs: 31.38 | 7: iteration 27580/ 115203 | consumed samples: 7060480 | consumed tokens: 14459863040 | elapsed time per iteration (s): 0.43 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 2.377122E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.896 | TFLOPs: 31.21 | 7: iteration 27590/ 115203 | consumed samples: 7063040 | consumed tokens: 14465105920 | elapsed time per iteration (s): 0.43 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 2.379633E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.751 | TFLOPs: 31.21 | 7: iteration 27600/ 115203 | consumed samples: 7065600 | consumed tokens: 14470348800 | elapsed time per iteration (s): 0.43 | learning rate: 1.772E-04 | global batch size: 256 | lm loss: 2.391268E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.743 | TFLOPs: 31.21 | 7: iteration 27610/ 115203 | consumed samples: 7068160 | consumed tokens: 14475591680 | elapsed time per iteration (s): 0.42 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 2.371794E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.260 | TFLOPs: 31.91 | 7: iteration 27620/ 115203 | consumed samples: 7070720 | consumed tokens: 14480834560 | elapsed time per iteration (s): 0.43 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 2.373693E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.499 | TFLOPs: 31.56 | 7: iteration 27630/ 115203 | consumed samples: 7073280 | consumed tokens: 14486077440 | elapsed time per iteration (s): 0.43 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 2.386612E+00 | grad norm: 0.189 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.470 | TFLOPs: 31.09 | 7: iteration 27640/ 115203 | consumed samples: 7075840 | consumed tokens: 14491320320 | elapsed time per iteration (s): 0.42 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 2.384711E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.936 | TFLOPs: 31.74 | 7: iteration 27650/ 115203 | consumed samples: 7078400 | consumed tokens: 14496563200 | elapsed time per iteration (s): 0.43 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 2.380136E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.210 | TFLOPs: 31.12 | 7: iteration 27660/ 115203 | consumed samples: 7080960 | consumed tokens: 14501806080 | elapsed time per iteration (s): 0.44 | learning rate: 1.771E-04 | global batch size: 256 | lm loss: 2.367508E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.244 | TFLOPs: 30.86 | 7: iteration 27670/ 115203 | consumed samples: 7083520 | consumed tokens: 14507048960 | elapsed time per iteration (s): 0.43 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 2.389175E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.031 | TFLOPs: 31.33 | 7: iteration 27680/ 115203 | consumed samples: 7086080 | consumed tokens: 14512291840 | elapsed time per iteration (s): 0.43 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 2.388035E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.499 | TFLOPs: 31.14 | 7: iteration 27690/ 115203 | consumed samples: 7088640 | consumed tokens: 14517534720 | elapsed time per iteration (s): 0.42 
| learning rate: 1.770E-04 | global batch size: 256 | lm loss: 2.400713E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.238 | TFLOPs: 31.65 | 7: iteration 27700/ 115203 | consumed samples: 7091200 | consumed tokens: 14522777600 | elapsed time per iteration (s): 0.43 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 2.363972E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.956 | TFLOPs: 31.32 | 7: iteration 27710/ 115203 | consumed samples: 7093760 | consumed tokens: 14528020480 | elapsed time per iteration (s): 0.43 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 2.408432E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.801 | TFLOPs: 31.42 | 7: iteration 27720/ 115203 | consumed samples: 7096320 | consumed tokens: 14533263360 | elapsed time per iteration (s): 0.43 | learning rate: 1.770E-04 | global batch size: 256 | lm loss: 2.394699E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.617 | TFLOPs: 31.57 | 7: iteration 27730/ 115203 | consumed samples: 7098880 | consumed tokens: 14538506240 | elapsed time per iteration (s): 0.44 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 2.379010E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.035 | TFLOPs: 30.80 | 7: iteration 27740/ 115203 | consumed samples: 7101440 | consumed tokens: 14543749120 | elapsed time per iteration (s): 0.43 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 2.381629E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.911 | TFLOPs: 31.53 | 7: iteration 27750/ 115203 | 
consumed samples: 7104000 | consumed tokens: 14548992000 | elapsed time per iteration (s): 0.43 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 2.397808E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.478 | TFLOPs: 31.35 | 7: iteration 27760/ 115203 | consumed samples: 7106560 | consumed tokens: 14554234880 | elapsed time per iteration (s): 0.43 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 2.383187E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.868 | TFLOPs: 31.21 | 7: iteration 27770/ 115203 | consumed samples: 7109120 | consumed tokens: 14559477760 | elapsed time per iteration (s): 0.43 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 2.355070E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.958 | TFLOPs: 31.48 | 7: iteration 27780/ 115203 | consumed samples: 7111680 | consumed tokens: 14564720640 | elapsed time per iteration (s): 0.60 | learning rate: 1.769E-04 | global batch size: 256 | lm loss: 2.409266E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 429.155 | TFLOPs: 22.52 | 7: iteration 27790/ 115203 | consumed samples: 7114240 | consumed tokens: 14569963520 | elapsed time per iteration (s): 0.44 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 2.366649E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.495 | TFLOPs: 30.88 | 7: iteration 27800/ 115203 | consumed samples: 7116800 | consumed tokens: 14575206400 | elapsed time per iteration (s): 0.43 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 2.367966E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 589.315 | TFLOPs: 30.92 | 7: iteration 27810/ 115203 | consumed samples: 7119360 | consumed tokens: 14580449280 | elapsed time per iteration (s): 0.42 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 2.359280E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.123 | TFLOPs: 31.64 | 7: iteration 27820/ 115203 | consumed samples: 7121920 | consumed tokens: 14585692160 | elapsed time per iteration (s): 0.43 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 2.364123E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.023 | TFLOPs: 31.48 | 7: iteration 27830/ 115203 | consumed samples: 7124480 | consumed tokens: 14590935040 | elapsed time per iteration (s): 0.44 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 2.387033E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.413 | TFLOPs: 30.87 | 7: iteration 27840/ 115203 | consumed samples: 7127040 | consumed tokens: 14596177920 | elapsed time per iteration (s): 0.43 | learning rate: 1.768E-04 | global batch size: 256 | lm loss: 2.393716E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.110 | TFLOPs: 31.49 | 7: iteration 27850/ 115203 | consumed samples: 7129600 | consumed tokens: 14601420800 | elapsed time per iteration (s): 0.43 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 2.396977E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.581 | TFLOPs: 31.41 | 7: iteration 27860/ 115203 | consumed samples: 7132160 | consumed tokens: 14606663680 | elapsed time per iteration (s): 0.43 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 
2.398536E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.590 | TFLOPs: 31.20 | 7: iteration 27870/ 115203 | consumed samples: 7134720 | consumed tokens: 14611906560 | elapsed time per iteration (s): 0.42 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 2.355441E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.784 | TFLOPs: 31.78 | 7: iteration 27880/ 115203 | consumed samples: 7137280 | consumed tokens: 14617149440 | elapsed time per iteration (s): 0.44 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 2.405890E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.638 | TFLOPs: 30.73 | 7: iteration 27890/ 115203 | consumed samples: 7139840 | consumed tokens: 14622392320 | elapsed time per iteration (s): 0.43 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 2.355271E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.410 | TFLOPs: 31.35 | 7: iteration 27900/ 115203 | consumed samples: 7142400 | consumed tokens: 14627635200 | elapsed time per iteration (s): 0.42 | learning rate: 1.767E-04 | global batch size: 256 | lm loss: 2.330920E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.027 | TFLOPs: 31.90 | 7: iteration 27910/ 115203 | consumed samples: 7144960 | consumed tokens: 14632878080 | elapsed time per iteration (s): 0.42 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 2.370406E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.519 | TFLOPs: 31.82 | 7: iteration 27920/ 115203 | consumed samples: 7147520 | consumed tokens: 14638120960 | 
elapsed time per iteration (s): 0.43 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 2.352768E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.119 | TFLOPs: 31.28 | 7: iteration 27930/ 115203 | consumed samples: 7150080 | consumed tokens: 14643363840 | elapsed time per iteration (s): 0.43 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 2.366554E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.020 | TFLOPs: 31.11 | 7: iteration 27940/ 115203 | consumed samples: 7152640 | consumed tokens: 14648606720 | elapsed time per iteration (s): 0.43 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 2.420407E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.034 | TFLOPs: 31.38 | 7: iteration 27950/ 115203 | consumed samples: 7155200 | consumed tokens: 14653849600 | elapsed time per iteration (s): 0.43 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 2.379555E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.341 | TFLOPs: 31.18 | 7: iteration 27960/ 115203 | consumed samples: 7157760 | consumed tokens: 14659092480 | elapsed time per iteration (s): 0.44 | learning rate: 1.766E-04 | global batch size: 256 | lm loss: 2.368727E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.556 | TFLOPs: 30.67 | 7: iteration 27970/ 115203 | consumed samples: 7160320 | consumed tokens: 14664335360 | elapsed time per iteration (s): 0.43 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 2.373091E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.228 | TFLOPs: 
31.28 | 7: iteration 27980/ 115203 | consumed samples: 7162880 | consumed tokens: 14669578240 | elapsed time per iteration (s): 0.43 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 2.384924E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.964 | TFLOPs: 31.37 | 7: iteration 27990/ 115203 | consumed samples: 7165440 | consumed tokens: 14674821120 | elapsed time per iteration (s): 0.43 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 2.371302E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.665 | TFLOPs: 31.36 | 0: [2022-11-28 16:19:27,170] [INFO] [logging.py:68:log_dist] [Rank 0] step=28000, skipped=0, lr=[0.00017649035869598463, 0.00017649035869598463, 0.00017649035869598463], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 28000/ 115203 | consumed samples: 7168000 | consumed tokens: 14680064000 | elapsed time per iteration (s): 0.44 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 2.372468E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.806 | TFLOPs: 30.42 | 0: steps: 28000 loss: 2.4242 iter time (s): 0.431 samples/sec: 593.564 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 28000 | lm loss value: 2.365985E+00 | lm loss PPL: 1.065453E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 28000 to checkpoints_221m 0: [2022-11-28 16:19:27,356] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step28000 is begin to save! 0: [2022-11-28 16:19:27,381] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_01-model_00-model_states.pt... 
0: [2022-11-28 16:19:27,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_01-model_00-model_states.pt. 0: [2022-11-28 16:19:27,494] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_03-model_00-model_states.pt... 0: [2022-11-28 16:19:27,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_03-model_00-model_states.pt. 0: [2022-11-28 16:19:27,517] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_04-model_00-model_states.pt... 0: [2022-11-28 16:19:27,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_04-model_00-model_states.pt. 0: [2022-11-28 16:19:27,542] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_05-model_00-model_states.pt... 0: [2022-11-28 16:19:27,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_05-model_00-model_states.pt. 0: [2022-11-28 16:19:27,575] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_06-model_00-model_states.pt... 0: [2022-11-28 16:19:27,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_06-model_00-model_states.pt. 0: [2022-11-28 16:19:27,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_07-model_00-model_states.pt... 0: [2022-11-28 16:19:27,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_07-model_00-model_states.pt. 0: [2022-11-28 16:19:27,633] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_08-model_00-model_states.pt... 
0: [2022-11-28 16:19:27,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_08-model_00-model_states.pt. 0: [2022-11-28 16:19:27,657] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_09-model_00-model_states.pt... 0: [2022-11-28 16:19:27,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_09-model_00-model_states.pt. 0: [2022-11-28 16:19:27,682] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_10-model_00-model_states.pt... 0: [2022-11-28 16:19:27,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_10-model_00-model_states.pt. 0: [2022-11-28 16:19:27,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_11-model_00-model_states.pt... 0: [2022-11-28 16:19:27,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_11-model_00-model_states.pt. 0: [2022-11-28 16:19:27,731] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_12-model_00-model_states.pt... 0: [2022-11-28 16:19:27,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_12-model_00-model_states.pt. 0: [2022-11-28 16:19:27,756] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_13-model_00-model_states.pt... 0: [2022-11-28 16:19:27,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_13-model_00-model_states.pt. 0: [2022-11-28 16:19:27,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_14-model_00-model_states.pt... 
0: [2022-11-28 16:19:27,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_14-model_00-model_states.pt. 0: [2022-11-28 16:19:27,804] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_15-model_00-model_states.pt... 0: [2022-11-28 16:19:27,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_15-model_00-model_states.pt. 0: [2022-11-28 16:19:27,829] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_16-model_00-model_states.pt... 0: [2022-11-28 16:19:27,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_16-model_00-model_states.pt. 0: [2022-11-28 16:19:27,852] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_17-model_00-model_states.pt... 0: [2022-11-28 16:19:27,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_17-model_00-model_states.pt. 0: [2022-11-28 16:19:27,876] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_18-model_00-model_states.pt... 0: [2022-11-28 16:19:27,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_18-model_00-model_states.pt. 0: [2022-11-28 16:19:27,901] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_19-model_00-model_states.pt... 0: [2022-11-28 16:19:27,926] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_19-model_00-model_states.pt. 0: [2022-11-28 16:19:27,926] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_20-model_00-model_states.pt... 
0: [2022-11-28 16:19:27,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_20-model_00-model_states.pt. 0: [2022-11-28 16:19:27,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/layer_22-model_00-model_states.pt... 0: [2022-11-28 16:19:27,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/layer_22-model_00-model_states.pt. 0: [2022-11-28 16:19:27,956] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step28000/mp_rank_00_model_states.pt 0: [2022-11-28 16:19:27,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/mp_rank_00_model_states.pt... 0: [2022-11-28 16:19:27,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/mp_rank_00_model_states.pt. 0: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
0: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
6: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:19:27,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step28000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:19:28,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:19:28,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 16:19:28,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 7: [2022-11-28 16:19:28,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:19:28,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 16:19:28,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 7: [2022-11-28 16:19:28,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:19:28,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-28 16:19:28,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 7: [2022-11-28 16:19:28,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2022-11-28 16:19:28,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 7: [2022-11-28 16:19:28,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2022-11-28 16:19:28,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:19:28,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 16:19:28,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 7: [2022-11-28 16:19:28,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:19:28,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 16:19:28,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2022-11-28 16:19:28,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 16:19:28,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 16:19:28,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 7: [2022-11-28 16:19:28,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:19:28,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 16:19:28,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2022-11-28 16:19:28,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:19:28,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 16:19:28,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2022-11-28 16:19:28,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:19:28,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 16:19:28,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2022-11-28 16:19:28,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 16:19:28,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2022-11-28 16:19:28,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:19:28,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:19:28,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:19:28,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:19:28,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:19:28,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:19:28,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 
5: [2022-11-28 16:19:28,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 16:19:28,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 16:19:28,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 16:19:28,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 16:19:28,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 16:19:28,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 16:19:28,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2022-11-28 16:19:28,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2022-11-28 16:19:28,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2022-11-28 16:19:28,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2022-11-28 16:19:28,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2022-11-28 16:19:28,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 
6: [2022-11-28 16:19:28,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:19:28,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:19:28,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 7: [2022-11-28 16:19:28,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 6: [2022-11-28 16:19:28,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 7: [2022-11-28 16:19:28,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2022-11-28 16:19:28,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:19:28,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 16:19:28,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2022-11-28 16:19:28,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:19:28,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 16:19:28,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 
7: [2022-11-28 16:19:28,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:19:28,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 16:19:28,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2022-11-28 16:19:28,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:19:28,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 16:19:28,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2022-11-28 16:19:28,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:19:28,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:19:28,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2022-11-28 16:19:28,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:19:28,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 6: [2022-11-28 16:19:28,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 
4: [2022-11-28 16:19:28,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2022-11-28 16:19:28,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:19:28,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:19:28,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 4: [2022-11-28 16:19:28,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 16:19:28,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2022-11-28 16:19:28,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2022-11-28 16:19:28,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:19:28,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:19:28,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 16:19:28,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2022-11-28 16:19:28,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 16:19:28,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 16:19:28,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2022-11-28 16:19:28,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 16:19:28,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 2: [2022-11-28 16:19:28,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:19:28,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 16:19:28,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 6: [2022-11-28 16:19:28,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:19:28,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 16:19:28,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2022-11-28 16:19:28,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-28 16:19:28,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 16:19:28,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 7: [2022-11-28 16:19:28,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:19:28,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 16:19:28,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 7: [2022-11-28 16:19:28,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:19:28,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 16:19:28,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2022-11-28 16:19:28,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 16:19:28,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 5: [2022-11-28 16:19:28,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2022-11-28 16:19:28,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 16:19:28,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2022-11-28 16:19:28,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:19:28,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 16:19:28,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2022-11-28 16:19:28,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:19:28,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 16:19:28,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2022-11-28 16:19:28,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:19:28,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 16:19:28,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2022-11-28 16:19:28,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2022-11-28 16:19:28,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 16:19:28,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2022-11-28 16:19:28,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:19:28,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:19:28,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:19:28,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 16:19:28,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 16:19:28,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 16:19:28,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2022-11-28 16:19:28,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 4: [2022-11-28 16:19:28,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2022-11-28 16:19:28,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2022-11-28 16:19:28,059] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 16:19:28,059] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2022-11-28 16:19:28,060] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:19:28,060] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 16:19:28,060] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2022-11-28 16:19:28,061] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:19:28,061] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 16:19:28,061] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 3: [2022-11-28 16:19:28,061] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:19:28,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 16:19:28,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2022-11-28 16:19:28,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2022-11-28 16:19:28,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:19:28,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:19:28,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:19:28,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 16:19:28,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 16:19:28,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 16:19:28,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2022-11-28 16:19:28,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2022-11-28 16:19:28,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2022-11-28 16:19:28,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:19:28,063] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 16:19:28,063] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 
0: [2022-11-28 16:19:28,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:19:28,063] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 16:19:28,063] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2022-11-28 16:19:28,064] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:19:28,064] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 16:19:28,064] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2022-11-28 16:19:28,104] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 16:19:28,104] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: [2022-11-28 16:19:28,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:19:28,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 16:19:28,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:19:28,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 16:19:28,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 16:19:28,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 16:19:28,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 16:19:28,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 16:19:28,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 16:19:28,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 
1: [2022-11-28 16:19:28,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step28000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 1: [2022-11-28 16:19:28,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step28000 is ready now! 0: successfully saved checkpoint at iteration 28000 to checkpoints_221m 7: time (ms) | save-checkpoint: 816.47 7: iteration 28010/ 115203 | consumed samples: 7170560 | consumed tokens: 14685306880 | elapsed time per iteration (s): 0.53 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 2.396791E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 483.007 | TFLOPs: 25.34 | 7: iteration 28020/ 115203 | consumed samples: 7173120 | consumed tokens: 14690549760 | elapsed time per iteration (s): 0.43 | learning rate: 1.765E-04 | global batch size: 256 | lm loss: 2.339605E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.724 | TFLOPs: 31.31 | 7: iteration 28030/ 115203 | consumed samples: 7175680 | consumed tokens: 14695792640 | elapsed time per iteration (s): 0.43 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 2.385484E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.780 | TFLOPs: 31.31 | 7: iteration 28040/ 115203 | consumed samples: 7178240 | consumed tokens: 14701035520 | elapsed time per iteration (s): 0.43 | learning rate: 1.764E-04 | 
global batch size: 256 | lm loss: 2.380104E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.212 | TFLOPs: 30.92 | 7: iteration 28050/ 115203 | consumed samples: 7180800 | consumed tokens: 14706278400 | elapsed time per iteration (s): 0.43 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 2.350263E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.330 | TFLOPs: 31.03 | 7: iteration 28060/ 115203 | consumed samples: 7183360 | consumed tokens: 14711521280 | elapsed time per iteration (s): 0.42 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 2.364268E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.331 | TFLOPs: 31.66 | 7: iteration 28070/ 115203 | consumed samples: 7185920 | consumed tokens: 14716764160 | elapsed time per iteration (s): 0.43 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 2.405817E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.693 | TFLOPs: 31.05 | 7: iteration 28080/ 115203 | consumed samples: 7188480 | consumed tokens: 14722007040 | elapsed time per iteration (s): 0.43 | learning rate: 1.764E-04 | global batch size: 256 | lm loss: 2.369145E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.048 | TFLOPs: 31.33 | 7: iteration 28090/ 115203 | consumed samples: 7191040 | consumed tokens: 14727249920 | elapsed time per iteration (s): 0.43 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 2.374944E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.899 | TFLOPs: 31.32 | 7: iteration 28100/ 115203 | consumed samples: 7193600 | 
consumed tokens: 14732492800 | elapsed time per iteration (s): 0.42 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 2.392898E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.247 | TFLOPs: 32.07 | 7: iteration 28110/ 115203 | consumed samples: 7196160 | consumed tokens: 14737735680 | elapsed time per iteration (s): 0.43 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 2.372365E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.611 | TFLOPs: 31.41 | 7: iteration 28120/ 115203 | consumed samples: 7198720 | consumed tokens: 14742978560 | elapsed time per iteration (s): 0.43 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 2.370175E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.018 | TFLOPs: 31.11 | 7: iteration 28130/ 115203 | consumed samples: 7201280 | consumed tokens: 14748221440 | elapsed time per iteration (s): 0.42 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 2.399107E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.754 | TFLOPs: 31.73 | 7: iteration 28140/ 115203 | consumed samples: 7203840 | consumed tokens: 14753464320 | elapsed time per iteration (s): 0.43 | learning rate: 1.763E-04 | global batch size: 256 | lm loss: 2.389721E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.981 | TFLOPs: 31.32 | 7: iteration 28150/ 115203 | consumed samples: 7206400 | consumed tokens: 14758707200 | elapsed time per iteration (s): 0.42 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 2.383521E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 611.039 | TFLOPs: 32.06 | 7: iteration 28160/ 115203 | consumed samples: 7208960 | consumed tokens: 14763950080 | elapsed time per iteration (s): 0.43 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 2.348921E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.189 | TFLOPs: 30.97 | 7: iteration 28170/ 115203 | consumed samples: 7211520 | consumed tokens: 14769192960 | elapsed time per iteration (s): 0.43 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 2.367902E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.582 | TFLOPs: 31.09 | 7: iteration 28180/ 115203 | consumed samples: 7214080 | consumed tokens: 14774435840 | elapsed time per iteration (s): 0.43 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 2.352970E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.088 | TFLOPs: 31.28 | 7: iteration 28190/ 115203 | consumed samples: 7216640 | consumed tokens: 14779678720 | elapsed time per iteration (s): 0.42 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 2.376110E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.142 | TFLOPs: 31.86 | 7: iteration 28200/ 115203 | consumed samples: 7219200 | consumed tokens: 14784921600 | elapsed time per iteration (s): 0.42 | learning rate: 1.762E-04 | global batch size: 256 | lm loss: 2.338813E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.638 | TFLOPs: 31.93 | 7: iteration 28210/ 115203 | consumed samples: 7221760 | consumed tokens: 14790164480 | elapsed time per iteration (s): 0.43 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 2.357319E+00 | grad norm: 
0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.145 | TFLOPs: 31.44 | 7: iteration 28220/ 115203 | consumed samples: 7224320 | consumed tokens: 14795407360 | elapsed time per iteration (s): 0.45 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 2.390932E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.621 | TFLOPs: 29.99 | 7: iteration 28230/ 115203 | consumed samples: 7226880 | consumed tokens: 14800650240 | elapsed time per iteration (s): 0.42 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 2.358865E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.013 | TFLOPs: 31.74 | 7: iteration 28240/ 115203 | consumed samples: 7229440 | consumed tokens: 14805893120 | elapsed time per iteration (s): 0.43 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 2.364609E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.607 | TFLOPs: 31.15 | 7: iteration 28250/ 115203 | consumed samples: 7232000 | consumed tokens: 14811136000 | elapsed time per iteration (s): 0.43 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 2.372600E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.007 | TFLOPs: 31.43 | 7: iteration 28260/ 115203 | consumed samples: 7234560 | consumed tokens: 14816378880 | elapsed time per iteration (s): 0.43 | learning rate: 1.761E-04 | global batch size: 256 | lm loss: 2.365541E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.899 | TFLOPs: 31.11 | 7: iteration 28270/ 115203 | consumed samples: 7237120 | consumed tokens: 14821621760 | elapsed time per iteration (s): 
0.43 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 2.356866E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.191 | TFLOPs: 31.39 | 7: iteration 28280/ 115203 | consumed samples: 7239680 | consumed tokens: 14826864640 | elapsed time per iteration (s): 0.42 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 2.380209E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.173 | TFLOPs: 31.80 | 7: iteration 28290/ 115203 | consumed samples: 7242240 | consumed tokens: 14832107520 | elapsed time per iteration (s): 0.42 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 2.387488E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.564 | TFLOPs: 31.67 | 7: iteration 28300/ 115203 | consumed samples: 7244800 | consumed tokens: 14837350400 | elapsed time per iteration (s): 0.43 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 2.368420E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.332 | TFLOPs: 31.03 | 7: iteration 28310/ 115203 | consumed samples: 7247360 | consumed tokens: 14842593280 | elapsed time per iteration (s): 0.43 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 2.401151E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.503 | TFLOPs: 31.56 | 7: iteration 28320/ 115203 | consumed samples: 7249920 | consumed tokens: 14847836160 | elapsed time per iteration (s): 0.43 | learning rate: 1.760E-04 | global batch size: 256 | lm loss: 2.430176E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.620 | TFLOPs: 31.46 | 7: iteration 28330/ 
115203 | consumed samples: 7252480 | consumed tokens: 14853079040 | elapsed time per iteration (s): 0.43 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 2.397291E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.145 | TFLOPs: 31.17 | 7: iteration 28340/ 115203 | consumed samples: 7255040 | consumed tokens: 14858321920 | elapsed time per iteration (s): 0.43 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 2.378175E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.050 | TFLOPs: 31.54 | 7: iteration 28350/ 115203 | consumed samples: 7257600 | consumed tokens: 14863564800 | elapsed time per iteration (s): 0.44 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 2.360365E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.757 | TFLOPs: 30.63 | 7: iteration 28360/ 115203 | consumed samples: 7260160 | consumed tokens: 14868807680 | elapsed time per iteration (s): 0.43 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 2.382786E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.303 | TFLOPs: 31.50 | 7: iteration 28370/ 115203 | consumed samples: 7262720 | consumed tokens: 14874050560 | elapsed time per iteration (s): 0.43 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 2.391588E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.558 | TFLOPs: 31.51 | 7: iteration 28380/ 115203 | consumed samples: 7265280 | consumed tokens: 14879293440 | elapsed time per iteration (s): 0.43 | learning rate: 1.759E-04 | global batch size: 256 | lm loss: 2.353178E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 590.959 | TFLOPs: 31.01 | 7: iteration 28390/ 115203 | consumed samples: 7267840 | consumed tokens: 14884536320 | elapsed time per iteration (s): 0.43 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 2.366155E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.292 | TFLOPs: 31.39 | 7: iteration 28400/ 115203 | consumed samples: 7270400 | consumed tokens: 14889779200 | elapsed time per iteration (s): 0.43 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 2.393666E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.972 | TFLOPs: 31.48 | 7: iteration 28410/ 115203 | consumed samples: 7272960 | consumed tokens: 14895022080 | elapsed time per iteration (s): 0.42 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 2.372421E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.529 | TFLOPs: 31.82 | 7: iteration 28420/ 115203 | consumed samples: 7275520 | consumed tokens: 14900264960 | elapsed time per iteration (s): 0.43 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 2.405439E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.417 | TFLOPs: 31.50 | 7: iteration 28430/ 115203 | consumed samples: 7278080 | consumed tokens: 14905507840 | elapsed time per iteration (s): 0.43 | learning rate: 1.758E-04 | global batch size: 256 | lm loss: 2.349231E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.042 | TFLOPs: 31.59 | 7: iteration 28440/ 115203 | consumed samples: 7280640 | consumed tokens: 14910750720 | elapsed time per iteration (s): 0.45 | learning rate: 1.758E-04 | global batch size: 256 | 
lm loss: 2.363211E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.853 | TFLOPs: 29.85 | 7: iteration 28450/ 115203 | consumed samples: 7283200 | consumed tokens: 14915993600 | elapsed time per iteration (s): 0.43 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 2.344683E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.730 | TFLOPs: 31.20 | 7: iteration 28460/ 115203 | consumed samples: 7285760 | consumed tokens: 14921236480 | elapsed time per iteration (s): 0.43 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 2.368275E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.951 | TFLOPs: 31.11 | 7: iteration 28470/ 115203 | consumed samples: 7288320 | consumed tokens: 14926479360 | elapsed time per iteration (s): 0.42 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 2.334192E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.255 | TFLOPs: 31.81 | 7: iteration 28480/ 115203 | consumed samples: 7290880 | consumed tokens: 14931722240 | elapsed time per iteration (s): 0.44 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 2.375462E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.970 | TFLOPs: 30.59 | 7: iteration 28490/ 115203 | consumed samples: 7293440 | consumed tokens: 14936965120 | elapsed time per iteration (s): 0.44 | learning rate: 1.757E-04 | global batch size: 256 | lm loss: 2.376785E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.271 | TFLOPs: 30.76 | 7: iteration 28500/ 115203 | consumed samples: 7296000 | consumed tokens: 
14942208000 | elapsed time per iteration (s): 0.43 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 2.385883E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.830 | TFLOPs: 31.47 | 7: iteration 28510/ 115203 | consumed samples: 7298560 | consumed tokens: 14947450880 | elapsed time per iteration (s): 0.43 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 2.379590E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.838 | TFLOPs: 30.90 | 7: iteration 28520/ 115203 | consumed samples: 7301120 | consumed tokens: 14952693760 | elapsed time per iteration (s): 0.43 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 2.393665E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.372 | TFLOPs: 31.50 | 7: iteration 28530/ 115203 | consumed samples: 7303680 | consumed tokens: 14957936640 | elapsed time per iteration (s): 0.43 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 2.354286E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.400 | TFLOPs: 31.29 | 7: iteration 28540/ 115203 | consumed samples: 7306240 | consumed tokens: 14963179520 | elapsed time per iteration (s): 0.43 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 2.381301E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.613 | TFLOPs: 30.88 | 7: iteration 28550/ 115203 | consumed samples: 7308800 | consumed tokens: 14968422400 | elapsed time per iteration (s): 0.43 | learning rate: 1.756E-04 | global batch size: 256 | lm loss: 2.378240E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
593.674 | TFLOPs: 31.15 | 7: iteration 28560/ 115203 | consumed samples: 7311360 | consumed tokens: 14973665280 | elapsed time per iteration (s): 0.42 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 2.379831E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.207 | TFLOPs: 31.81 | 7: iteration 28570/ 115203 | consumed samples: 7313920 | consumed tokens: 14978908160 | elapsed time per iteration (s): 0.43 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 2.352478E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.090 | TFLOPs: 31.12 | 7: iteration 28580/ 115203 | consumed samples: 7316480 | consumed tokens: 14984151040 | elapsed time per iteration (s): 0.43 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 2.374569E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.067 | TFLOPs: 31.27 | 7: iteration 28590/ 115203 | consumed samples: 7319040 | consumed tokens: 14989393920 | elapsed time per iteration (s): 0.42 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 2.366062E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.250 | TFLOPs: 31.70 | 7: iteration 28600/ 115203 | consumed samples: 7321600 | consumed tokens: 14994636800 | elapsed time per iteration (s): 0.44 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 2.387262E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.381 | TFLOPs: 30.50 | 7: iteration 28610/ 115203 | consumed samples: 7324160 | consumed tokens: 14999879680 | elapsed time per iteration (s): 0.43 | learning rate: 1.755E-04 | global batch size: 256 | lm loss: 2.372410E+00 | grad norm: 0.207 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.061 | TFLOPs: 31.59 | 7: iteration 28620/ 115203 | consumed samples: 7326720 | consumed tokens: 15005122560 | elapsed time per iteration (s): 0.44 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 2.369861E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.780 | TFLOPs: 30.47 | 7: iteration 28630/ 115203 | consumed samples: 7329280 | consumed tokens: 15010365440 | elapsed time per iteration (s): 0.43 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 2.372972E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.390 | TFLOPs: 31.13 | 7: iteration 28640/ 115203 | consumed samples: 7331840 | consumed tokens: 15015608320 | elapsed time per iteration (s): 0.43 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 2.385353E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.359 | TFLOPs: 31.03 | 7: iteration 28650/ 115203 | consumed samples: 7334400 | consumed tokens: 15020851200 | elapsed time per iteration (s): 0.44 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 2.358215E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.942 | TFLOPs: 30.64 | 7: iteration 28660/ 115203 | consumed samples: 7336960 | consumed tokens: 15026094080 | elapsed time per iteration (s): 0.42 | learning rate: 1.754E-04 | global batch size: 256 | lm loss: 2.389900E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.883 | TFLOPs: 31.74 | 7: iteration 28670/ 115203 | consumed samples: 7339520 | consumed tokens: 15031336960 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.754E-04 | global batch size: 256 | lm loss: 2.371104E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.682 | TFLOPs: 31.15 | 7: iteration 28680/ 115203 | consumed samples: 7342080 | consumed tokens: 15036579840 | elapsed time per iteration (s): 0.42 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 2.381785E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.848 | TFLOPs: 32.00 | 7: iteration 28690/ 115203 | consumed samples: 7344640 | consumed tokens: 15041822720 | elapsed time per iteration (s): 0.43 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 2.362825E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.396 | TFLOPs: 31.19 | 7: iteration 28700/ 115203 | consumed samples: 7347200 | consumed tokens: 15047065600 | elapsed time per iteration (s): 0.43 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 2.390882E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.557 | TFLOPs: 31.09 | 7: iteration 28710/ 115203 | consumed samples: 7349760 | consumed tokens: 15052308480 | elapsed time per iteration (s): 0.42 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 2.359859E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.991 | TFLOPs: 31.80 | 7: iteration 28720/ 115203 | consumed samples: 7352320 | consumed tokens: 15057551360 | elapsed time per iteration (s): 0.43 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 2.382129E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.712 | TFLOPs: 31.20 | 7: iteration 28730/ 115203 | 
consumed samples: 7354880 | consumed tokens: 15062794240 | elapsed time per iteration (s): 0.43 | learning rate: 1.753E-04 | global batch size: 256 | lm loss: 2.392543E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.547 | TFLOPs: 31.51 | 7: iteration 28740/ 115203 | consumed samples: 7357440 | consumed tokens: 15068037120 | elapsed time per iteration (s): 0.42 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 2.402374E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.255 | TFLOPs: 31.91 | 7: iteration 28750/ 115203 | consumed samples: 7360000 | consumed tokens: 15073280000 | elapsed time per iteration (s): 0.43 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 2.398003E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.649 | TFLOPs: 31.15 | 7: iteration 28760/ 115203 | consumed samples: 7362560 | consumed tokens: 15078522880 | elapsed time per iteration (s): 0.43 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 2.380398E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.583 | TFLOPs: 31.20 | 7: iteration 28770/ 115203 | consumed samples: 7365120 | consumed tokens: 15083765760 | elapsed time per iteration (s): 0.42 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 2.374716E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.699 | TFLOPs: 31.68 | 7: iteration 28780/ 115203 | consumed samples: 7367680 | consumed tokens: 15089008640 | elapsed time per iteration (s): 0.43 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 2.381784E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 599.276 | TFLOPs: 31.44 | 7: iteration 28790/ 115203 | consumed samples: 7370240 | consumed tokens: 15094251520 | elapsed time per iteration (s): 0.58 | learning rate: 1.752E-04 | global batch size: 256 | lm loss: 2.371537E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 443.528 | TFLOPs: 23.27 | 7: iteration 28800/ 115203 | consumed samples: 7372800 | consumed tokens: 15099494400 | elapsed time per iteration (s): 0.43 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 2.381420E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.721 | TFLOPs: 31.57 | 7: iteration 28810/ 115203 | consumed samples: 7375360 | consumed tokens: 15104737280 | elapsed time per iteration (s): 0.42 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 2.362218E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.362 | TFLOPs: 31.76 | 7: iteration 28820/ 115203 | consumed samples: 7377920 | consumed tokens: 15109980160 | elapsed time per iteration (s): 0.42 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 2.350188E+00 | grad norm: 0.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.603 | TFLOPs: 32.19 | 7: iteration 28830/ 115203 | consumed samples: 7380480 | consumed tokens: 15115223040 | elapsed time per iteration (s): 0.43 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 2.379670E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.224 | TFLOPs: 30.97 | 7: iteration 28840/ 115203 | consumed samples: 7383040 | consumed tokens: 15120465920 | elapsed time per iteration (s): 0.43 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 
2.356244E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.527 | TFLOPs: 31.25 | 7: iteration 28850/ 115203 | consumed samples: 7385600 | consumed tokens: 15125708800 | elapsed time per iteration (s): 0.42 | learning rate: 1.751E-04 | global batch size: 256 | lm loss: 2.386749E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.928 | TFLOPs: 31.74 | 7: iteration 28860/ 115203 | consumed samples: 7388160 | consumed tokens: 15130951680 | elapsed time per iteration (s): 0.43 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 2.368929E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.767 | TFLOPs: 31.52 | 7: iteration 28870/ 115203 | consumed samples: 7390720 | consumed tokens: 15136194560 | elapsed time per iteration (s): 0.43 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 2.368343E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.947 | TFLOPs: 31.58 | 7: iteration 28880/ 115203 | consumed samples: 7393280 | consumed tokens: 15141437440 | elapsed time per iteration (s): 0.42 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 2.381799E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.435 | TFLOPs: 31.66 | 7: iteration 28890/ 115203 | consumed samples: 7395840 | consumed tokens: 15146680320 | elapsed time per iteration (s): 0.43 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 2.381259E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.583 | TFLOPs: 31.20 | 7: iteration 28900/ 115203 | consumed samples: 7398400 | consumed tokens: 15151923200 | 
elapsed time per iteration (s): 0.44 | learning rate: 1.750E-04 | global batch size: 256 | lm loss: 2.376708E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.452 | TFLOPs: 30.51 | 7: iteration 28910/ 115203 | consumed samples: 7400960 | consumed tokens: 15157166080 | elapsed time per iteration (s): 0.43 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 2.382832E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.681 | TFLOPs: 31.46 | 7: iteration 28920/ 115203 | consumed samples: 7403520 | consumed tokens: 15162408960 | elapsed time per iteration (s): 0.44 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 2.393465E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.108 | TFLOPs: 30.65 | 7: iteration 28930/ 115203 | consumed samples: 7406080 | consumed tokens: 15167651840 | elapsed time per iteration (s): 0.44 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 2.372557E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.426 | TFLOPs: 30.77 | 7: iteration 28940/ 115203 | consumed samples: 7408640 | consumed tokens: 15172894720 | elapsed time per iteration (s): 0.43 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 2.398778E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.284 | TFLOPs: 31.39 | 7: iteration 28950/ 115203 | consumed samples: 7411200 | consumed tokens: 15178137600 | elapsed time per iteration (s): 0.43 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 2.392097E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.740 | TFLOPs: 
31.52 | 7: iteration 28960/ 115203 | consumed samples: 7413760 | consumed tokens: 15183380480 | elapsed time per iteration (s): 0.43 | learning rate: 1.749E-04 | global batch size: 256 | lm loss: 2.344751E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.865 | TFLOPs: 31.21 | 7: iteration 28970/ 115203 | consumed samples: 7416320 | consumed tokens: 15188623360 | elapsed time per iteration (s): 0.43 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 2.413319E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.932 | TFLOPs: 31.16 | 7: iteration 28980/ 115203 | consumed samples: 7418880 | consumed tokens: 15193866240 | elapsed time per iteration (s): 0.43 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 2.406568E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.374 | TFLOPs: 31.34 | 7: iteration 28990/ 115203 | consumed samples: 7421440 | consumed tokens: 15199109120 | elapsed time per iteration (s): 0.50 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 2.382646E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 511.173 | TFLOPs: 26.82 | 7: iteration 29000/ 115203 | consumed samples: 7424000 | consumed tokens: 15204352000 | elapsed time per iteration (s): 0.42 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 2.358098E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.589 | TFLOPs: 31.83 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 29000 | lm loss value: 2.412322E+00 | lm loss PPL: 1.115984E+01 | 7: 
------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 29000 to checkpoints_221m 0: [2022-11-28 16:26:39,613] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step29000 is begin to save! 0: [2022-11-28 16:26:39,639] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_01-model_00-model_states.pt... 0: [2022-11-28 16:26:39,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_01-model_00-model_states.pt. 0: [2022-11-28 16:26:39,743] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_03-model_00-model_states.pt... 0: [2022-11-28 16:26:39,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_03-model_00-model_states.pt. 0: [2022-11-28 16:26:39,764] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_04-model_00-model_states.pt... 0: [2022-11-28 16:26:39,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_04-model_00-model_states.pt. 0: [2022-11-28 16:26:39,789] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_05-model_00-model_states.pt... 0: [2022-11-28 16:26:39,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_05-model_00-model_states.pt. 0: [2022-11-28 16:26:39,813] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_06-model_00-model_states.pt... 0: [2022-11-28 16:26:39,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_06-model_00-model_states.pt. 
0: [2022-11-28 16:26:39,836] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_07-model_00-model_states.pt... 0: [2022-11-28 16:26:39,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_07-model_00-model_states.pt. 0: [2022-11-28 16:26:39,858] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_08-model_00-model_states.pt... 0: [2022-11-28 16:26:39,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_08-model_00-model_states.pt. 0: [2022-11-28 16:26:39,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_09-model_00-model_states.pt... 0: [2022-11-28 16:26:39,905] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_09-model_00-model_states.pt. 0: [2022-11-28 16:26:39,906] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_10-model_00-model_states.pt... 0: [2022-11-28 16:26:39,929] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_10-model_00-model_states.pt. 0: [2022-11-28 16:26:39,930] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_11-model_00-model_states.pt... 0: [2022-11-28 16:26:39,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_11-model_00-model_states.pt. 0: [2022-11-28 16:26:39,954] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_12-model_00-model_states.pt... 0: [2022-11-28 16:26:39,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_12-model_00-model_states.pt. 
0: [2022-11-28 16:26:39,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_13-model_00-model_states.pt... 0: [2022-11-28 16:26:40,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_13-model_00-model_states.pt. 0: [2022-11-28 16:26:40,001] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_14-model_00-model_states.pt... 0: [2022-11-28 16:26:40,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_14-model_00-model_states.pt. 0: [2022-11-28 16:26:40,024] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_15-model_00-model_states.pt... 0: [2022-11-28 16:26:40,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_15-model_00-model_states.pt. 0: [2022-11-28 16:26:40,048] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_16-model_00-model_states.pt... 0: [2022-11-28 16:26:40,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_16-model_00-model_states.pt. 0: [2022-11-28 16:26:40,071] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_17-model_00-model_states.pt... 0: [2022-11-28 16:26:40,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_17-model_00-model_states.pt. 0: [2022-11-28 16:26:40,094] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_18-model_00-model_states.pt... 0: [2022-11-28 16:26:40,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_18-model_00-model_states.pt. 
0: [2022-11-28 16:26:40,120] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_19-model_00-model_states.pt... 0: [2022-11-28 16:26:40,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_19-model_00-model_states.pt. 0: [2022-11-28 16:26:40,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_20-model_00-model_states.pt... 0: [2022-11-28 16:26:40,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_20-model_00-model_states.pt. 0: [2022-11-28 16:26:40,168] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/layer_22-model_00-model_states.pt... 0: [2022-11-28 16:26:40,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/layer_22-model_00-model_states.pt. 0: [2022-11-28 16:26:40,174] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step29000/mp_rank_00_model_states.pt 0: [2022-11-28 16:26:40,174] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/mp_rank_00_model_states.pt... 0: [2022-11-28 16:26:40,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/mp_rank_00_model_states.pt. 0: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
4: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
3: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:26:40,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step29000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:26:40,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:26:40,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 16:26:40,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2022-11-28 16:26:40,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:26:40,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:26:40,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 16:26:40,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2022-11-28 16:26:40,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 16:26:40,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2022-11-28 16:26:40,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:26:40,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 16:26:40,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2022-11-28 16:26:40,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:26:40,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 16:26:40,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2022-11-28 16:26:40,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 16:26:40,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2022-11-28 16:26:40,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:26:40,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2022-11-28 16:26:40,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:26:40,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2022-11-28 16:26:40,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 6: [2022-11-28 16:26:40,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2022-11-28 16:26:40,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2022-11-28 16:26:40,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:26:40,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 16:26:40,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2022-11-28 16:26:40,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 16:26:40,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 16:26:40,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2022-11-28 16:26:40,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:26:40,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 16:26:40,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2022-11-28 16:26:40,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:26:40,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 16:26:40,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2022-11-28 16:26:40,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:26:40,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 16:26:40,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2022-11-28 16:26:40,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:26:40,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 16:26:40,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2022-11-28 16:26:40,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:26:40,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 16:26:40,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 7: [2022-11-28 16:26:40,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:26:40,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 16:26:40,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2022-11-28 16:26:40,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:26:40,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 16:26:40,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2022-11-28 16:26:40,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 16:26:40,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:26:40,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:26:40,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 16:26:40,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2022-11-28 16:26:40,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:26:40,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 4: [2022-11-28 16:26:40,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2022-11-28 16:26:40,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2022-11-28 16:26:40,252] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 16:26:40,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2022-11-28 16:26:40,252] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2022-11-28 16:26:40,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2022-11-28 16:26:40,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:26:40,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:26:40,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 4: [2022-11-28 16:26:40,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2022-11-28 16:26:40,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2022-11-28 16:26:40,253] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2022-11-28 16:26:40,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:26:40,253] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 16:26:40,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2022-11-28 16:26:40,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:26:40,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 16:26:40,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 
7: [2022-11-28 16:26:40,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:26:40,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:26:40,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:26:40,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:26:40,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:26:40,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:26:40,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
5: [2022-11-28 16:26:40,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 16:26:40,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 4: [2022-11-28 16:26:40,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2022-11-28 16:26:40,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 6: [2022-11-28 16:26:40,254] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 7: [2022-11-28 16:26:40,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2022-11-28 16:26:40,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 5: [2022-11-28 16:26:40,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 5: [2022-11-28 16:26:40,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2022-11-28 16:26:40,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2022-11-28 16:26:40,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2022-11-28 16:26:40,254] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 
7: [2022-11-28 16:26:40,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 1: [2022-11-28 16:26:40,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 5: [2022-11-28 16:26:40,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:26:40,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:26:40,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 4: [2022-11-28 16:26:40,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 5: [2022-11-28 16:26:40,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2022-11-28 16:26:40,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 5: [2022-11-28 16:26:40,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:26:40,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
5: [2022-11-28 16:26:40,251] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 4: [2022-11-28 16:26:40,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2022-11-28 16:26:40,251] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2022-11-28 16:26:40,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 5: [2022-11-28 16:26:40,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:26:40,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 16:26:40,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:26:40,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2022-11-28 16:26:40,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:26:40,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 4: [2022-11-28 16:26:40,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 5: [2022-11-28 16:26:40,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 
4: [2022-11-28 16:26:40,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 5: [2022-11-28 16:26:40,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:26:40,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:26:40,255] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2022-11-28 16:26:40,263] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 5: [2022-11-28 16:26:40,255] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 4: [2022-11-28 16:26:40,263] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 5: [2022-11-28 16:26:40,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:26:40,257] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 16:26:40,257] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2022-11-28 16:26:40,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2022-11-28 16:26:40,265] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 16:26:40,265] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2022-11-28 16:26:40,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:26:40,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:26:40,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:26:40,265] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 16:26:40,265] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 16:26:40,265] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 16:26:40,265] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2022-11-28 16:26:40,265] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2022-11-28 16:26:40,265] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2022-11-28 16:26:40,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 16:26:40,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:26:40,267] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 16:26:40,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:26:40,267] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 16:26:40,267] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2022-11-28 16:26:40,267] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 16:26:40,267] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 3: [2022-11-28 16:26:40,267] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2022-11-28 16:26:40,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:26:40,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:26:40,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-28 16:26:40,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 16:26:40,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 16:26:40,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 16:26:40,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2022-11-28 16:26:40,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 6: [2022-11-28 16:26:40,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2022-11-28 16:26:40,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 16:26:40,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2022-11-28 16:26:40,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:26:40,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:26:40,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 16:26:40,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 16:26:40,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 16:26:40,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 16:26:40,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2022-11-28 16:26:40,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: [2022-11-28 16:26:40,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2022-11-28 16:26:40,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:26:40,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:26:40,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:26:40,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 16:26:40,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 16:26:40,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 
2: [2022-11-28 16:26:40,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 16:26:40,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2022-11-28 16:26:40,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2022-11-28 16:26:40,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:26:40,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:26:40,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:26:40,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:26:40,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 16:26:40,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 16:26:40,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 16:26:40,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2022-11-28 16:26:40,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 
2: [2022-11-28 16:26:40,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2022-11-28 16:26:40,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 16:26:40,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:26:40,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 2: [2022-11-28 16:26:40,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step29000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 16:26:40,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step29000 is ready now! 0: successfully saved checkpoint at iteration 29000 to checkpoints_221m 7: time (ms) | save-checkpoint: 837.80 7: iteration 29010/ 115203 | consumed samples: 7426560 | consumed tokens: 15209594880 | elapsed time per iteration (s): 0.53 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 2.332259E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 487.240 | TFLOPs: 25.56 | 7: iteration 29020/ 115203 | consumed samples: 7429120 | consumed tokens: 15214837760 | elapsed time per iteration (s): 0.43 | learning rate: 1.748E-04 | global batch size: 256 | lm loss: 2.378776E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.060 | TFLOPs: 31.43 | 7: iteration 29030/ 115203 | consumed samples: 7431680 | consumed tokens: 15220080640 | elapsed time per iteration (s): 0.43 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 2.364213E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 598.636 | TFLOPs: 31.41 | 7: iteration 29040/ 115203 | consumed samples: 7434240 | consumed tokens: 15225323520 | elapsed time per iteration (s): 0.43 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 2.356569E+00 | grad norm: 0.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.544 | TFLOPs: 31.14 | 7: iteration 29050/ 115203 | consumed samples: 7436800 | consumed tokens: 15230566400 | elapsed time per iteration (s): 0.43 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 2.384188E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.082 | TFLOPs: 31.49 | 7: iteration 29060/ 115203 | consumed samples: 7439360 | consumed tokens: 15235809280 | elapsed time per iteration (s): 0.43 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 2.383153E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.281 | TFLOPs: 31.29 | 7: iteration 29070/ 115203 | consumed samples: 7441920 | consumed tokens: 15241052160 | elapsed time per iteration (s): 0.43 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 2.359163E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.102 | TFLOPs: 31.38 | 7: iteration 29080/ 115203 | consumed samples: 7444480 | consumed tokens: 15246295040 | elapsed time per iteration (s): 0.43 | learning rate: 1.747E-04 | global batch size: 256 | lm loss: 2.394313E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.333 | TFLOPs: 31.50 | 7: iteration 29090/ 115203 | consumed samples: 7447040 | consumed tokens: 15251537920 | elapsed time per iteration (s): 0.43 | learning rate: 1.746E-04 | global batch size: 256 | 
lm loss: 2.379308E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.851 | TFLOPs: 31.47 | 7: iteration 29100/ 115203 | consumed samples: 7449600 | consumed tokens: 15256780800 | elapsed time per iteration (s): 0.43 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 2.344541E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.094 | TFLOPs: 31.43 | 7: iteration 29110/ 115203 | consumed samples: 7452160 | consumed tokens: 15262023680 | elapsed time per iteration (s): 0.42 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 2.375416E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.314 | TFLOPs: 31.86 | 7: iteration 29120/ 115203 | consumed samples: 7454720 | consumed tokens: 15267266560 | elapsed time per iteration (s): 0.42 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 2.354906E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.722 | TFLOPs: 31.62 | 7: iteration 29130/ 115203 | consumed samples: 7457280 | consumed tokens: 15272509440 | elapsed time per iteration (s): 0.43 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 2.369278E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.480 | TFLOPs: 31.30 | 7: iteration 29140/ 115203 | consumed samples: 7459840 | consumed tokens: 15277752320 | elapsed time per iteration (s): 0.43 | learning rate: 1.746E-04 | global batch size: 256 | lm loss: 2.359861E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.791 | TFLOPs: 31.31 | 7: iteration 29150/ 115203 | consumed samples: 7462400 | consumed tokens: 
15282995200 | elapsed time per iteration (s): 0.43 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 2.386812E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.374 | TFLOPs: 31.03 | 7: iteration 29160/ 115203 | consumed samples: 7464960 | consumed tokens: 15288238080 | elapsed time per iteration (s): 0.42 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 2.367002E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.266 | TFLOPs: 31.86 | 7: iteration 29170/ 115203 | consumed samples: 7467520 | consumed tokens: 15293480960 | elapsed time per iteration (s): 0.42 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 2.357971E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.539 | TFLOPs: 31.67 | 7: iteration 29180/ 115203 | consumed samples: 7470080 | consumed tokens: 15298723840 | elapsed time per iteration (s): 0.44 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 2.383059E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.977 | TFLOPs: 30.80 | 7: iteration 29190/ 115203 | consumed samples: 7472640 | consumed tokens: 15303966720 | elapsed time per iteration (s): 0.43 | learning rate: 1.745E-04 | global batch size: 256 | lm loss: 2.368182E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.841 | TFLOPs: 31.58 | 7: iteration 29200/ 115203 | consumed samples: 7475200 | consumed tokens: 15309209600 | elapsed time per iteration (s): 0.43 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 2.324549E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
592.040 | TFLOPs: 31.06 | 7: iteration 29210/ 115203 | consumed samples: 7477760 | consumed tokens: 15314452480 | elapsed time per iteration (s): 0.43 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 2.390324E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.385 | TFLOPs: 31.29 | 7: iteration 29220/ 115203 | consumed samples: 7480320 | consumed tokens: 15319695360 | elapsed time per iteration (s): 0.43 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 2.373273E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.236 | TFLOPs: 31.39 | 7: iteration 29230/ 115203 | consumed samples: 7482880 | consumed tokens: 15324938240 | elapsed time per iteration (s): 0.43 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 2.344539E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.408 | TFLOPs: 31.14 | 7: iteration 29240/ 115203 | consumed samples: 7485440 | consumed tokens: 15330181120 | elapsed time per iteration (s): 0.44 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 2.365952E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.177 | TFLOPs: 30.81 | 7: iteration 29250/ 115203 | consumed samples: 7488000 | consumed tokens: 15335424000 | elapsed time per iteration (s): 0.44 | learning rate: 1.744E-04 | global batch size: 256 | lm loss: 2.344434E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.125 | TFLOPs: 30.75 | 7: iteration 29260/ 115203 | consumed samples: 7490560 | consumed tokens: 15340666880 | elapsed time per iteration (s): 0.43 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 2.359451E+00 | grad norm: 0.205 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.208 | TFLOPs: 31.60 | 7: iteration 29270/ 115203 | consumed samples: 7493120 | consumed tokens: 15345909760 | elapsed time per iteration (s): 0.43 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 2.370745E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.799 | TFLOPs: 31.00 | 7: iteration 29280/ 115203 | consumed samples: 7495680 | consumed tokens: 15351152640 | elapsed time per iteration (s): 0.43 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 2.381363E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.786 | TFLOPs: 31.52 | 7: iteration 29290/ 115203 | consumed samples: 7498240 | consumed tokens: 15356395520 | elapsed time per iteration (s): 0.42 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 2.361787E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.092 | TFLOPs: 31.70 | 7: iteration 29300/ 115203 | consumed samples: 7500800 | consumed tokens: 15361638400 | elapsed time per iteration (s): 0.43 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 2.377335E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.920 | TFLOPs: 31.42 | 7: iteration 29310/ 115203 | consumed samples: 7503360 | consumed tokens: 15366881280 | elapsed time per iteration (s): 0.42 | learning rate: 1.743E-04 | global batch size: 256 | lm loss: 2.341772E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.737 | TFLOPs: 31.62 | 7: iteration 29320/ 115203 | consumed samples: 7505920 | consumed tokens: 15372124160 | elapsed time per iteration (s): 0.42 | 
learning rate: 1.742E-04 | global batch size: 256 | lm loss: 2.395164E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.002 | TFLOPs: 31.64 | 7: iteration 29330/ 115203 | consumed samples: 7508480 | consumed tokens: 15377367040 | elapsed time per iteration (s): 0.42 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 2.366193E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.081 | TFLOPs: 32.17 | 7: iteration 29340/ 115203 | consumed samples: 7511040 | consumed tokens: 15382609920 | elapsed time per iteration (s): 0.43 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 2.346158E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.834 | TFLOPs: 31.31 | 7: iteration 29350/ 115203 | consumed samples: 7513600 | consumed tokens: 15387852800 | elapsed time per iteration (s): 0.43 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 2.350674E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.307 | TFLOPs: 31.44 | 7: iteration 29360/ 115203 | consumed samples: 7516160 | consumed tokens: 15393095680 | elapsed time per iteration (s): 0.43 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 2.383033E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.383 | TFLOPs: 31.50 | 7: iteration 29370/ 115203 | consumed samples: 7518720 | consumed tokens: 15398338560 | elapsed time per iteration (s): 0.43 | learning rate: 1.742E-04 | global batch size: 256 | lm loss: 2.363171E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.250 | TFLOPs: 31.44 | 7: iteration 29380/ 115203 | 
consumed samples: 7521280 | consumed tokens: 15403581440 | elapsed time per iteration (s): 0.43 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 2.389337E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.601 | TFLOPs: 31.51 | 7: iteration 29390/ 115203 | consumed samples: 7523840 | consumed tokens: 15408824320 | elapsed time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 2.368659E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.998 | TFLOPs: 31.80 | 7: iteration 29400/ 115203 | consumed samples: 7526400 | consumed tokens: 15414067200 | elapsed time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 2.395750E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.776 | TFLOPs: 31.89 | 7: iteration 29410/ 115203 | consumed samples: 7528960 | consumed tokens: 15419310080 | elapsed time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 2.324385E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.883 | TFLOPs: 32.05 | 7: iteration 29420/ 115203 | consumed samples: 7531520 | consumed tokens: 15424552960 | elapsed time per iteration (s): 0.43 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 2.330115E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.739 | TFLOPs: 31.10 | 7: iteration 29430/ 115203 | consumed samples: 7534080 | consumed tokens: 15429795840 | elapsed time per iteration (s): 0.42 | learning rate: 1.741E-04 | global batch size: 256 | lm loss: 2.391963E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 609.847 | TFLOPs: 32.00 | 7: iteration 29440/ 115203 | consumed samples: 7536640 | consumed tokens: 15435038720 | elapsed time per iteration (s): 0.44 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 2.372581E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.875 | TFLOPs: 30.27 | 7: iteration 29450/ 115203 | consumed samples: 7539200 | consumed tokens: 15440281600 | elapsed time per iteration (s): 0.42 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 2.363760E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.126 | TFLOPs: 31.80 | 7: iteration 29460/ 115203 | consumed samples: 7541760 | consumed tokens: 15445524480 | elapsed time per iteration (s): 0.43 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 2.394025E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.562 | TFLOPs: 31.25 | 7: iteration 29470/ 115203 | consumed samples: 7544320 | consumed tokens: 15450767360 | elapsed time per iteration (s): 0.43 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 2.382158E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.217 | TFLOPs: 31.23 | 7: iteration 29480/ 115203 | consumed samples: 7546880 | consumed tokens: 15456010240 | elapsed time per iteration (s): 0.44 | learning rate: 1.740E-04 | global batch size: 256 | lm loss: 2.385885E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.371 | TFLOPs: 30.87 | 7: iteration 29490/ 115203 | consumed samples: 7549440 | consumed tokens: 15461253120 | elapsed time per iteration (s): 0.42 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 
2.357368E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.168 | TFLOPs: 31.86 | 7: iteration 29500/ 115203 | consumed samples: 7552000 | consumed tokens: 15466496000 | elapsed time per iteration (s): 0.43 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 2.389741E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.409 | TFLOPs: 31.55 | 7: iteration 29510/ 115203 | consumed samples: 7554560 | consumed tokens: 15471738880 | elapsed time per iteration (s): 0.42 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 2.385666E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.686 | TFLOPs: 31.83 | 7: iteration 29520/ 115203 | consumed samples: 7557120 | consumed tokens: 15476981760 | elapsed time per iteration (s): 0.42 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 2.380809E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.877 | TFLOPs: 31.95 | 7: iteration 29530/ 115203 | consumed samples: 7559680 | consumed tokens: 15482224640 | elapsed time per iteration (s): 0.43 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 2.350074E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.468 | TFLOPs: 31.45 | 7: iteration 29540/ 115203 | consumed samples: 7562240 | consumed tokens: 15487467520 | elapsed time per iteration (s): 0.42 | learning rate: 1.739E-04 | global batch size: 256 | lm loss: 2.385261E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.956 | TFLOPs: 31.95 | 7: iteration 29550/ 115203 | consumed samples: 7564800 | consumed tokens: 15492710400 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 2.349272E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.252 | TFLOPs: 31.76 | 7: iteration 29560/ 115203 | consumed samples: 7567360 | consumed tokens: 15497953280 | elapsed time per iteration (s): 0.42 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 2.361487E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.052 | TFLOPs: 31.64 | 7: iteration 29570/ 115203 | consumed samples: 7569920 | consumed tokens: 15503196160 | elapsed time per iteration (s): 0.43 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 2.359990E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.077 | TFLOPs: 31.07 | 7: iteration 29580/ 115203 | consumed samples: 7572480 | consumed tokens: 15508439040 | elapsed time per iteration (s): 0.43 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 2.380350E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.145 | TFLOPs: 31.44 | 7: iteration 29590/ 115203 | consumed samples: 7575040 | consumed tokens: 15513681920 | elapsed time per iteration (s): 0.42 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 2.354372E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.835 | TFLOPs: 31.89 | 7: iteration 29600/ 115203 | consumed samples: 7577600 | consumed tokens: 15518924800 | elapsed time per iteration (s): 0.45 | learning rate: 1.738E-04 | global batch size: 256 | lm loss: 2.397235E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.050 | TFLOPs: 
30.12 | 7: iteration 29610/ 115203 | consumed samples: 7580160 | consumed tokens: 15524167680 | elapsed time per iteration (s): 0.43 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 2.347729E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.806 | TFLOPs: 31.21 | 7: iteration 29620/ 115203 | consumed samples: 7582720 | consumed tokens: 15529410560 | elapsed time per iteration (s): 0.44 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 2.374079E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.993 | TFLOPs: 30.80 | 7: iteration 29630/ 115203 | consumed samples: 7585280 | consumed tokens: 15534653440 | elapsed time per iteration (s): 0.43 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 2.346528E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.058 | TFLOPs: 31.59 | 7: iteration 29640/ 115203 | consumed samples: 7587840 | consumed tokens: 15539896320 | elapsed time per iteration (s): 0.43 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 2.372159E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.668 | TFLOPs: 31.46 | 7: iteration 29650/ 115203 | consumed samples: 7590400 | consumed tokens: 15545139200 | elapsed time per iteration (s): 0.42 | learning rate: 1.737E-04 | global batch size: 256 | lm loss: 2.358233E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.421 | TFLOPs: 31.98 | 7: iteration 29660/ 115203 | consumed samples: 7592960 | consumed tokens: 15550382080 | elapsed time per iteration (s): 0.44 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 2.367403E+00 | grad norm: 0.197 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.673 | TFLOPs: 30.78 | 7: iteration 29670/ 115203 | consumed samples: 7595520 | consumed tokens: 15555624960 | elapsed time per iteration (s): 0.43 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 2.367687E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.635 | TFLOPs: 31.46 | 7: iteration 29680/ 115203 | consumed samples: 7598080 | consumed tokens: 15560867840 | elapsed time per iteration (s): 0.42 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 2.388167E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.010 | TFLOPs: 31.74 | 7: iteration 29690/ 115203 | consumed samples: 7600640 | consumed tokens: 15566110720 | elapsed time per iteration (s): 0.43 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 2.337281E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.364 | TFLOPs: 31.24 | 7: iteration 29700/ 115203 | consumed samples: 7603200 | consumed tokens: 15571353600 | elapsed time per iteration (s): 0.43 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 2.358893E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.118 | TFLOPs: 31.38 | 7: iteration 29710/ 115203 | consumed samples: 7605760 | consumed tokens: 15576596480 | elapsed time per iteration (s): 0.42 | learning rate: 1.736E-04 | global batch size: 256 | lm loss: 2.362617E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.272 | TFLOPs: 31.71 | 7: iteration 29720/ 115203 | consumed samples: 7608320 | consumed tokens: 15581839360 | elapsed time per iteration (s): 0.42 | learning rate: 1.735E-04 
| global batch size: 256 | lm loss: 2.363194E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.478 | TFLOPs: 31.77 | 7: iteration 29730/ 115203 | consumed samples: 7610880 | consumed tokens: 15587082240 | elapsed time per iteration (s): 0.43 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 2.384892E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.958 | TFLOPs: 31.01 | 7: iteration 29740/ 115203 | consumed samples: 7613440 | consumed tokens: 15592325120 | elapsed time per iteration (s): 0.42 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 2.371830E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.600 | TFLOPs: 31.72 | 7: iteration 29750/ 115203 | consumed samples: 7616000 | consumed tokens: 15597568000 | elapsed time per iteration (s): 0.45 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 2.361776E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.851 | TFLOPs: 30.16 | 7: iteration 29760/ 115203 | consumed samples: 7618560 | consumed tokens: 15602810880 | elapsed time per iteration (s): 0.44 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 2.348330E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.502 | TFLOPs: 30.77 | 7: iteration 29770/ 115203 | consumed samples: 7621120 | consumed tokens: 15608053760 | elapsed time per iteration (s): 0.43 | learning rate: 1.735E-04 | global batch size: 256 | lm loss: 2.368264E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.684 | TFLOPs: 31.36 | 7: iteration 29780/ 115203 | consumed samples: 7623680 | 
consumed tokens: 15613296640 | elapsed time per iteration (s): 0.42 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 2.321816E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.046 | TFLOPs: 32.06 | 7: iteration 29790/ 115203 | consumed samples: 7626240 | consumed tokens: 15618539520 | elapsed time per iteration (s): 0.43 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 2.369784E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.438 | TFLOPs: 31.40 | 7: iteration 29800/ 115203 | consumed samples: 7628800 | consumed tokens: 15623782400 | elapsed time per iteration (s): 0.42 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 2.393763E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.162 | TFLOPs: 31.75 | 7: iteration 29810/ 115203 | consumed samples: 7631360 | consumed tokens: 15629025280 | elapsed time per iteration (s): 0.43 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 2.348653E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.773 | TFLOPs: 31.47 | 7: iteration 29820/ 115203 | consumed samples: 7633920 | consumed tokens: 15634268160 | elapsed time per iteration (s): 0.43 | learning rate: 1.734E-04 | global batch size: 256 | lm loss: 2.350981E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.844 | TFLOPs: 31.58 | 7: iteration 29830/ 115203 | consumed samples: 7636480 | consumed tokens: 15639511040 | elapsed time per iteration (s): 0.43 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 2.380863E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 602.114 | TFLOPs: 31.59 | 7: iteration 29840/ 115203 | consumed samples: 7639040 | consumed tokens: 15644753920 | elapsed time per iteration (s): 0.45 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 2.339061E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.801 | TFLOPs: 29.84 | 7: iteration 29850/ 115203 | consumed samples: 7641600 | consumed tokens: 15649996800 | elapsed time per iteration (s): 0.44 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 2.333231E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.280 | TFLOPs: 30.55 | 7: iteration 29860/ 115203 | consumed samples: 7644160 | consumed tokens: 15655239680 | elapsed time per iteration (s): 0.44 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 2.354635E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.691 | TFLOPs: 30.68 | 7: iteration 29870/ 115203 | consumed samples: 7646720 | consumed tokens: 15660482560 | elapsed time per iteration (s): 0.43 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 2.342831E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.905 | TFLOPs: 30.95 | 7: iteration 29880/ 115203 | consumed samples: 7649280 | consumed tokens: 15665725440 | elapsed time per iteration (s): 0.43 | learning rate: 1.733E-04 | global batch size: 256 | lm loss: 2.365987E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.734 | TFLOPs: 31.31 | 7: iteration 29890/ 115203 | consumed samples: 7651840 | consumed tokens: 15670968320 | elapsed time per iteration (s): 0.42 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 2.365698E+00 | grad norm: 
0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.191 | TFLOPs: 31.65 | 7: iteration 29900/ 115203 | consumed samples: 7654400 | consumed tokens: 15676211200 | elapsed time per iteration (s): 0.43 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 2.384723E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.681 | TFLOPs: 31.52 | 7: iteration 29910/ 115203 | consumed samples: 7656960 | consumed tokens: 15681454080 | elapsed time per iteration (s): 0.42 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 2.359467E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.076 | TFLOPs: 32.06 | 7: iteration 29920/ 115203 | consumed samples: 7659520 | consumed tokens: 15686696960 | elapsed time per iteration (s): 0.44 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 2.368095E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.922 | TFLOPs: 30.79 | 7: iteration 29930/ 115203 | consumed samples: 7662080 | consumed tokens: 15691939840 | elapsed time per iteration (s): 0.42 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 2.349277E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.211 | TFLOPs: 32.07 | 7: iteration 29940/ 115203 | consumed samples: 7664640 | consumed tokens: 15697182720 | elapsed time per iteration (s): 0.42 | learning rate: 1.732E-04 | global batch size: 256 | lm loss: 2.345906E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.268 | TFLOPs: 31.70 | 7: iteration 29950/ 115203 | consumed samples: 7667200 | consumed tokens: 15702425600 | elapsed time per iteration (s): 
0.43 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 2.367981E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.564 | TFLOPs: 31.09 | 7: iteration 29960/ 115203 | consumed samples: 7669760 | consumed tokens: 15707668480 | elapsed time per iteration (s): 0.44 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 2.365905E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.065 | TFLOPs: 30.80 | 7: iteration 29970/ 115203 | consumed samples: 7672320 | consumed tokens: 15712911360 | elapsed time per iteration (s): 0.43 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 2.347695E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.131 | TFLOPs: 31.44 | 7: iteration 29980/ 115203 | consumed samples: 7674880 | consumed tokens: 15718154240 | elapsed time per iteration (s): 0.42 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 2.357400E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.587 | TFLOPs: 31.67 | 7: iteration 29990/ 115203 | consumed samples: 7677440 | consumed tokens: 15723397120 | elapsed time per iteration (s): 0.43 | learning rate: 1.731E-04 | global batch size: 256 | lm loss: 2.373613E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.327 | TFLOPs: 30.92 | 0: [2022-11-28 16:33:48,335] [INFO] [logging.py:68:log_dist] [Rank 0] step=30000, skipped=0, lr=[0.00017304965296758478, 0.00017304965296758478, 0.00017304965296758478], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 30000/ 115203 | consumed samples: 7680000 | consumed tokens: 15728640000 | elapsed time per iteration (s): 0.42 | learning rate: 1.730E-04 | 
global batch size: 256 | lm loss: 2.379065E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.613 | TFLOPs: 32.09 |
0: steps: 30000 loss: 2.3484 iter time (s): 0.428 samples/sec: 598.436
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 30000 | lm loss value: 2.386735E+00 | lm loss PPL: 1.087792E+01 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 30000 to checkpoints_221m
0: [2022-11-28 16:33:48,504] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step30000 is begin to save!
0: [2022-11-28 16:33:48,507] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_01-model_00-model_states.pt...
0: [2022-11-28 16:33:48,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_01-model_00-model_states.pt.
0: [2022-11-28 16:33:48,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_03-model_00-model_states.pt...
0: [2022-11-28 16:33:48,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_03-model_00-model_states.pt.
0: [2022-11-28 16:33:48,651] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_04-model_00-model_states.pt...
0: [2022-11-28 16:33:48,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_04-model_00-model_states.pt.
0: [2022-11-28 16:33:48,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_05-model_00-model_states.pt...
0: [2022-11-28 16:33:48,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_05-model_00-model_states.pt.
0: [2022-11-28 16:33:48,699] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_06-model_00-model_states.pt...
0: [2022-11-28 16:33:48,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_06-model_00-model_states.pt.
0: [2022-11-28 16:33:48,725] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_07-model_00-model_states.pt...
0: [2022-11-28 16:33:48,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_07-model_00-model_states.pt.
0: [2022-11-28 16:33:48,748] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_08-model_00-model_states.pt...
0: [2022-11-28 16:33:48,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_08-model_00-model_states.pt.
0: [2022-11-28 16:33:48,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_09-model_00-model_states.pt...
0: [2022-11-28 16:33:48,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_09-model_00-model_states.pt.
0: [2022-11-28 16:33:48,793] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_10-model_00-model_states.pt...
0: [2022-11-28 16:33:48,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_10-model_00-model_states.pt.
0: [2022-11-28 16:33:48,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_11-model_00-model_states.pt...
0: [2022-11-28 16:33:48,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_11-model_00-model_states.pt.
0: [2022-11-28 16:33:48,840] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_12-model_00-model_states.pt...
0: [2022-11-28 16:33:48,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_12-model_00-model_states.pt.
0: [2022-11-28 16:33:48,863] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_13-model_00-model_states.pt...
0: [2022-11-28 16:33:48,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_13-model_00-model_states.pt.
0: [2022-11-28 16:33:48,886] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_14-model_00-model_states.pt...
0: [2022-11-28 16:33:48,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_14-model_00-model_states.pt.
0: [2022-11-28 16:33:48,911] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_15-model_00-model_states.pt...
0: [2022-11-28 16:33:48,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_15-model_00-model_states.pt.
0: [2022-11-28 16:33:48,935] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_16-model_00-model_states.pt...
0: [2022-11-28 16:33:48,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_16-model_00-model_states.pt.
0: [2022-11-28 16:33:48,958] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_17-model_00-model_states.pt...
0: [2022-11-28 16:33:48,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_17-model_00-model_states.pt.
0: [2022-11-28 16:33:48,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_18-model_00-model_states.pt...
0: [2022-11-28 16:33:49,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_18-model_00-model_states.pt.
0: [2022-11-28 16:33:49,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_19-model_00-model_states.pt...
0: [2022-11-28 16:33:49,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_19-model_00-model_states.pt.
0: [2022-11-28 16:33:49,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_20-model_00-model_states.pt...
0: [2022-11-28 16:33:49,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_20-model_00-model_states.pt.
0: [2022-11-28 16:33:49,055] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/layer_22-model_00-model_states.pt...
0: [2022-11-28 16:33:49,059] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/layer_22-model_00-model_states.pt.
0: [2022-11-28 16:33:49,060] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step30000/mp_rank_00_model_states.pt
0: [2022-11-28 16:33:49,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/mp_rank_00_model_states.pt...
0: [2022-11-28 16:33:49,062] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/mp_rank_00_model_states.pt.
0: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
0: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
0: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
0: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
7: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
7: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt...
7: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt...
7: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
7: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt...
7: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt...
4: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt...
4: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt...
4: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt...
4: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt...
4: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt...
4: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt...
3: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt...
3: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt...
3: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt...
3: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt...
3: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt...
3: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
5: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:33:49,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:33:49,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:33:49,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 16:33:49,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
4: [2022-11-28 16:33:49,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:33:49,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 16:33:49,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2022-11-28 16:33:49,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:33:49,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 16:33:49,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2022-11-28 16:33:49,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:33:49,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:33:49,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 16:33:49,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2022-11-28 16:33:49,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:33:49,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:33:49,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 16:33:49,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 16:33:49,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2022-11-28 16:33:49,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2022-11-28 16:33:49,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:33:49,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:33:49,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 16:33:49,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 16:33:49,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2022-11-28 16:33:49,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2022-11-28 16:33:49,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 16:33:49,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 16:33:49,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2022-11-28 16:33:49,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:33:49,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 16:33:49,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2022-11-28 16:33:49,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:33:49,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:33:49,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 16:33:49,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 16:33:49,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2022-11-28 16:33:49,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:33:49,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
7: [2022-11-28 16:33:49,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 1: [2022-11-28 16:33:49,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:33:49,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2022-11-28 16:33:49,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:33:49,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:33:49,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
5: [2022-11-28 16:33:49,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 16:33:49,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:33:49,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2022-11-28 16:33:49,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 16:33:49,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 16:33:49,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 16:33:49,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2022-11-28 16:33:49,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:33:49,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2022-11-28 16:33:49,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 16:33:49,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 5: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:33:49,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 16:33:49,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:33:49,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:33:49,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 16:33:49,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2022-11-28 16:33:49,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:33:49,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:33:49,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 16:33:49,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 16:33:49,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 1: [2022-11-28 16:33:49,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2022-11-28 16:33:49,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 16:33:49,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:33:49,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:33:49,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 16:33:49,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 16:33:49,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 16:33:49,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2022-11-28 16:33:49,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2022-11-28 16:33:49,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2022-11-28 16:33:49,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:33:49,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:33:49,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 16:33:49,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 16:33:49,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 16:33:49,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 16:33:49,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2022-11-28 16:33:49,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 2: [2022-11-28 16:33:49,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2022-11-28 16:33:49,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:33:49,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:33:49,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-28 16:33:49,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 16:33:49,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 16:33:49,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 16:33:49,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2022-11-28 16:33:49,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 7: [2022-11-28 16:33:49,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2022-11-28 16:33:49,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:33:49,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:33:49,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 16:33:49,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 16:33:49,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 16:33:49,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 16:33:49,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:33:49,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2022-11-28 16:33:49,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2022-11-28 16:33:49,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2022-11-28 16:33:49,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 16:33:49,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2022-11-28 16:33:49,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:33:49,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-28 16:33:49,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 16:33:49,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 16:33:49,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2022-11-28 16:33:49,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2022-11-28 16:33:49,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:33:49,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 16:33:49,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2022-11-28 16:33:49,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:33:49,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 16:33:49,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2022-11-28 16:33:49,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 16:33:49,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 16:33:49,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2022-11-28 16:33:49,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:33:49,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:33:49,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:33:49,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:33:49,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:33:49,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 16:33:49,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 7: [2022-11-28 16:33:49,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 4: [2022-11-28 16:33:49,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2022-11-28 16:33:49,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 16:33:49,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 7: [2022-11-28 16:33:49,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 4: [2022-11-28 16:33:49,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2022-11-28 16:33:49,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 0: [2022-11-28 16:33:49,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2022-11-28 16:33:49,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: [2022-11-28 16:33:49,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 6: [2022-11-28 16:33:49,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
0: [2022-11-28 16:33:49,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:33:49,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2022-11-28 16:33:49,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 16:33:49,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2022-11-28 16:33:49,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 16:33:49,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 16:33:49,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 16:33:49,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 16:33:49,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step30000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 
3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 3: [2022-11-28 16:33:49,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step30000 is ready now! 0: successfully saved checkpoint at iteration 30000 to checkpoints_221m 7: time (ms) | save-checkpoint: 691.15 7: iteration 30010/ 115203 | consumed samples: 7682560 | consumed tokens: 15733882880 | elapsed time per iteration (s): 0.50 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 2.357106E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 507.428 | TFLOPs: 26.62 | 7: iteration 30020/ 115203 | consumed samples: 7685120 | consumed tokens: 15739125760 | elapsed time per iteration (s): 0.42 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 2.378615E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.841 | TFLOPs: 31.84 | 7: iteration 30030/ 115203 | consumed samples: 7687680 | consumed tokens: 15744368640 | elapsed time per iteration (s): 0.42 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 2.377078E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.874 | TFLOPs: 31.74 | 7: iteration 30040/ 115203 | consumed samples: 7690240 | consumed tokens: 15749611520 | elapsed time per iteration (s): 0.42 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 2.351263E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.421 | TFLOPs: 32.13 | 7: iteration 30050/ 115203 | consumed samples: 7692800 | consumed tokens: 15754854400 | elapsed time per iteration 
(s): 0.43 | learning rate: 1.730E-04 | global batch size: 256 | lm loss: 2.375483E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.776 | TFLOPs: 31.15 | 7: iteration 30060/ 115203 | consumed samples: 7695360 | consumed tokens: 15760097280 | elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 2.376976E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.494 | TFLOPs: 31.61 | 7: iteration 30070/ 115203 | consumed samples: 7697920 | consumed tokens: 15765340160 | elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 2.372085E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.389 | TFLOPs: 31.71 | 7: iteration 30080/ 115203 | consumed samples: 7700480 | consumed tokens: 15770583040 | elapsed time per iteration (s): 0.43 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 2.380331E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.469 | TFLOPs: 31.56 | 7: iteration 30090/ 115203 | consumed samples: 7703040 | consumed tokens: 15775825920 | elapsed time per iteration (s): 0.42 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 2.355592E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.994 | TFLOPs: 31.69 | 7: iteration 30100/ 115203 | consumed samples: 7705600 | consumed tokens: 15781068800 | elapsed time per iteration (s): 0.43 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 2.346709E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.073 | TFLOPs: 31.38 | 7: iteration 30110/ 
115203 | consumed samples: 7708160 | consumed tokens: 15786311680 | elapsed time per iteration (s): 0.43 | learning rate: 1.729E-04 | global batch size: 256 | lm loss: 2.378553E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.901 | TFLOPs: 31.48 | 7: iteration 30120/ 115203 | consumed samples: 7710720 | consumed tokens: 15791554560 | elapsed time per iteration (s): 0.42 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 2.353014E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.367 | TFLOPs: 31.76 | 7: iteration 30130/ 115203 | consumed samples: 7713280 | consumed tokens: 15796797440 | elapsed time per iteration (s): 0.42 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 2.366522E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.621 | TFLOPs: 31.62 | 7: iteration 30140/ 115203 | consumed samples: 7715840 | consumed tokens: 15802040320 | elapsed time per iteration (s): 0.44 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 2.370621E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.934 | TFLOPs: 30.64 | 7: iteration 30150/ 115203 | consumed samples: 7718400 | consumed tokens: 15807283200 | elapsed time per iteration (s): 0.43 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 2.367488E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.292 | TFLOPs: 31.44 | 7: iteration 30160/ 115203 | consumed samples: 7720960 | consumed tokens: 15812526080 | elapsed time per iteration (s): 0.43 | learning rate: 1.728E-04 | global batch size: 256 | lm loss: 2.374002E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 594.198 | TFLOPs: 31.18 | 7: iteration 30170/ 115203 | consumed samples: 7723520 | consumed tokens: 15817768960 | elapsed time per iteration (s): 0.42 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 2.356872E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.379 | TFLOPs: 31.76 | 7: iteration 30180/ 115203 | consumed samples: 7726080 | consumed tokens: 15823011840 | elapsed time per iteration (s): 0.43 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 2.366860E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.435 | TFLOPs: 30.93 | 7: iteration 30190/ 115203 | consumed samples: 7728640 | consumed tokens: 15828254720 | elapsed time per iteration (s): 0.43 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 2.351494E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.414 | TFLOPs: 31.56 | 7: iteration 30200/ 115203 | consumed samples: 7731200 | consumed tokens: 15833497600 | elapsed time per iteration (s): 0.43 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 2.372269E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.599 | TFLOPs: 31.36 | 7: iteration 30210/ 115203 | consumed samples: 7733760 | consumed tokens: 15838740480 | elapsed time per iteration (s): 0.42 | learning rate: 1.727E-04 | global batch size: 256 | lm loss: 2.341762E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.788 | TFLOPs: 31.78 | 7: iteration 30220/ 115203 | consumed samples: 7736320 | consumed tokens: 15843983360 | elapsed time per iteration (s): 0.43 | learning rate: 1.727E-04 | global batch size: 256 | 
lm loss: 2.367399E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.208 | TFLOPs: 31.28 | 7: iteration 30230/ 115203 | consumed samples: 7738880 | consumed tokens: 15849226240 | elapsed time per iteration (s): 0.43 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 2.360064E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.833 | TFLOPs: 31.47 | 7: iteration 30240/ 115203 | consumed samples: 7741440 | consumed tokens: 15854469120 | elapsed time per iteration (s): 0.42 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 2.334427E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.391 | TFLOPs: 31.92 | 7: iteration 30250/ 115203 | consumed samples: 7744000 | consumed tokens: 15859712000 | elapsed time per iteration (s): 0.43 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 2.393286E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.996 | TFLOPs: 31.17 | 7: iteration 30260/ 115203 | consumed samples: 7746560 | consumed tokens: 15864954880 | elapsed time per iteration (s): 0.43 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 2.385348E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.773 | TFLOPs: 31.57 | 7: iteration 30270/ 115203 | consumed samples: 7749120 | consumed tokens: 15870197760 | elapsed time per iteration (s): 0.42 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 2.367559E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.647 | TFLOPs: 31.72 | 7: iteration 30280/ 115203 | consumed samples: 7751680 | consumed tokens: 
15875440640 | elapsed time per iteration (s): 0.43 | learning rate: 1.726E-04 | global batch size: 256 | lm loss: 2.384839E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.483 | TFLOPs: 30.93 | 7: iteration 30290/ 115203 | consumed samples: 7754240 | consumed tokens: 15880683520 | elapsed time per iteration (s): 0.42 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 2.387315E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.144 | TFLOPs: 31.80 | 7: iteration 30300/ 115203 | consumed samples: 7756800 | consumed tokens: 15885926400 | elapsed time per iteration (s): 0.43 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 2.339831E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.228 | TFLOPs: 31.49 | 7: iteration 30310/ 115203 | consumed samples: 7759360 | consumed tokens: 15891169280 | elapsed time per iteration (s): 0.42 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 2.340359E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.348 | TFLOPs: 31.87 | 7: iteration 30320/ 115203 | consumed samples: 7761920 | consumed tokens: 15896412160 | elapsed time per iteration (s): 0.44 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 2.385280E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.647 | TFLOPs: 30.36 | 7: iteration 30330/ 115203 | consumed samples: 7764480 | consumed tokens: 15901655040 | elapsed time per iteration (s): 0.43 | learning rate: 1.725E-04 | global batch size: 256 | lm loss: 2.356041E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
592.225 | TFLOPs: 31.07 | 7: iteration 30340/ 115203 | consumed samples: 7767040 | consumed tokens: 15906897920 | elapsed time per iteration (s): 0.42 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 2.366789E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.434 | TFLOPs: 32.03 | 7: iteration 30350/ 115203 | consumed samples: 7769600 | consumed tokens: 15912140800 | elapsed time per iteration (s): 0.42 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 2.341214E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.495 | TFLOPs: 31.77 | 7: iteration 30360/ 115203 | consumed samples: 7772160 | consumed tokens: 15917383680 | elapsed time per iteration (s): 0.43 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 2.381069E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.017 | TFLOPs: 31.27 | 7: iteration 30370/ 115203 | consumed samples: 7774720 | consumed tokens: 15922626560 | elapsed time per iteration (s): 0.43 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 2.362413E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.904 | TFLOPs: 31.27 | 7: iteration 30380/ 115203 | consumed samples: 7777280 | consumed tokens: 15927869440 | elapsed time per iteration (s): 0.42 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 2.388131E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.844 | TFLOPs: 31.95 | 7: iteration 30390/ 115203 | consumed samples: 7779840 | consumed tokens: 15933112320 | elapsed time per iteration (s): 0.43 | learning rate: 1.724E-04 | global batch size: 256 | lm loss: 2.388111E+00 | grad norm: 0.207 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.334 | TFLOPs: 31.50 | 7: iteration 30400/ 115203 | consumed samples: 7782400 | consumed tokens: 15938355200 | elapsed time per iteration (s): 0.43 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 2.373040E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.759 | TFLOPs: 31.26 | 7: iteration 30410/ 115203 | consumed samples: 7784960 | consumed tokens: 15943598080 | elapsed time per iteration (s): 0.43 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 2.345635E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.329 | TFLOPs: 31.24 | 7: iteration 30420/ 115203 | consumed samples: 7787520 | consumed tokens: 15948840960 | elapsed time per iteration (s): 0.42 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 2.351429E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.871 | TFLOPs: 31.84 | 7: iteration 30430/ 115203 | consumed samples: 7790080 | consumed tokens: 15954083840 | elapsed time per iteration (s): 0.42 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 2.347389E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.475 | TFLOPs: 31.61 | 7: iteration 30440/ 115203 | consumed samples: 7792640 | consumed tokens: 15959326720 | elapsed time per iteration (s): 0.43 | learning rate: 1.723E-04 | global batch size: 256 | lm loss: 2.382631E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.848 | TFLOPs: 31.58 | 7: iteration 30450/ 115203 | consumed samples: 7795200 | consumed tokens: 15964569600 | elapsed time per iteration (s): 0.45 | 
learning rate: 1.722E-04 | global batch size: 256 | lm loss: 2.343185E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.748 | TFLOPs: 30.00 | 7: iteration 30460/ 115203 | consumed samples: 7797760 | consumed tokens: 15969812480 | elapsed time per iteration (s): 0.43 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 2.341725E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.860 | TFLOPs: 31.47 | 7: iteration 30470/ 115203 | consumed samples: 7800320 | consumed tokens: 15975055360 | elapsed time per iteration (s): 0.42 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 2.376163E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.517 | TFLOPs: 31.77 | 7: iteration 30480/ 115203 | consumed samples: 7802880 | consumed tokens: 15980298240 | elapsed time per iteration (s): 0.43 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 2.355397E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.305 | TFLOPs: 31.55 | 7: iteration 30490/ 115203 | consumed samples: 7805440 | consumed tokens: 15985541120 | elapsed time per iteration (s): 0.42 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 2.369816E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.384 | TFLOPs: 32.08 | 7: iteration 30500/ 115203 | consumed samples: 7808000 | consumed tokens: 15990784000 | elapsed time per iteration (s): 0.42 | learning rate: 1.722E-04 | global batch size: 256 | lm loss: 2.343002E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.212 | TFLOPs: 31.91 | 7: iteration 30510/ 115203 | 
consumed samples: 7810560 | consumed tokens: 15996026880 | elapsed time per iteration (s): 0.43 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 2.372978E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.014 | TFLOPs: 30.96 | 7: iteration 30520/ 115203 | consumed samples: 7813120 | consumed tokens: 16001269760 | elapsed time per iteration (s): 0.42 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 2.360504E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.883 | TFLOPs: 31.89 | 7: iteration 30530/ 115203 | consumed samples: 7815680 | consumed tokens: 16006512640 | elapsed time per iteration (s): 0.42 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 2.350978E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.376 | TFLOPs: 32.18 | 7: iteration 30540/ 115203 | consumed samples: 7818240 | consumed tokens: 16011755520 | elapsed time per iteration (s): 0.43 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 2.330597E+00 | grad norm: 0.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.734 | TFLOPs: 31.36 | 7: iteration 30550/ 115203 | consumed samples: 7820800 | consumed tokens: 16016998400 | elapsed time per iteration (s): 0.42 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 2.378823E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.940 | TFLOPs: 32.11 | 7: iteration 30560/ 115203 | consumed samples: 7823360 | consumed tokens: 16022241280 | elapsed time per iteration (s): 0.42 | learning rate: 1.721E-04 | global batch size: 256 | lm loss: 2.379930E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 608.845 | TFLOPs: 31.95 | 7: iteration 30570/ 115203 | consumed samples: 7825920 | consumed tokens: 16027484160 | elapsed time per iteration (s): 0.44 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 2.398160E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.742 | TFLOPs: 30.47 | 7: iteration 30580/ 115203 | consumed samples: 7828480 | consumed tokens: 16032727040 | elapsed time per iteration (s): 0.43 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 2.322978E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.818 | TFLOPs: 31.31 | 7: iteration 30590/ 115203 | consumed samples: 7831040 | consumed tokens: 16037969920 | elapsed time per iteration (s): 0.42 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 2.372705E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.991 | TFLOPs: 31.90 | 7: iteration 30600/ 115203 | consumed samples: 7833600 | consumed tokens: 16043212800 | elapsed time per iteration (s): 0.43 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 2.368838E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.802 | TFLOPs: 31.10 | 7: iteration 30610/ 115203 | consumed samples: 7836160 | consumed tokens: 16048455680 | elapsed time per iteration (s): 0.44 | learning rate: 1.720E-04 | global batch size: 256 | lm loss: 2.358338E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.608 | TFLOPs: 30.78 | 7: iteration 30620/ 115203 | consumed samples: 7838720 | consumed tokens: 16053698560 | elapsed time per iteration (s): 0.42 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 
2.363690E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.033 | TFLOPs: 31.64 | 7: iteration 30630/ 115203 | consumed samples: 7841280 | consumed tokens: 16058941440 | elapsed time per iteration (s): 0.43 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 2.341281E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.724 | TFLOPs: 31.20 | 7: iteration 30640/ 115203 | consumed samples: 7843840 | consumed tokens: 16064184320 | elapsed time per iteration (s): 0.42 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 2.336075E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.515 | TFLOPs: 31.67 | 7: iteration 30650/ 115203 | consumed samples: 7846400 | consumed tokens: 16069427200 | elapsed time per iteration (s): 0.45 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 2.369621E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.458 | TFLOPs: 29.56 | 7: iteration 30660/ 115203 | consumed samples: 7848960 | consumed tokens: 16074670080 | elapsed time per iteration (s): 0.43 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 2.362420E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.554 | TFLOPs: 31.25 | 7: iteration 30670/ 115203 | consumed samples: 7851520 | consumed tokens: 16079912960 | elapsed time per iteration (s): 0.44 | learning rate: 1.719E-04 | global batch size: 256 | lm loss: 2.375056E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.567 | TFLOPs: 30.83 | 7: iteration 30680/ 115203 | consumed samples: 7854080 | consumed tokens: 16085155840 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 2.408510E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.890 | TFLOPs: 31.89 | 7: iteration 30690/ 115203 | consumed samples: 7856640 | consumed tokens: 16090398720 | elapsed time per iteration (s): 0.43 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 2.383366E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.484 | TFLOPs: 30.98 | 7: iteration 30700/ 115203 | consumed samples: 7859200 | consumed tokens: 16095641600 | elapsed time per iteration (s): 0.43 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 2.379061E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.807 | TFLOPs: 31.16 | 7: iteration 30710/ 115203 | consumed samples: 7861760 | consumed tokens: 16100884480 | elapsed time per iteration (s): 0.43 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 2.360751E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.016 | TFLOPs: 31.22 | 7: iteration 30720/ 115203 | consumed samples: 7864320 | consumed tokens: 16106127360 | elapsed time per iteration (s): 0.43 | learning rate: 1.718E-04 | global batch size: 256 | lm loss: 2.356531E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.117 | TFLOPs: 31.38 | 7: iteration 30730/ 115203 | consumed samples: 7866880 | consumed tokens: 16111370240 | elapsed time per iteration (s): 0.43 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 2.363380E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.120 | TFLOPs: 
31.59 | 7: iteration 30740/ 115203 | consumed samples: 7869440 | consumed tokens: 16116613120 | elapsed time per iteration (s): 0.43 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 2.343550E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.323 | TFLOPs: 31.24 | 7: iteration 30750/ 115203 | consumed samples: 7872000 | consumed tokens: 16121856000 | elapsed time per iteration (s): 0.43 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 2.357615E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.447 | TFLOPs: 31.35 | 7: iteration 30760/ 115203 | consumed samples: 7874560 | consumed tokens: 16127098880 | elapsed time per iteration (s): 0.42 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 2.354512E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.115 | TFLOPs: 31.91 | 7: iteration 30770/ 115203 | consumed samples: 7877120 | consumed tokens: 16132341760 | elapsed time per iteration (s): 0.43 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 2.340547E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.697 | TFLOPs: 31.41 | 7: iteration 30780/ 115203 | consumed samples: 7879680 | consumed tokens: 16137584640 | elapsed time per iteration (s): 0.42 | learning rate: 1.717E-04 | global batch size: 256 | lm loss: 2.365979E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.397 | TFLOPs: 31.71 | 7: iteration 30790/ 115203 | consumed samples: 7882240 | consumed tokens: 16142827520 | elapsed time per iteration (s): 0.43 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 2.316064E+00 | grad norm: 0.196 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.678 | TFLOPs: 31.52 | 7: iteration 30800/ 115203 | consumed samples: 7884800 | consumed tokens: 16148070400 | elapsed time per iteration (s): 0.43 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 2.365242E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.409 | TFLOPs: 31.29 | 7: iteration 30810/ 115203 | consumed samples: 7887360 | consumed tokens: 16153313280 | elapsed time per iteration (s): 0.43 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 2.341257E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.071 | TFLOPs: 31.38 | 7: iteration 30820/ 115203 | consumed samples: 7889920 | consumed tokens: 16158556160 | elapsed time per iteration (s): 0.43 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 2.357706E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.925 | TFLOPs: 31.11 | 7: iteration 30830/ 115203 | consumed samples: 7892480 | consumed tokens: 16163799040 | elapsed time per iteration (s): 0.44 | learning rate: 1.716E-04 | global batch size: 256 | lm loss: 2.343770E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.100 | TFLOPs: 30.70 | 7: iteration 30840/ 115203 | consumed samples: 7895040 | consumed tokens: 16169041920 | elapsed time per iteration (s): 0.42 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 2.379471E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.790 | TFLOPs: 31.94 | 7: iteration 30850/ 115203 | consumed samples: 7897600 | consumed tokens: 16174284800 | elapsed time per iteration (s): 0.42 | learning rate: 1.715E-04 
| global batch size: 256 | lm loss: 2.366095E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.042 | TFLOPs: 31.75 | 7: iteration 30860/ 115203 | consumed samples: 7900160 | consumed tokens: 16179527680 | elapsed time per iteration (s): 0.43 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 2.353775E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.174 | TFLOPs: 31.44 | 7: iteration 30870/ 115203 | consumed samples: 7902720 | consumed tokens: 16184770560 | elapsed time per iteration (s): 0.43 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 2.375645E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.187 | TFLOPs: 31.23 | 7: iteration 30880/ 115203 | consumed samples: 7905280 | consumed tokens: 16190013440 | elapsed time per iteration (s): 0.42 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 2.339651E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.049 | TFLOPs: 31.96 | 7: iteration 30890/ 115203 | consumed samples: 7907840 | consumed tokens: 16195256320 | elapsed time per iteration (s): 0.45 | learning rate: 1.715E-04 | global batch size: 256 | lm loss: 2.361684E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.873 | TFLOPs: 30.11 | 7: iteration 30900/ 115203 | consumed samples: 7910400 | consumed tokens: 16200499200 | elapsed time per iteration (s): 0.43 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 2.344158E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.593 | TFLOPs: 31.41 | 7: iteration 30910/ 115203 | consumed samples: 7912960 | 
consumed tokens: 16205742080 | elapsed time per iteration (s): 0.43 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 2.345191E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.782 | TFLOPs: 31.57 | 7: iteration 30920/ 115203 | consumed samples: 7915520 | consumed tokens: 16210984960 | elapsed time per iteration (s): 0.42 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 2.369702E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.089 | TFLOPs: 31.75 | 7: iteration 30930/ 115203 | consumed samples: 7918080 | consumed tokens: 16216227840 | elapsed time per iteration (s): 0.42 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 2.353169E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.897 | TFLOPs: 32.00 | 7: iteration 30940/ 115203 | consumed samples: 7920640 | consumed tokens: 16221470720 | elapsed time per iteration (s): 0.42 | learning rate: 1.714E-04 | global batch size: 256 | lm loss: 2.366176E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.387 | TFLOPs: 31.87 | 7: iteration 30950/ 115203 | consumed samples: 7923200 | consumed tokens: 16226713600 | elapsed time per iteration (s): 0.43 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 2.339012E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.359 | TFLOPs: 31.50 | 7: iteration 30960/ 115203 | consumed samples: 7925760 | consumed tokens: 16231956480 | elapsed time per iteration (s): 0.43 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 2.377311E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 597.727 | TFLOPs: 31.36 | 7: iteration 30970/ 115203 | consumed samples: 7928320 | consumed tokens: 16237199360 | elapsed time per iteration (s): 0.42 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 2.366016E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.080 | TFLOPs: 31.85 | 7: iteration 30980/ 115203 | consumed samples: 7930880 | consumed tokens: 16242442240 | elapsed time per iteration (s): 0.42 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 2.321914E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.236 | TFLOPs: 31.86 | 7: iteration 30990/ 115203 | consumed samples: 7933440 | consumed tokens: 16247685120 | elapsed time per iteration (s): 0.42 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 2.355643E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.709 | TFLOPs: 31.62 | 7: iteration 31000/ 115203 | consumed samples: 7936000 | consumed tokens: 16252928000 | elapsed time per iteration (s): 0.43 | learning rate: 1.713E-04 | global batch size: 256 | lm loss: 2.365865E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.910 | TFLOPs: 31.32 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 31000 | lm loss value: 2.240203E+00 | lm loss PPL: 9.395240E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 31000 to checkpoints_221m 0: [2022-11-28 16:40:56,437] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step31000 is begin to save! 
0: [2022-11-28 16:40:56,441] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_01-model_00-model_states.pt... 0: [2022-11-28 16:40:56,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_01-model_00-model_states.pt. 0: [2022-11-28 16:40:56,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_03-model_00-model_states.pt... 0: [2022-11-28 16:40:56,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_03-model_00-model_states.pt. 0: [2022-11-28 16:40:56,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_04-model_00-model_states.pt... 0: [2022-11-28 16:40:56,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_04-model_00-model_states.pt. 0: [2022-11-28 16:40:56,584] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_05-model_00-model_states.pt... 0: [2022-11-28 16:40:56,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_05-model_00-model_states.pt. 0: [2022-11-28 16:40:56,606] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_06-model_00-model_states.pt... 0: [2022-11-28 16:40:56,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_06-model_00-model_states.pt. 0: [2022-11-28 16:40:56,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_07-model_00-model_states.pt... 0: [2022-11-28 16:40:56,652] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 16:40:56,653] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_08-model_00-model_states.pt... 0: [2022-11-28 16:40:56,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_08-model_00-model_states.pt. 0: [2022-11-28 16:40:56,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_09-model_00-model_states.pt... 0: [2022-11-28 16:40:56,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_09-model_00-model_states.pt. 0: [2022-11-28 16:40:56,699] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_10-model_00-model_states.pt... 0: [2022-11-28 16:40:56,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_10-model_00-model_states.pt. 0: [2022-11-28 16:40:56,722] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_11-model_00-model_states.pt... 0: [2022-11-28 16:40:56,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_11-model_00-model_states.pt. 0: [2022-11-28 16:40:56,745] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_12-model_00-model_states.pt... 0: [2022-11-28 16:40:56,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_12-model_00-model_states.pt. 0: [2022-11-28 16:40:56,769] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_13-model_00-model_states.pt... 0: [2022-11-28 16:40:56,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 16:40:56,791] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_14-model_00-model_states.pt... 0: [2022-11-28 16:40:56,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_14-model_00-model_states.pt. 0: [2022-11-28 16:40:56,814] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_15-model_00-model_states.pt... 0: [2022-11-28 16:40:56,838] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_15-model_00-model_states.pt. 0: [2022-11-28 16:40:56,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_16-model_00-model_states.pt... 0: [2022-11-28 16:40:56,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_16-model_00-model_states.pt. 0: [2022-11-28 16:40:56,861] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_17-model_00-model_states.pt... 0: [2022-11-28 16:40:56,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_17-model_00-model_states.pt. 0: [2022-11-28 16:40:56,884] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_18-model_00-model_states.pt... 0: [2022-11-28 16:40:56,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_18-model_00-model_states.pt. 0: [2022-11-28 16:40:56,907] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_19-model_00-model_states.pt... 0: [2022-11-28 16:40:56,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 16:40:56,931] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_20-model_00-model_states.pt... 0: [2022-11-28 16:40:56,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_20-model_00-model_states.pt. 0: [2022-11-28 16:40:56,954] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/layer_22-model_00-model_states.pt... 0: [2022-11-28 16:40:56,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/layer_22-model_00-model_states.pt. 0: [2022-11-28 16:40:56,959] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step31000/mp_rank_00_model_states.pt 0: [2022-11-28 16:40:56,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/mp_rank_00_model_states.pt... 0: [2022-11-28 16:40:56,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/mp_rank_00_model_states.pt. 0: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:40:56,978] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
5: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
5: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
4: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:40:56,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step31000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:40:57,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:40:57,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 16:40:57,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2022-11-28 16:40:57,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:40:57,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 16:40:57,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2022-11-28 16:40:57,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-28 16:40:57,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 16:40:57,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2022-11-28 16:40:57,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:40:57,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 16:40:57,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2022-11-28 16:40:57,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:40:57,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 16:40:57,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2022-11-28 16:40:57,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:40:57,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 16:40:57,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2022-11-28 16:40:57,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 16:40:57,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 16:40:57,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2022-11-28 16:40:57,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:40:57,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 16:40:57,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2022-11-28 16:40:57,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:40:57,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 16:40:57,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2022-11-28 16:40:57,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:40:57,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 16:40:57,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2022-11-28 16:40:57,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 16:40:57,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:40:57,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 16:40:57,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 16:40:57,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2022-11-28 16:40:57,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2022-11-28 16:40:57,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:40:57,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:40:57,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 16:40:57,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 16:40:57,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2022-11-28 16:40:57,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2022-11-28 16:40:57,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:40:57,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 16:40:57,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2022-11-28 16:40:57,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:40:57,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 16:40:57,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2022-11-28 16:40:57,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:40:57,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 16:40:57,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2022-11-28 16:40:57,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:40:57,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 16:40:57,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2022-11-28 16:40:57,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-28 16:40:57,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2022-11-28 16:40:57,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:40:57,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2022-11-28 16:40:57,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 16:40:57,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2022-11-28 16:40:57,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:40:57,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 16:40:57,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2022-11-28 16:40:57,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:40:57,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:40:57,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:40:57,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 16:40:57,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 16:40:57,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 16:40:57,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 16:40:57,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2022-11-28 16:40:57,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 7: [2022-11-28 16:40:57,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2022-11-28 16:40:57,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 16:40:57,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2022-11-28 16:40:57,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:40:57,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-28 16:40:57,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 4: [2022-11-28 16:40:57,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 7: [2022-11-28 16:40:57,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2022-11-28 16:40:57,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2022-11-28 16:40:57,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:40:57,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 16:40:57,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2022-11-28 16:40:57,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:40:57,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:40:57,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 16:40:57,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 16:40:57,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 
6: [2022-11-28 16:40:57,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2022-11-28 16:40:57,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:40:57,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 16:40:57,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 2: [2022-11-28 16:40:57,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:40:57,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 16:40:57,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2022-11-28 16:40:57,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:40:57,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 16:40:57,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2022-11-28 16:40:57,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:40:57,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
5: [2022-11-28 16:40:57,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 16:40:57,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 0: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:40:57,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2022-11-28 16:40:57,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2022-11-28 16:40:57,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:40:57,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:40:57,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 16:40:57,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 0: [2022-11-28 16:40:57,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2022-11-28 16:40:57,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2022-11-28 16:40:57,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 
5: [2022-11-28 16:40:57,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:40:57,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:40:57,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:40:57,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 16:40:57,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2022-11-28 16:40:57,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 16:40:57,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2022-11-28 16:40:57,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2022-11-28 16:40:57,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:40:57,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 
3: [2022-11-28 16:40:57,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 16:40:57,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:40:57,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:40:57,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 16:40:57,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:40:57,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2022-11-28 16:40:57,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:40:57,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 
6: [2022-11-28 16:40:57,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 16:40:57,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 16:40:57,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 16:40:57,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 6: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 16:40:57,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 16:40:57,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:40:57,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2022-11-28 16:40:57,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 16:40:57,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
4: [2022-11-28 16:40:57,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:40:57,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 4: [2022-11-28 16:40:57,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 1: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2022-11-28 16:40:57,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:40:57,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 16:40:57,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 1: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
3: [2022-11-28 16:40:57,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 16:40:57,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2022-11-28 16:40:57,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:40:57,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 16:40:57,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 16:40:57,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 3: [2022-11-28 16:40:57,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:40:57,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 16:40:57,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 16:40:57,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 
5: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 5: [2022-11-28 16:40:57,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:40:57,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 16:40:57,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 4: [2022-11-28 16:40:57,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:40:57,073] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 16:40:57,073] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 0: [2022-11-28 16:40:57,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step31000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 16:40:57,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step31000 is ready now! 
0: successfully saved checkpoint at iteration 31000 to checkpoints_221m 7: time (ms) | save-checkpoint: 654.04 7: iteration 31010/ 115203 | consumed samples: 7938560 | consumed tokens: 16258170880 | elapsed time per iteration (s): 0.51 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 2.372782E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 498.723 | TFLOPs: 26.17 | 7: iteration 31020/ 115203 | consumed samples: 7941120 | consumed tokens: 16263413760 | elapsed time per iteration (s): 0.43 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 2.415533E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.091 | TFLOPs: 31.22 | 7: iteration 31030/ 115203 | consumed samples: 7943680 | consumed tokens: 16268656640 | elapsed time per iteration (s): 0.42 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 2.355871E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.213 | TFLOPs: 31.91 | 7: iteration 31040/ 115203 | consumed samples: 7946240 | consumed tokens: 16273899520 | elapsed time per iteration (s): 0.42 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 2.423403E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.215 | TFLOPs: 32.17 | 7: iteration 31050/ 115203 | consumed samples: 7948800 | consumed tokens: 16279142400 | elapsed time per iteration (s): 0.43 | learning rate: 1.712E-04 | global batch size: 256 | lm loss: 2.354011E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.872 | TFLOPs: 31.26 | 7: iteration 31060/ 115203 | consumed samples: 7951360 | consumed tokens: 16284385280 | elapsed time per iteration (s): 0.43 | learning rate: 
1.711E-04 | global batch size: 256 | lm loss: 2.347532E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.330 | TFLOPs: 31.50 | 7: iteration 31070/ 115203 | consumed samples: 7953920 | consumed tokens: 16289628160 | elapsed time per iteration (s): 0.42 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 2.402549E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.898 | TFLOPs: 32.05 | 7: iteration 31080/ 115203 | consumed samples: 7956480 | consumed tokens: 16294871040 | elapsed time per iteration (s): 0.43 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 2.381815E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.798 | TFLOPs: 31.37 | 7: iteration 31090/ 115203 | consumed samples: 7959040 | consumed tokens: 16300113920 | elapsed time per iteration (s): 0.42 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 2.380374E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.902 | TFLOPs: 31.90 | 7: iteration 31100/ 115203 | consumed samples: 7961600 | consumed tokens: 16305356800 | elapsed time per iteration (s): 0.43 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 2.350111E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.232 | TFLOPs: 31.18 | 7: iteration 31110/ 115203 | consumed samples: 7964160 | consumed tokens: 16310599680 | elapsed time per iteration (s): 0.42 | learning rate: 1.711E-04 | global batch size: 256 | lm loss: 2.375799E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.427 | TFLOPs: 31.92 | 7: iteration 31120/ 115203 | consumed samples: 
7966720 | consumed tokens: 16315842560 | elapsed time per iteration (s): 0.42 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 2.353016E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.616 | TFLOPs: 31.67 | 7: iteration 31130/ 115203 | consumed samples: 7969280 | consumed tokens: 16321085440 | elapsed time per iteration (s): 0.43 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 2.392299E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.751 | TFLOPs: 31.15 | 7: iteration 31140/ 115203 | consumed samples: 7971840 | consumed tokens: 16326328320 | elapsed time per iteration (s): 0.42 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 2.360926E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.401 | TFLOPs: 31.92 | 7: iteration 31150/ 115203 | consumed samples: 7974400 | consumed tokens: 16331571200 | elapsed time per iteration (s): 0.44 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 2.344951E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.632 | TFLOPs: 30.20 | 7: iteration 31160/ 115203 | consumed samples: 7976960 | consumed tokens: 16336814080 | elapsed time per iteration (s): 0.42 | learning rate: 1.710E-04 | global batch size: 256 | lm loss: 2.352390E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.492 | TFLOPs: 32.03 | 7: iteration 31170/ 115203 | consumed samples: 7979520 | consumed tokens: 16342056960 | elapsed time per iteration (s): 0.43 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 2.360195E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 598.416 | TFLOPs: 31.40 | 7: iteration 31180/ 115203 | consumed samples: 7982080 | consumed tokens: 16347299840 | elapsed time per iteration (s): 0.42 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 2.378267E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.108 | TFLOPs: 31.64 | 7: iteration 31190/ 115203 | consumed samples: 7984640 | consumed tokens: 16352542720 | elapsed time per iteration (s): 0.42 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 2.351979E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.515 | TFLOPs: 31.77 | 7: iteration 31200/ 115203 | consumed samples: 7987200 | consumed tokens: 16357785600 | elapsed time per iteration (s): 0.44 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 2.360505E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.372 | TFLOPs: 30.40 | 7: iteration 31210/ 115203 | consumed samples: 7989760 | consumed tokens: 16363028480 | elapsed time per iteration (s): 0.42 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 2.348680E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.519 | TFLOPs: 31.82 | 7: iteration 31220/ 115203 | consumed samples: 7992320 | consumed tokens: 16368271360 | elapsed time per iteration (s): 0.44 | learning rate: 1.709E-04 | global batch size: 256 | lm loss: 2.375292E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.070 | TFLOPs: 30.54 | 7: iteration 31230/ 115203 | consumed samples: 7994880 | consumed tokens: 16373514240 | elapsed time per iteration (s): 0.44 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 2.357637E+00 | 
grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.866 | TFLOPs: 30.74 | 7: iteration 31240/ 115203 | consumed samples: 7997440 | consumed tokens: 16378757120 | elapsed time per iteration (s): 0.43 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 2.378298E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.227 | TFLOPs: 31.60 | 7: iteration 31250/ 115203 | consumed samples: 8000000 | consumed tokens: 16384000000 | elapsed time per iteration (s): 0.43 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 2.351033E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.325 | TFLOPs: 31.55 | 7: iteration 31260/ 115203 | consumed samples: 8002560 | consumed tokens: 16389242880 | elapsed time per iteration (s): 0.42 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 2.330231E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.374 | TFLOPs: 31.87 | 7: iteration 31270/ 115203 | consumed samples: 8005120 | consumed tokens: 16394485760 | elapsed time per iteration (s): 0.43 | learning rate: 1.708E-04 | global batch size: 256 | lm loss: 2.329446E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.470 | TFLOPs: 31.03 | 7: iteration 31280/ 115203 | consumed samples: 8007680 | consumed tokens: 16399728640 | elapsed time per iteration (s): 0.43 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 2.338628E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.530 | TFLOPs: 31.56 | 7: iteration 31290/ 115203 | consumed samples: 8010240 | consumed tokens: 16404971520 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 2.332691E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.577 | TFLOPs: 31.46 | 7: iteration 31300/ 115203 | consumed samples: 8012800 | consumed tokens: 16410214400 | elapsed time per iteration (s): 0.42 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 2.364546E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.213 | TFLOPs: 31.81 | 7: iteration 31310/ 115203 | consumed samples: 8015360 | consumed tokens: 16415457280 | elapsed time per iteration (s): 0.42 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 2.320795E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.817 | TFLOPs: 31.68 | 7: iteration 31320/ 115203 | consumed samples: 8017920 | consumed tokens: 16420700160 | elapsed time per iteration (s): 0.42 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 2.378383E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.766 | TFLOPs: 31.94 | 7: iteration 31330/ 115203 | consumed samples: 8020480 | consumed tokens: 16425943040 | elapsed time per iteration (s): 0.43 | learning rate: 1.707E-04 | global batch size: 256 | lm loss: 2.341583E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.228 | TFLOPs: 31.60 | 7: iteration 31340/ 115203 | consumed samples: 8023040 | consumed tokens: 16431185920 | elapsed time per iteration (s): 0.43 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 2.383834E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.862 | TFLOPs: 31.58 | 7: 
iteration 31350/ 115203 | consumed samples: 8025600 | consumed tokens: 16436428800 | elapsed time per iteration (s): 0.43 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 2.344883E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.702 | TFLOPs: 31.52 | 7: iteration 31360/ 115203 | consumed samples: 8028160 | consumed tokens: 16441671680 | elapsed time per iteration (s): 0.43 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 2.382579E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.965 | TFLOPs: 30.90 | 7: iteration 31370/ 115203 | consumed samples: 8030720 | consumed tokens: 16446914560 | elapsed time per iteration (s): 0.43 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 2.342684E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.945 | TFLOPs: 31.06 | 7: iteration 31380/ 115203 | consumed samples: 8033280 | consumed tokens: 16452157440 | elapsed time per iteration (s): 0.43 | learning rate: 1.706E-04 | global batch size: 256 | lm loss: 2.340261E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.390 | TFLOPs: 30.98 | 7: iteration 31390/ 115203 | consumed samples: 8035840 | consumed tokens: 16457400320 | elapsed time per iteration (s): 0.43 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 2.377701E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.314 | TFLOPs: 31.18 | 7: iteration 31400/ 115203 | consumed samples: 8038400 | consumed tokens: 16462643200 | elapsed time per iteration (s): 0.43 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 2.350722E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 592.455 | TFLOPs: 31.09 | 7: iteration 31410/ 115203 | consumed samples: 8040960 | consumed tokens: 16467886080 | elapsed time per iteration (s): 0.43 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 2.356071E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.420 | TFLOPs: 31.45 | 7: iteration 31420/ 115203 | consumed samples: 8043520 | consumed tokens: 16473128960 | elapsed time per iteration (s): 0.42 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 2.339895E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.693 | TFLOPs: 31.73 | 7: iteration 31430/ 115203 | consumed samples: 8046080 | consumed tokens: 16478371840 | elapsed time per iteration (s): 0.43 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 2.379909E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.259 | TFLOPs: 31.07 | 7: iteration 31440/ 115203 | consumed samples: 8048640 | consumed tokens: 16483614720 | elapsed time per iteration (s): 0.42 | learning rate: 1.705E-04 | global batch size: 256 | lm loss: 2.362523E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.037 | TFLOPs: 32.01 | 7: iteration 31450/ 115203 | consumed samples: 8051200 | consumed tokens: 16488857600 | elapsed time per iteration (s): 0.43 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 2.337658E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.170 | TFLOPs: 31.59 | 7: iteration 31460/ 115203 | consumed samples: 8053760 | consumed tokens: 16494100480 | elapsed time per iteration (s): 0.42 | learning rate: 1.704E-04 | global 
batch size: 256 | lm loss: 2.387510E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.456 | TFLOPs: 31.66 | 7: iteration 31470/ 115203 | consumed samples: 8056320 | consumed tokens: 16499343360 | elapsed time per iteration (s): 0.42 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 2.403867E+00 | grad norm: 1.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.611 | TFLOPs: 31.99 | 7: iteration 31480/ 115203 | consumed samples: 8058880 | consumed tokens: 16504586240 | elapsed time per iteration (s): 0.42 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 2.451954E+00 | grad norm: 0.862 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.466 | TFLOPs: 31.93 | 7: iteration 31490/ 115203 | consumed samples: 8061440 | consumed tokens: 16509829120 | elapsed time per iteration (s): 0.42 | learning rate: 1.704E-04 | global batch size: 256 | lm loss: 2.408828E+00 | grad norm: 0.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.502 | TFLOPs: 31.66 | 7: iteration 31500/ 115203 | consumed samples: 8064000 | consumed tokens: 16515072000 | elapsed time per iteration (s): 0.42 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 2.398046E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.213 | TFLOPs: 31.70 | 7: iteration 31510/ 115203 | consumed samples: 8066560 | consumed tokens: 16520314880 | elapsed time per iteration (s): 0.43 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 2.423939E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.060 | TFLOPs: 31.59 | 7: iteration 31520/ 115203 | consumed samples: 8069120 | consumed 
tokens: 16525557760 | elapsed time per iteration (s): 0.43 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 2.331270E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.533 | TFLOPs: 31.14 | 7: iteration 31530/ 115203 | consumed samples: 8071680 | consumed tokens: 16530800640 | elapsed time per iteration (s): 0.43 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 2.344701E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.618 | TFLOPs: 31.57 | 7: iteration 31540/ 115203 | consumed samples: 8074240 | consumed tokens: 16536043520 | elapsed time per iteration (s): 0.42 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 2.364483E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.015 | TFLOPs: 31.74 | 7: iteration 31550/ 115203 | consumed samples: 8076800 | consumed tokens: 16541286400 | elapsed time per iteration (s): 0.43 | learning rate: 1.703E-04 | global batch size: 256 | lm loss: 2.368815E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.922 | TFLOPs: 31.58 | 7: iteration 31560/ 115203 | consumed samples: 8079360 | consumed tokens: 16546529280 | elapsed time per iteration (s): 0.43 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 2.370428E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.144 | TFLOPs: 31.59 | 7: iteration 31570/ 115203 | consumed samples: 8081920 | consumed tokens: 16551772160 | elapsed time per iteration (s): 0.43 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 2.352028E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 590.903 | TFLOPs: 31.00 | 7: iteration 31580/ 115203 | consumed samples: 8084480 | consumed tokens: 16557015040 | elapsed time per iteration (s): 0.43 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 2.370259E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.327 | TFLOPs: 31.50 | 7: iteration 31590/ 115203 | consumed samples: 8087040 | consumed tokens: 16562257920 | elapsed time per iteration (s): 0.44 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 2.346528E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.076 | TFLOPs: 30.70 | 7: iteration 31600/ 115203 | consumed samples: 8089600 | consumed tokens: 16567500800 | elapsed time per iteration (s): 0.43 | learning rate: 1.702E-04 | global batch size: 256 | lm loss: 2.340038E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.149 | TFLOPs: 31.59 | 7: iteration 31610/ 115203 | consumed samples: 8092160 | consumed tokens: 16572743680 | elapsed time per iteration (s): 0.43 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 2.341713E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.065 | TFLOPs: 31.59 | 7: iteration 31620/ 115203 | consumed samples: 8094720 | consumed tokens: 16577986560 | elapsed time per iteration (s): 0.42 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 2.371593E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.668 | TFLOPs: 31.88 | 7: iteration 31630/ 115203 | consumed samples: 8097280 | consumed tokens: 16583229440 | elapsed time per iteration (s): 0.44 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 2.362031E+00 | grad norm: 0.204 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.601 | TFLOPs: 30.78 | 7: iteration 31640/ 115203 | consumed samples: 8099840 | consumed tokens: 16588472320 | elapsed time per iteration (s): 0.42 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 2.364382E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.853 | TFLOPs: 31.74 | 7: iteration 31650/ 115203 | consumed samples: 8102400 | consumed tokens: 16593715200 | elapsed time per iteration (s): 0.42 | learning rate: 1.701E-04 | global batch size: 256 | lm loss: 2.360441E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.577 | TFLOPs: 31.88 | 7: iteration 31660/ 115203 | consumed samples: 8104960 | consumed tokens: 16598958080 | elapsed time per iteration (s): 0.43 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 2.344153E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.858 | TFLOPs: 31.26 | 7: iteration 31670/ 115203 | consumed samples: 8107520 | consumed tokens: 16604200960 | elapsed time per iteration (s): 0.43 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 2.367951E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.466 | TFLOPs: 31.30 | 7: iteration 31680/ 115203 | consumed samples: 8110080 | consumed tokens: 16609443840 | elapsed time per iteration (s): 0.43 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 2.346277E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.723 | TFLOPs: 31.47 | 7: iteration 31690/ 115203 | consumed samples: 8112640 | consumed tokens: 16614686720 | elapsed time per iteration (s): 0.42 
| learning rate: 1.700E-04 | global batch size: 256 | lm loss: 2.373652E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.463 | TFLOPs: 31.77 | 7: iteration 31700/ 115203 | consumed samples: 8115200 | consumed tokens: 16619929600 | elapsed time per iteration (s): 0.42 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 2.336032E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.424 | TFLOPs: 31.82 | 7: iteration 31710/ 115203 | consumed samples: 8117760 | consumed tokens: 16625172480 | elapsed time per iteration (s): 0.43 | learning rate: 1.700E-04 | global batch size: 256 | lm loss: 2.355918E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.714 | TFLOPs: 31.57 | 7: iteration 31720/ 115203 | consumed samples: 8120320 | consumed tokens: 16630415360 | elapsed time per iteration (s): 0.44 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 2.387515E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.953 | TFLOPs: 30.53 | 7: iteration 31730/ 115203 | consumed samples: 8122880 | consumed tokens: 16635658240 | elapsed time per iteration (s): 0.42 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 2.378996E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.026 | TFLOPs: 32.01 | 7: iteration 31740/ 115203 | consumed samples: 8125440 | consumed tokens: 16640901120 | elapsed time per iteration (s): 0.42 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 2.375915E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.300 | TFLOPs: 31.81 | 7: iteration 31750/ 115203 | 
consumed samples: 8128000 | consumed tokens: 16646144000 | elapsed time per iteration (s): 0.43 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 2.347108E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.839 | TFLOPs: 31.58 | 7: iteration 31760/ 115203 | consumed samples: 8130560 | consumed tokens: 16651386880 | elapsed time per iteration (s): 0.42 | learning rate: 1.699E-04 | global batch size: 256 | lm loss: 2.386427E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.900 | TFLOPs: 31.84 | 7: iteration 31770/ 115203 | consumed samples: 8133120 | consumed tokens: 16656629760 | elapsed time per iteration (s): 0.43 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 2.351119E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.746 | TFLOPs: 30.89 | 7: iteration 31780/ 115203 | consumed samples: 8135680 | consumed tokens: 16661872640 | elapsed time per iteration (s): 0.43 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 2.366237E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.280 | TFLOPs: 31.39 | 7: iteration 31790/ 115203 | consumed samples: 8138240 | consumed tokens: 16667115520 | elapsed time per iteration (s): 0.42 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 2.366561E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.142 | TFLOPs: 32.07 | 7: iteration 31800/ 115203 | consumed samples: 8140800 | consumed tokens: 16672358400 | elapsed time per iteration (s): 0.42 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 2.364484E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 607.323 | TFLOPs: 31.87 | 7: iteration 31810/ 115203 | consumed samples: 8143360 | consumed tokens: 16677601280 | elapsed time per iteration (s): 0.42 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 2.352308E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.169 | TFLOPs: 31.96 | 7: iteration 31820/ 115203 | consumed samples: 8145920 | consumed tokens: 16682844160 | elapsed time per iteration (s): 0.42 | learning rate: 1.698E-04 | global batch size: 256 | lm loss: 2.360623E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.358 | TFLOPs: 31.71 | 7: iteration 31830/ 115203 | consumed samples: 8148480 | consumed tokens: 16688087040 | elapsed time per iteration (s): 0.42 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 2.386655E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.498 | TFLOPs: 31.82 | 7: iteration 31840/ 115203 | consumed samples: 8151040 | consumed tokens: 16693329920 | elapsed time per iteration (s): 0.42 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 2.351891E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.072 | TFLOPs: 31.85 | 7: iteration 31850/ 115203 | consumed samples: 8153600 | consumed tokens: 16698572800 | elapsed time per iteration (s): 0.43 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 2.361053E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.645 | TFLOPs: 31.57 | 7: iteration 31860/ 115203 | consumed samples: 8156160 | consumed tokens: 16703815680 | elapsed time per iteration (s): 0.42 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 
2.325130E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.598 | TFLOPs: 31.93 | 7: iteration 31870/ 115203 | consumed samples: 8158720 | consumed tokens: 16709058560 | elapsed time per iteration (s): 0.42 | learning rate: 1.697E-04 | global batch size: 256 | lm loss: 2.353976E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.988 | TFLOPs: 31.74 | 7: iteration 31880/ 115203 | consumed samples: 8161280 | consumed tokens: 16714301440 | elapsed time per iteration (s): 0.42 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 2.349787E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.474 | TFLOPs: 31.93 | 7: iteration 31890/ 115203 | consumed samples: 8163840 | consumed tokens: 16719544320 | elapsed time per iteration (s): 0.42 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 2.324540E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.464 | TFLOPs: 31.66 | 7: iteration 31900/ 115203 | consumed samples: 8166400 | consumed tokens: 16724787200 | elapsed time per iteration (s): 0.42 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 2.384025E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.376 | TFLOPs: 31.61 | 7: iteration 31910/ 115203 | consumed samples: 8168960 | consumed tokens: 16730030080 | elapsed time per iteration (s): 0.42 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 2.365805E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.351 | TFLOPs: 31.97 | 7: iteration 31920/ 115203 | consumed samples: 8171520 | consumed tokens: 16735272960 | 
elapsed time per iteration (s): 0.43 | learning rate: 1.696E-04 | global batch size: 256 | lm loss: 2.326947E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.084 | TFLOPs: 31.59 | 7: iteration 31930/ 115203 | consumed samples: 8174080 | consumed tokens: 16740515840 | elapsed time per iteration (s): 0.42 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 2.355523E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.944 | TFLOPs: 31.85 | 7: iteration 31940/ 115203 | consumed samples: 8176640 | consumed tokens: 16745758720 | elapsed time per iteration (s): 0.42 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 2.371057E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.077 | TFLOPs: 32.06 | 7: iteration 31950/ 115203 | consumed samples: 8179200 | consumed tokens: 16751001600 | elapsed time per iteration (s): 0.43 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 2.338163E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.460 | TFLOPs: 31.45 | 7: iteration 31960/ 115203 | consumed samples: 8181760 | consumed tokens: 16756244480 | elapsed time per iteration (s): 0.42 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 2.322041E+00 | grad norm: 0.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.762 | TFLOPs: 31.94 | 7: iteration 31970/ 115203 | consumed samples: 8184320 | consumed tokens: 16761487360 | elapsed time per iteration (s): 0.42 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 2.375355E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.002 | TFLOPs: 
32.22 | 7: iteration 31980/ 115203 | consumed samples: 8186880 | consumed tokens: 16766730240 | elapsed time per iteration (s): 0.43 | learning rate: 1.695E-04 | global batch size: 256 | lm loss: 2.353720E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.807 | TFLOPs: 31.37 | 7: iteration 31990/ 115203 | consumed samples: 8189440 | consumed tokens: 16771973120 | elapsed time per iteration (s): 0.42 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 2.356504E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.595 | TFLOPs: 31.72 | 0: [2022-11-28 16:48:02,934] [INFO] [logging.py:68:log_dist] [Rank 0] step=32000, skipped=0, lr=[0.00016941764143236279, 0.00016941764143236279, 0.00016941764143236279], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 32000/ 115203 | consumed samples: 8192000 | consumed tokens: 16777216000 | elapsed time per iteration (s): 0.42 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 2.332176E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.330 | TFLOPs: 31.71 | 0: steps: 32000 loss: 2.2570 iter time (s): 0.425 samples/sec: 602.748 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 32000 | lm loss value: 2.202039E+00 | lm loss PPL: 9.043436E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 32000 to checkpoints_221m 0: [2022-11-28 16:48:03,103] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step32000 is begin to save! 0: [2022-11-28 16:48:03,108] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_01-model_00-model_states.pt... 
0: [2022-11-28 16:48:03,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_01-model_00-model_states.pt. 0: [2022-11-28 16:48:03,250] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_03-model_00-model_states.pt... 0: [2022-11-28 16:48:03,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_03-model_00-model_states.pt. 0: [2022-11-28 16:48:03,281] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_04-model_00-model_states.pt... 0: [2022-11-28 16:48:03,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_04-model_00-model_states.pt. 0: [2022-11-28 16:48:03,312] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_05-model_00-model_states.pt... 0: [2022-11-28 16:48:03,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_05-model_00-model_states.pt. 0: [2022-11-28 16:48:03,343] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_06-model_00-model_states.pt... 0: [2022-11-28 16:48:03,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_06-model_00-model_states.pt. 0: [2022-11-28 16:48:03,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_07-model_00-model_states.pt... 0: [2022-11-28 16:48:03,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_07-model_00-model_states.pt. 0: [2022-11-28 16:48:03,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_08-model_00-model_states.pt... 
0: [2022-11-28 16:48:03,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_08-model_00-model_states.pt. 0: [2022-11-28 16:48:03,438] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_09-model_00-model_states.pt... 0: [2022-11-28 16:48:03,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_09-model_00-model_states.pt. 0: [2022-11-28 16:48:03,469] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_10-model_00-model_states.pt... 0: [2022-11-28 16:48:03,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_10-model_00-model_states.pt. 0: [2022-11-28 16:48:03,500] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_11-model_00-model_states.pt... 0: [2022-11-28 16:48:03,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_11-model_00-model_states.pt. 0: [2022-11-28 16:48:03,532] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_12-model_00-model_states.pt... 0: [2022-11-28 16:48:03,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_12-model_00-model_states.pt. 0: [2022-11-28 16:48:03,564] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_13-model_00-model_states.pt... 0: [2022-11-28 16:48:03,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_13-model_00-model_states.pt. 0: [2022-11-28 16:48:03,596] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_14-model_00-model_states.pt... 
0: [2022-11-28 16:48:03,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_14-model_00-model_states.pt. 0: [2022-11-28 16:48:03,628] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_15-model_00-model_states.pt... 0: [2022-11-28 16:48:03,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_15-model_00-model_states.pt. 0: [2022-11-28 16:48:03,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_16-model_00-model_states.pt... 0: [2022-11-28 16:48:03,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_16-model_00-model_states.pt. 0: [2022-11-28 16:48:03,691] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_17-model_00-model_states.pt... 0: [2022-11-28 16:48:03,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_17-model_00-model_states.pt. 0: [2022-11-28 16:48:03,723] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_18-model_00-model_states.pt... 0: [2022-11-28 16:48:03,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_18-model_00-model_states.pt. 0: [2022-11-28 16:48:03,754] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_19-model_00-model_states.pt... 0: [2022-11-28 16:48:03,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_19-model_00-model_states.pt. 0: [2022-11-28 16:48:03,785] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_20-model_00-model_states.pt... 
0: [2022-11-28 16:48:03,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_20-model_00-model_states.pt. 0: [2022-11-28 16:48:03,844] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/layer_22-model_00-model_states.pt... 0: [2022-11-28 16:48:03,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/layer_22-model_00-model_states.pt. 0: [2022-11-28 16:48:03,849] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step32000/mp_rank_00_model_states.pt 0: [2022-11-28 16:48:03,849] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/mp_rank_00_model_states.pt... 0: [2022-11-28 16:48:03,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/mp_rank_00_model_states.pt. 0: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
0: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:48:04,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
5: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
1: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
5: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:48:04,086] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step32000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:48:04,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:48:04,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 16:48:04,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2022-11-28 16:48:04,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:48:04,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 16:48:04,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2022-11-28 16:48:04,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:48:04,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 16:48:04,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 
4: [2022-11-28 16:48:04,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:48:04,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 16:48:04,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2022-11-28 16:48:04,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:48:04,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 16:48:04,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2022-11-28 16:48:04,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:48:04,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 16:48:04,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2022-11-28 16:48:04,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:48:04,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 
6: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:48:04,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:48:04,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:48:04,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 0: [2022-11-28 16:48:04,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 6: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 
7: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:48:04,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:48:04,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 16:48:04,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2022-11-28 16:48:04,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2022-11-28 16:48:04,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:48:04,137] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 16:48:04,137] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 
2: [2022-11-28 16:48:04,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:48:04,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 16:48:04,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2022-11-28 16:48:04,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:48:04,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:48:04,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 16:48:04,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 16:48:04,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2022-11-28 16:48:04,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2022-11-28 16:48:04,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:48:04,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 16:48:04,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 
0: [2022-11-28 16:48:04,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:48:04,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 16:48:04,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2022-11-28 16:48:04,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:48:04,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 16:48:04,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2022-11-28 16:48:04,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:48:04,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 16:48:04,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2022-11-28 16:48:04,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:48:04,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:48:04,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 16:48:04,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2022-11-28 16:48:04,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 1: [2022-11-28 16:48:04,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:48:04,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2022-11-28 16:48:04,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2022-11-28 16:48:04,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2022-11-28 16:48:04,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:48:04,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 16:48:04,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2022-11-28 16:48:04,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 16:48:04,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 16:48:04,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2022-11-28 16:48:04,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:48:04,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 16:48:04,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2022-11-28 16:48:04,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:48:04,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:48:04,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:48:04,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 16:48:04,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
3: [2022-11-28 16:48:04,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 16:48:04,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 16:48:04,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2022-11-28 16:48:04,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2022-11-28 16:48:04,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 16:48:04,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2022-11-28 16:48:04,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2022-11-28 16:48:04,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:48:04,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 16:48:04,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2022-11-28 16:48:04,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 16:48:04,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 5: [2022-11-28 16:48:04,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:48:04,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2022-11-28 16:48:04,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:48:04,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 0: [2022-11-28 16:48:04,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 5: [2022-11-28 16:48:04,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2022-11-28 16:48:04,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2022-11-28 16:48:04,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:48:04,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:48:04,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
0: [2022-11-28 16:48:04,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2022-11-28 16:48:04,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:48:04,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:48:04,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 16:48:04,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 16:48:04,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 2: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:48:04,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 3: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2022-11-28 16:48:04,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:48:04,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
6: [2022-11-28 16:48:04,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2022-11-28 16:48:04,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 6: [2022-11-28 16:48:04,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 3: [2022-11-28 16:48:04,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2022-11-28 16:48:04,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:48:04,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:48:04,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2022-11-28 16:48:04,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 6: [2022-11-28 16:48:04,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2022-11-28 16:48:04,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 1: [2022-11-28 16:48:04,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:48:04,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 16:48:04,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2022-11-28 16:48:04,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:48:04,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:48:04,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 16:48:04,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2022-11-28 16:48:04,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 16:48:04,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2022-11-28 16:48:04,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:48:04,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 16:48:04,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2022-11-28 16:48:04,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 16:48:04,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:48:04,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 16:48:04,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 16:48:04,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 7: [2022-11-28 16:48:04,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2022-11-28 16:48:04,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:48:04,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:48:04,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 16:48:04,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2022-11-28 16:48:04,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:48:04,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
3: [2022-11-28 16:48:04,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 16:48:04,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 16:48:04,152] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2022-11-28 16:48:04,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 16:48:04,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 5: [2022-11-28 16:48:04,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2022-11-28 16:48:04,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:48:04,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 16:48:04,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2022-11-28 16:48:04,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:48:04,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 16:48:04,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 16:48:04,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 16:48:04,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 4: [2022-11-28 16:48:04,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2022-11-28 16:48:04,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:48:04,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 16:48:04,156] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 6: [2022-11-28 16:48:04,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:48:04,156] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 16:48:04,157] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 0: [2022-11-28 16:48:04,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step32000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 16:48:04,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step32000 is ready now! 
0: successfully saved checkpoint at iteration 32000 to checkpoints_221m 7: time (ms) | save-checkpoint: 1097.33 7: iteration 32010/ 115203 | consumed samples: 8194560 | consumed tokens: 16782458880 | elapsed time per iteration (s): 0.55 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 2.354149E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 461.715 | TFLOPs: 24.23 | 7: iteration 32020/ 115203 | consumed samples: 8197120 | consumed tokens: 16787701760 | elapsed time per iteration (s): 0.42 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 2.332352E+00 | grad norm: 0.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.124 | TFLOPs: 31.75 | 7: iteration 32030/ 115203 | consumed samples: 8199680 | consumed tokens: 16792944640 | elapsed time per iteration (s): 0.43 | learning rate: 1.694E-04 | global batch size: 256 | lm loss: 2.358415E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.576 | TFLOPs: 30.99 | 7: iteration 32040/ 115203 | consumed samples: 8202240 | consumed tokens: 16798187520 | elapsed time per iteration (s): 0.44 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 2.384275E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.398 | TFLOPs: 30.82 | 7: iteration 32050/ 115203 | consumed samples: 8204800 | consumed tokens: 16803430400 | elapsed time per iteration (s): 0.42 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 2.385629E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.720 | TFLOPs: 31.78 | 7: iteration 32060/ 115203 | consumed samples: 8207360 | consumed tokens: 16808673280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.693E-04 | global batch size: 256 | lm loss: 2.345538E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.877 | TFLOPs: 31.89 | 7: iteration 32070/ 115203 | consumed samples: 8209920 | consumed tokens: 16813916160 | elapsed time per iteration (s): 0.42 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 2.356254E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.261 | TFLOPs: 31.97 | 7: iteration 32080/ 115203 | consumed samples: 8212480 | consumed tokens: 16819159040 | elapsed time per iteration (s): 0.43 | learning rate: 1.693E-04 | global batch size: 256 | lm loss: 2.379910E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.591 | TFLOPs: 31.46 | 7: iteration 32090/ 115203 | consumed samples: 8215040 | consumed tokens: 16824401920 | elapsed time per iteration (s): 0.42 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 2.376979E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.534 | TFLOPs: 31.67 | 7: iteration 32100/ 115203 | consumed samples: 8217600 | consumed tokens: 16829644800 | elapsed time per iteration (s): 0.42 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 2.364925E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.307 | TFLOPs: 31.65 | 7: iteration 32110/ 115203 | consumed samples: 8220160 | consumed tokens: 16834887680 | elapsed time per iteration (s): 0.43 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 2.355512E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.625 | TFLOPs: 31.57 | 7: iteration 32120/ 115203 | consumed samples: 
8222720 | consumed tokens: 16840130560 | elapsed time per iteration (s): 0.42 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 2.351813E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.048 | TFLOPs: 31.64 | 7: iteration 32130/ 115203 | consumed samples: 8225280 | consumed tokens: 16845373440 | elapsed time per iteration (s): 0.43 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 2.399849E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.181 | TFLOPs: 31.28 | 7: iteration 32140/ 115203 | consumed samples: 8227840 | consumed tokens: 16850616320 | elapsed time per iteration (s): 0.43 | learning rate: 1.692E-04 | global batch size: 256 | lm loss: 2.315278E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.069 | TFLOPs: 31.48 | 7: iteration 32150/ 115203 | consumed samples: 8230400 | consumed tokens: 16855859200 | elapsed time per iteration (s): 0.42 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 2.341645E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.096 | TFLOPs: 31.70 | 7: iteration 32160/ 115203 | consumed samples: 8232960 | consumed tokens: 16861102080 | elapsed time per iteration (s): 0.42 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 2.379295E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.777 | TFLOPs: 31.99 | 7: iteration 32170/ 115203 | consumed samples: 8235520 | consumed tokens: 16866344960 | elapsed time per iteration (s): 0.42 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 2.341534E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 609.916 | TFLOPs: 32.00 | 7: iteration 32180/ 115203 | consumed samples: 8238080 | consumed tokens: 16871587840 | elapsed time per iteration (s): 0.42 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 2.349627E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.821 | TFLOPs: 32.00 | 7: iteration 32190/ 115203 | consumed samples: 8240640 | consumed tokens: 16876830720 | elapsed time per iteration (s): 0.42 | learning rate: 1.691E-04 | global batch size: 256 | lm loss: 2.350404E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.791 | TFLOPs: 31.63 | 7: iteration 32200/ 115203 | consumed samples: 8243200 | consumed tokens: 16882073600 | elapsed time per iteration (s): 0.42 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 2.372109E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.116 | TFLOPs: 31.80 | 7: iteration 32210/ 115203 | consumed samples: 8245760 | consumed tokens: 16887316480 | elapsed time per iteration (s): 0.43 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 2.358273E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.486 | TFLOPs: 31.51 | 7: iteration 32220/ 115203 | consumed samples: 8248320 | consumed tokens: 16892559360 | elapsed time per iteration (s): 0.43 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 2.345670E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.812 | TFLOPs: 31.37 | 7: iteration 32230/ 115203 | consumed samples: 8250880 | consumed tokens: 16897802240 | elapsed time per iteration (s): 0.42 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 2.337376E+00 | 
grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.928 | TFLOPs: 31.74 | 7: iteration 32240/ 115203 | consumed samples: 8253440 | consumed tokens: 16903045120 | elapsed time per iteration (s): 0.42 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 2.371813E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.031 | TFLOPs: 31.85 | 7: iteration 32250/ 115203 | consumed samples: 8256000 | consumed tokens: 16908288000 | elapsed time per iteration (s): 0.42 | learning rate: 1.690E-04 | global batch size: 256 | lm loss: 2.360011E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.526 | TFLOPs: 31.67 | 7: iteration 32260/ 115203 | consumed samples: 8258560 | consumed tokens: 16913530880 | elapsed time per iteration (s): 0.43 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 2.435407E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.217 | TFLOPs: 31.54 | 7: iteration 32270/ 115203 | consumed samples: 8261120 | consumed tokens: 16918773760 | elapsed time per iteration (s): 0.42 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 2.363619E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.886 | TFLOPs: 31.89 | 7: iteration 32280/ 115203 | consumed samples: 8263680 | consumed tokens: 16924016640 | elapsed time per iteration (s): 0.43 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 2.339692E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.552 | TFLOPs: 31.41 | 7: iteration 32290/ 115203 | consumed samples: 8266240 | consumed tokens: 16929259520 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 2.372890E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.764 | TFLOPs: 31.63 | 7: iteration 32300/ 115203 | consumed samples: 8268800 | consumed tokens: 16934502400 | elapsed time per iteration (s): 0.44 | learning rate: 1.689E-04 | global batch size: 256 | lm loss: 2.356336E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.747 | TFLOPs: 30.63 | 7: iteration 32310/ 115203 | consumed samples: 8271360 | consumed tokens: 16939745280 | elapsed time per iteration (s): 0.43 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 2.360131E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.121 | TFLOPs: 31.59 | 7: iteration 32320/ 115203 | consumed samples: 8273920 | consumed tokens: 16944988160 | elapsed time per iteration (s): 0.43 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 2.382007E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.674 | TFLOPs: 31.52 | 7: iteration 32330/ 115203 | consumed samples: 8276480 | consumed tokens: 16950231040 | elapsed time per iteration (s): 0.43 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 2.375764E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.984 | TFLOPs: 31.11 | 7: iteration 32340/ 115203 | consumed samples: 8279040 | consumed tokens: 16955473920 | elapsed time per iteration (s): 0.42 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 2.380288E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.944 | TFLOPs: 31.85 | 7: 
iteration 32350/ 115203 | consumed samples: 8281600 | consumed tokens: 16960716800 | elapsed time per iteration (s): 0.42 | learning rate: 1.688E-04 | global batch size: 256 | lm loss: 2.376790E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.217 | TFLOPs: 32.17 | 7: iteration 32360/ 115203 | consumed samples: 8284160 | consumed tokens: 16965959680 | elapsed time per iteration (s): 0.43 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 2.348744E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.012 | TFLOPs: 31.53 | 7: iteration 32370/ 115203 | consumed samples: 8286720 | consumed tokens: 16971202560 | elapsed time per iteration (s): 0.42 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 2.354222E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.172 | TFLOPs: 31.65 | 7: iteration 32380/ 115203 | consumed samples: 8289280 | consumed tokens: 16976445440 | elapsed time per iteration (s): 0.42 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 2.347845E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.427 | TFLOPs: 31.77 | 7: iteration 32390/ 115203 | consumed samples: 8291840 | consumed tokens: 16981688320 | elapsed time per iteration (s): 0.42 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 2.352870E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.031 | TFLOPs: 31.74 | 7: iteration 32400/ 115203 | consumed samples: 8294400 | consumed tokens: 16986931200 | elapsed time per iteration (s): 0.42 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 2.338973E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 613.312 | TFLOPs: 32.18 | 7: iteration 32410/ 115203 | consumed samples: 8296960 | consumed tokens: 16992174080 | elapsed time per iteration (s): 0.42 | learning rate: 1.687E-04 | global batch size: 256 | lm loss: 2.346578E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.556 | TFLOPs: 31.72 | 7: iteration 32420/ 115203 | consumed samples: 8299520 | consumed tokens: 16997416960 | elapsed time per iteration (s): 0.43 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 2.364001E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.786 | TFLOPs: 31.21 | 7: iteration 32430/ 115203 | consumed samples: 8302080 | consumed tokens: 17002659840 | elapsed time per iteration (s): 0.44 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 2.350305E+00 | grad norm: 0.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.717 | TFLOPs: 30.47 | 7: iteration 32440/ 115203 | consumed samples: 8304640 | consumed tokens: 17007902720 | elapsed time per iteration (s): 0.43 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 2.378419E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.012 | TFLOPs: 31.59 | 7: iteration 32450/ 115203 | consumed samples: 8307200 | consumed tokens: 17013145600 | elapsed time per iteration (s): 0.44 | learning rate: 1.686E-04 | global batch size: 256 | lm loss: 2.351616E+00 | grad norm: 0.493 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.372 | TFLOPs: 30.29 | 7: iteration 32460/ 115203 | consumed samples: 8309760 | consumed tokens: 17018388480 | elapsed time per iteration (s): 0.42 | learning rate: 1.686E-04 | global 
batch size: 256 | lm loss: 2.368367E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.841 | TFLOPs: 31.63 | 7: iteration 32470/ 115203 | consumed samples: 8312320 | consumed tokens: 17023631360 | elapsed time per iteration (s): 0.42 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 2.378879E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.803 | TFLOPs: 31.68 | 7: iteration 32480/ 115203 | consumed samples: 8314880 | consumed tokens: 17028874240 | elapsed time per iteration (s): 0.42 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 2.366014E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.038 | TFLOPs: 32.06 | 7: iteration 32490/ 115203 | consumed samples: 8317440 | consumed tokens: 17034117120 | elapsed time per iteration (s): 0.42 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 2.313821E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.730 | TFLOPs: 31.78 | 7: iteration 32500/ 115203 | consumed samples: 8320000 | consumed tokens: 17039360000 | elapsed time per iteration (s): 0.43 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 2.340183E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.809 | TFLOPs: 31.52 | 7: iteration 32510/ 115203 | consumed samples: 8322560 | consumed tokens: 17044602880 | elapsed time per iteration (s): 0.42 | learning rate: 1.685E-04 | global batch size: 256 | lm loss: 2.338539E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.680 | TFLOPs: 31.88 | 7: iteration 32520/ 115203 | consumed samples: 8325120 | consumed 
tokens: 17049845760 | elapsed time per iteration (s): 0.42 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 2.356737E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.908 | TFLOPs: 31.69 | 7: iteration 32530/ 115203 | consumed samples: 8327680 | consumed tokens: 17055088640 | elapsed time per iteration (s): 0.42 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 2.342931E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.755 | TFLOPs: 31.99 | 7: iteration 32540/ 115203 | consumed samples: 8330240 | consumed tokens: 17060331520 | elapsed time per iteration (s): 0.42 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 2.344279E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.546 | TFLOPs: 31.88 | 7: iteration 32550/ 115203 | consumed samples: 8332800 | consumed tokens: 17065574400 | elapsed time per iteration (s): 0.42 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 2.380073E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.096 | TFLOPs: 31.85 | 7: iteration 32560/ 115203 | consumed samples: 8335360 | consumed tokens: 17070817280 | elapsed time per iteration (s): 0.42 | learning rate: 1.684E-04 | global batch size: 256 | lm loss: 2.360197E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.732 | TFLOPs: 31.99 | 7: iteration 32570/ 115203 | consumed samples: 8337920 | consumed tokens: 17076060160 | elapsed time per iteration (s): 0.43 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 2.343715E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 596.316 | TFLOPs: 31.29 | 7: iteration 32580/ 115203 | consumed samples: 8340480 | consumed tokens: 17081303040 | elapsed time per iteration (s): 0.42 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 2.338499E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.601 | TFLOPs: 31.98 | 7: iteration 32590/ 115203 | consumed samples: 8343040 | consumed tokens: 17086545920 | elapsed time per iteration (s): 0.42 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 2.361483E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.951 | TFLOPs: 32.00 | 7: iteration 32600/ 115203 | consumed samples: 8345600 | consumed tokens: 17091788800 | elapsed time per iteration (s): 0.43 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 2.348552E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.126 | TFLOPs: 31.44 | 7: iteration 32610/ 115203 | consumed samples: 8348160 | consumed tokens: 17097031680 | elapsed time per iteration (s): 0.43 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 2.317225E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.331 | TFLOPs: 31.13 | 7: iteration 32620/ 115203 | consumed samples: 8350720 | consumed tokens: 17102274560 | elapsed time per iteration (s): 0.42 | learning rate: 1.683E-04 | global batch size: 256 | lm loss: 2.358146E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.536 | TFLOPs: 32.19 | 7: iteration 32630/ 115203 | consumed samples: 8353280 | consumed tokens: 17107517440 | elapsed time per iteration (s): 0.43 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 2.338795E+00 | grad norm: 0.195 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.240 | TFLOPs: 31.60 | 7: iteration 32640/ 115203 | consumed samples: 8355840 | consumed tokens: 17112760320 | elapsed time per iteration (s): 0.44 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 2.337729E+00 | grad norm: 0.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.773 | TFLOPs: 30.79 | 7: iteration 32650/ 115203 | consumed samples: 8358400 | consumed tokens: 17118003200 | elapsed time per iteration (s): 0.43 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 2.318455E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.311 | TFLOPs: 31.18 | 7: iteration 32660/ 115203 | consumed samples: 8360960 | consumed tokens: 17123246080 | elapsed time per iteration (s): 0.43 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 2.317472E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.668 | TFLOPs: 30.94 | 7: iteration 32670/ 115203 | consumed samples: 8363520 | consumed tokens: 17128488960 | elapsed time per iteration (s): 0.43 | learning rate: 1.682E-04 | global batch size: 256 | lm loss: 2.363120E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.643 | TFLOPs: 31.41 | 7: iteration 32680/ 115203 | consumed samples: 8366080 | consumed tokens: 17133731840 | elapsed time per iteration (s): 0.42 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 2.358683E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.920 | TFLOPs: 31.63 | 7: iteration 32690/ 115203 | consumed samples: 8368640 | consumed tokens: 17138974720 | elapsed time per iteration (s): 0.42 
| learning rate: 1.681E-04 | global batch size: 256 | lm loss: 2.345241E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.953 | TFLOPs: 32.11 | 7: iteration 32700/ 115203 | consumed samples: 8371200 | consumed tokens: 17144217600 | elapsed time per iteration (s): 0.43 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 2.343551E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.845 | TFLOPs: 31.47 | 7: iteration 32710/ 115203 | consumed samples: 8373760 | consumed tokens: 17149460480 | elapsed time per iteration (s): 0.42 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 2.353257E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.770 | TFLOPs: 32.20 | 7: iteration 32720/ 115203 | consumed samples: 8376320 | consumed tokens: 17154703360 | elapsed time per iteration (s): 0.42 | learning rate: 1.681E-04 | global batch size: 256 | lm loss: 2.331727E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.616 | TFLOPs: 31.99 | 7: iteration 32730/ 115203 | consumed samples: 8378880 | consumed tokens: 17159946240 | elapsed time per iteration (s): 0.43 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 2.360720E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.086 | TFLOPs: 31.01 | 7: iteration 32740/ 115203 | consumed samples: 8381440 | consumed tokens: 17165189120 | elapsed time per iteration (s): 0.43 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 2.339202E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.346 | TFLOPs: 31.34 | 7: iteration 32750/ 115203 | 
consumed samples: 8384000 | consumed tokens: 17170432000 | elapsed time per iteration (s): 0.42 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 2.365931E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.405 | TFLOPs: 31.97 | 7: iteration 32760/ 115203 | consumed samples: 8386560 | consumed tokens: 17175674880 | elapsed time per iteration (s): 0.42 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 2.353349E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.554 | TFLOPs: 31.77 | 7: iteration 32770/ 115203 | consumed samples: 8389120 | consumed tokens: 17180917760 | elapsed time per iteration (s): 0.43 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 2.363836E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.081 | TFLOPs: 31.59 | 7: iteration 32780/ 115203 | consumed samples: 8391680 | consumed tokens: 17186160640 | elapsed time per iteration (s): 0.43 | learning rate: 1.680E-04 | global batch size: 256 | lm loss: 2.336279E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.841 | TFLOPs: 30.95 | 7: iteration 32790/ 115203 | consumed samples: 8394240 | consumed tokens: 17191403520 | elapsed time per iteration (s): 0.42 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 2.359840E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.378 | TFLOPs: 32.03 | 7: iteration 32800/ 115203 | consumed samples: 8396800 | consumed tokens: 17196646400 | elapsed time per iteration (s): 0.43 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 2.346659E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 595.208 | TFLOPs: 31.23 | 7: iteration 32810/ 115203 | consumed samples: 8399360 | consumed tokens: 17201889280 | elapsed time per iteration (s): 0.43 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 2.367901E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.304 | TFLOPs: 31.60 | 7: iteration 32820/ 115203 | consumed samples: 8401920 | consumed tokens: 17207132160 | elapsed time per iteration (s): 0.43 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 2.371101E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.617 | TFLOPs: 31.46 | 7: iteration 32830/ 115203 | consumed samples: 8404480 | consumed tokens: 17212375040 | elapsed time per iteration (s): 0.42 | learning rate: 1.679E-04 | global batch size: 256 | lm loss: 2.392194E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.612 | TFLOPs: 32.09 | 7: iteration 32840/ 115203 | consumed samples: 8407040 | consumed tokens: 17217617920 | elapsed time per iteration (s): 0.42 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 2.367377E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.950 | TFLOPs: 32.06 | 7: iteration 32850/ 115203 | consumed samples: 8409600 | consumed tokens: 17222860800 | elapsed time per iteration (s): 0.42 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 2.333003E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.663 | TFLOPs: 31.88 | 7: iteration 32860/ 115203 | consumed samples: 8412160 | consumed tokens: 17228103680 | elapsed time per iteration (s): 0.42 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 
2.364224E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.711 | TFLOPs: 31.83 | 7: iteration 32870/ 115203 | consumed samples: 8414720 | consumed tokens: 17233346560 | elapsed time per iteration (s): 0.42 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 2.342664E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.356 | TFLOPs: 31.92 | 7: iteration 32880/ 115203 | consumed samples: 8417280 | consumed tokens: 17238589440 | elapsed time per iteration (s): 0.42 | learning rate: 1.678E-04 | global batch size: 256 | lm loss: 2.349950E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.413 | TFLOPs: 31.97 | 7: iteration 32890/ 115203 | consumed samples: 8419840 | consumed tokens: 17243832320 | elapsed time per iteration (s): 0.42 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 2.333908E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.900 | TFLOPs: 31.79 | 7: iteration 32900/ 115203 | consumed samples: 8422400 | consumed tokens: 17249075200 | elapsed time per iteration (s): 0.42 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 2.344738E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.524 | TFLOPs: 32.03 | 7: iteration 32910/ 115203 | consumed samples: 8424960 | consumed tokens: 17254318080 | elapsed time per iteration (s): 0.43 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 2.344127E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.647 | TFLOPs: 31.31 | 7: iteration 32920/ 115203 | consumed samples: 8427520 | consumed tokens: 17259560960 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 2.333841E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.505 | TFLOPs: 31.87 | 7: iteration 32930/ 115203 | consumed samples: 8430080 | consumed tokens: 17264803840 | elapsed time per iteration (s): 0.43 | learning rate: 1.677E-04 | global batch size: 256 | lm loss: 2.386547E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.573 | TFLOPs: 31.51 | 7: iteration 32940/ 115203 | consumed samples: 8432640 | consumed tokens: 17270046720 | elapsed time per iteration (s): 0.43 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 2.380591E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.494 | TFLOPs: 31.56 | 7: iteration 32950/ 115203 | consumed samples: 8435200 | consumed tokens: 17275289600 | elapsed time per iteration (s): 0.42 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 2.377724E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.085 | TFLOPs: 32.01 | 7: iteration 32960/ 115203 | consumed samples: 8437760 | consumed tokens: 17280532480 | elapsed time per iteration (s): 0.43 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 2.350878E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.473 | TFLOPs: 31.30 | 7: iteration 32970/ 115203 | consumed samples: 8440320 | consumed tokens: 17285775360 | elapsed time per iteration (s): 0.43 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 2.361757E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.220 | TFLOPs: 
31.55 | 7: iteration 32980/ 115203 | consumed samples: 8442880 | consumed tokens: 17291018240 | elapsed time per iteration (s): 0.42 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 2.381178E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.677 | TFLOPs: 31.73 | 7: iteration 32990/ 115203 | consumed samples: 8445440 | consumed tokens: 17296261120 | elapsed time per iteration (s): 0.42 | learning rate: 1.676E-04 | global batch size: 256 | lm loss: 2.326617E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.721 | TFLOPs: 31.99 | 7: iteration 33000/ 115203 | consumed samples: 8448000 | consumed tokens: 17301504000 | elapsed time per iteration (s): 0.44 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 2.385166E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.412 | TFLOPs: 30.82 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 33000 | lm loss value: 2.297844E+00 | lm loss PPL: 9.952703E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 33000 to checkpoints_221m 0: [2022-11-28 16:55:09,268] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step33000 is begin to save! 0: [2022-11-28 16:55:09,271] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_01-model_00-model_states.pt... 0: [2022-11-28 16:55:09,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_01-model_00-model_states.pt. 0: [2022-11-28 16:55:09,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_03-model_00-model_states.pt... 
0: [2022-11-28 16:55:09,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_03-model_00-model_states.pt. 0: [2022-11-28 16:55:09,391] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_04-model_00-model_states.pt... 0: [2022-11-28 16:55:09,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_04-model_00-model_states.pt. 0: [2022-11-28 16:55:09,414] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_05-model_00-model_states.pt... 0: [2022-11-28 16:55:09,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_05-model_00-model_states.pt. 0: [2022-11-28 16:55:09,439] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_06-model_00-model_states.pt... 0: [2022-11-28 16:55:09,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_06-model_00-model_states.pt. 0: [2022-11-28 16:55:09,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_07-model_00-model_states.pt... 0: [2022-11-28 16:55:09,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_07-model_00-model_states.pt. 0: [2022-11-28 16:55:09,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_08-model_00-model_states.pt... 0: [2022-11-28 16:55:09,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_08-model_00-model_states.pt. 0: [2022-11-28 16:55:09,508] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_09-model_00-model_states.pt... 
0: [2022-11-28 16:55:09,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_09-model_00-model_states.pt. 0: [2022-11-28 16:55:09,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_10-model_00-model_states.pt... 0: [2022-11-28 16:55:09,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_10-model_00-model_states.pt. 0: [2022-11-28 16:55:09,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_11-model_00-model_states.pt... 0: [2022-11-28 16:55:09,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_11-model_00-model_states.pt. 0: [2022-11-28 16:55:09,579] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_12-model_00-model_states.pt... 0: [2022-11-28 16:55:09,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_12-model_00-model_states.pt. 0: [2022-11-28 16:55:09,603] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_13-model_00-model_states.pt... 0: [2022-11-28 16:55:09,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_13-model_00-model_states.pt. 0: [2022-11-28 16:55:09,627] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_14-model_00-model_states.pt... 0: [2022-11-28 16:55:09,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_14-model_00-model_states.pt. 0: [2022-11-28 16:55:09,651] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_15-model_00-model_states.pt... 
0: [2022-11-28 16:55:09,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_15-model_00-model_states.pt. 0: [2022-11-28 16:55:09,674] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_16-model_00-model_states.pt... 0: [2022-11-28 16:55:09,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_16-model_00-model_states.pt. 0: [2022-11-28 16:55:09,696] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_17-model_00-model_states.pt... 0: [2022-11-28 16:55:09,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_17-model_00-model_states.pt. 0: [2022-11-28 16:55:09,718] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_18-model_00-model_states.pt... 0: [2022-11-28 16:55:09,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_18-model_00-model_states.pt. 0: [2022-11-28 16:55:09,742] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_19-model_00-model_states.pt... 0: [2022-11-28 16:55:09,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_19-model_00-model_states.pt. 0: [2022-11-28 16:55:09,767] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_20-model_00-model_states.pt... 0: [2022-11-28 16:55:09,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_20-model_00-model_states.pt. 0: [2022-11-28 16:55:09,790] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/layer_22-model_00-model_states.pt... 
0: [2022-11-28 16:55:09,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/layer_22-model_00-model_states.pt. 0: [2022-11-28 16:55:09,797] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step33000/mp_rank_00_model_states.pt 0: [2022-11-28 16:55:09,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/mp_rank_00_model_states.pt... 0: [2022-11-28 16:55:09,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/mp_rank_00_model_states.pt. 0: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
7: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
3: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 7: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
1: [2022-11-28 16:55:09,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step33000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 16:55:09,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:55:09,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 16:55:09,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2022-11-28 16:55:09,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:55:09,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 16:55:09,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2022-11-28 16:55:09,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:55:09,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 16:55:09,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2022-11-28 16:55:09,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 16:55:09,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2022-11-28 16:55:09,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:55:09,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2022-11-28 16:55:09,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 16:55:09,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2022-11-28 16:55:09,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:55:09,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 16:55:09,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2022-11-28 16:55:09,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:55:09,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 16:55:09,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:55:09,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 
1: [2022-11-28 16:55:09,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 16:55:09,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2022-11-28 16:55:09,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:55:09,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 16:55:09,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:55:09,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 16:55:09,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 16:55:09,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 4: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2022-11-28 16:55:09,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:55:09,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:55:09,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 16:55:09,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 16:55:09,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 16:55:09,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2022-11-28 16:55:09,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:55:09,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 16:55:09,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2022-11-28 16:55:09,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:55:09,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:55:09,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 16:55:09,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2022-11-28 16:55:09,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2022-11-28 16:55:09,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2022-11-28 16:55:09,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:55:09,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 
0: [2022-11-28 16:55:09,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2022-11-28 16:55:09,869] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2022-11-28 16:55:09,869] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2022-11-28 16:55:09,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:55:09,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 16:55:09,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 16:55:09,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 16:55:09,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 
3: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:55:09,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 16:55:09,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2022-11-28 16:55:09,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:55:09,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 16:55:09,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2022-11-28 16:55:09,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 16:55:09,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 16:55:09,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2022-11-28 16:55:09,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:55:09,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 16:55:09,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2022-11-28 16:55:09,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:55:09,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 16:55:09,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2022-11-28 16:55:09,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:55:09,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:55:09,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 16:55:09,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 
6: [2022-11-28 16:55:09,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 16:55:09,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2022-11-28 16:55:09,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:55:09,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:55:09,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 16:55:09,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 16:55:09,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2022-11-28 16:55:09,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2022-11-28 16:55:09,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:55:09,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 16:55:09,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2022-11-28 16:55:09,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 16:55:09,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 16:55:09,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2022-11-28 16:55:09,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:55:09,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 16:55:09,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2022-11-28 16:55:09,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:55:09,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 16:55:09,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:55:09,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-28 16:55:09,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:55:09,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:55:09,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2022-11-28 16:55:09,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:55:09,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 16:55:09,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2022-11-28 16:55:09,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 16:55:09,878] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 16:55:09,878] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 2: [2022-11-28 16:55:09,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 16:55:09,878] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 16:55:09,878] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 7: [2022-11-28 16:55:09,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 16:55:09,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 16:55:09,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 5: [2022-11-28 16:55:09,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:55:09,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 16:55:09,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 
5: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:55:09,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 16:55:09,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 16:55:09,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 16:55:09,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 3: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2022-11-28 16:55:09,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 16:55:09,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2022-11-28 16:55:09,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:55:09,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:55:09,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 16:55:09,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 16:55:09,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2022-11-28 16:55:09,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 1: [2022-11-28 16:55:09,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 16:55:09,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 16:55:09,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2022-11-28 16:55:09,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 16:55:09,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 16:55:09,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 16:55:09,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 16:55:09,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 4: [2022-11-28 16:55:09,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2022-11-28 16:55:09,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:55:09,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 16:55:09,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 16:55:09,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 16:55:09,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 6: [2022-11-28 16:55:09,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2022-11-28 16:55:09,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2022-11-28 16:55:09,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 16:55:09,923] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 16:55:09,924] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: [2022-11-28 16:55:09,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step33000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 16:55:09,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step33000 is ready now! 0: successfully saved checkpoint at iteration 33000 to checkpoints_221m 7: time (ms) | save-checkpoint: 666.83 7: iteration 33010/ 115203 | consumed samples: 8450560 | consumed tokens: 17306746880 | elapsed time per iteration (s): 0.51 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 2.373222E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 502.030 | TFLOPs: 26.34 | 7: iteration 33020/ 115203 | consumed samples: 8453120 | consumed tokens: 17311989760 | elapsed time per iteration (s): 0.43 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 2.370302E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.531 | TFLOPs: 31.46 | 7: iteration 33030/ 115203 | consumed samples: 8455680 | consumed tokens: 17317232640 | elapsed time per iteration (s): 0.43 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 2.362747E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.298 | TFLOPs: 31.60 | 7: iteration 33040/ 115203 | consumed samples: 8458240 | 
consumed tokens: 17322475520 | elapsed time per iteration (s): 0.42 | learning rate: 1.675E-04 | global batch size: 256 | lm loss: 2.328394E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.986 | TFLOPs: 31.69 | 7: iteration 33050/ 115203 | consumed samples: 8460800 | consumed tokens: 17327718400 | elapsed time per iteration (s): 0.43 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 2.320919E+00 | grad norm: 0.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.301 | TFLOPs: 31.23 | 7: iteration 33060/ 115203 | consumed samples: 8463360 | consumed tokens: 17332961280 | elapsed time per iteration (s): 0.42 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 2.340924E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.965 | TFLOPs: 31.74 | 7: iteration 33070/ 115203 | consumed samples: 8465920 | consumed tokens: 17338204160 | elapsed time per iteration (s): 0.42 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 2.348225E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.731 | TFLOPs: 31.83 | 7: iteration 33080/ 115203 | consumed samples: 8468480 | consumed tokens: 17343447040 | elapsed time per iteration (s): 0.42 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 2.314480E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.837 | TFLOPs: 31.63 | 7: iteration 33090/ 115203 | consumed samples: 8471040 | consumed tokens: 17348689920 | elapsed time per iteration (s): 0.42 | learning rate: 1.674E-04 | global batch size: 256 | lm loss: 2.357899E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 606.762 | TFLOPs: 31.84 | 7: iteration 33100/ 115203 | consumed samples: 8473600 | consumed tokens: 17353932800 | elapsed time per iteration (s): 0.42 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 2.350756E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.026 | TFLOPs: 31.80 | 7: iteration 33110/ 115203 | consumed samples: 8476160 | consumed tokens: 17359175680 | elapsed time per iteration (s): 0.42 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 2.351351E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.742 | TFLOPs: 31.94 | 7: iteration 33120/ 115203 | consumed samples: 8478720 | consumed tokens: 17364418560 | elapsed time per iteration (s): 0.44 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 2.313694E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.781 | TFLOPs: 30.63 | 7: iteration 33130/ 115203 | consumed samples: 8481280 | consumed tokens: 17369661440 | elapsed time per iteration (s): 0.42 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 2.327227E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.054 | TFLOPs: 32.01 | 7: iteration 33140/ 115203 | consumed samples: 8483840 | consumed tokens: 17374904320 | elapsed time per iteration (s): 0.42 | learning rate: 1.673E-04 | global batch size: 256 | lm loss: 2.327046E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.823 | TFLOPs: 31.68 | 7: iteration 33150/ 115203 | consumed samples: 8486400 | consumed tokens: 17380147200 | elapsed time per iteration (s): 0.43 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 2.341056E+00 | grad norm: 
0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.938 | TFLOPs: 31.16 | 7: iteration 33160/ 115203 | consumed samples: 8488960 | consumed tokens: 17385390080 | elapsed time per iteration (s): 0.43 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 2.360336E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.875 | TFLOPs: 31.42 | 7: iteration 33170/ 115203 | consumed samples: 8491520 | consumed tokens: 17390632960 | elapsed time per iteration (s): 0.42 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 2.347735E+00 | grad norm: 0.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.559 | TFLOPs: 31.98 | 7: iteration 33180/ 115203 | consumed samples: 8494080 | consumed tokens: 17395875840 | elapsed time per iteration (s): 0.64 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 2.353548E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 401.068 | TFLOPs: 21.04 | 7: iteration 33190/ 115203 | consumed samples: 8496640 | consumed tokens: 17401118720 | elapsed time per iteration (s): 0.45 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 2.349435E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.819 | TFLOPs: 30.05 | 7: iteration 33200/ 115203 | consumed samples: 8499200 | consumed tokens: 17406361600 | elapsed time per iteration (s): 0.43 | learning rate: 1.672E-04 | global batch size: 256 | lm loss: 2.342366E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.317 | TFLOPs: 31.60 | 7: iteration 33210/ 115203 | consumed samples: 8501760 | consumed tokens: 17411604480 | elapsed time per iteration (s): 
0.42 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 2.355602E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.244 | TFLOPs: 31.76 | 7: iteration 33220/ 115203 | consumed samples: 8504320 | consumed tokens: 17416847360 | elapsed time per iteration (s): 0.42 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 2.351429E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.099 | TFLOPs: 32.01 | 7: iteration 33230/ 115203 | consumed samples: 8506880 | consumed tokens: 17422090240 | elapsed time per iteration (s): 0.42 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 2.346661E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.626 | TFLOPs: 31.72 | 7: iteration 33240/ 115203 | consumed samples: 8509440 | consumed tokens: 17427333120 | elapsed time per iteration (s): 0.42 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 2.372270E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.603 | TFLOPs: 31.67 | 7: iteration 33250/ 115203 | consumed samples: 8512000 | consumed tokens: 17432576000 | elapsed time per iteration (s): 0.43 | learning rate: 1.671E-04 | global batch size: 256 | lm loss: 2.353701E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.611 | TFLOPs: 31.36 | 7: iteration 33260/ 115203 | consumed samples: 8514560 | consumed tokens: 17437818880 | elapsed time per iteration (s): 0.43 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 2.317867E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.169 | TFLOPs: 31.54 | 7: iteration 33270/ 
115203 | consumed samples: 8517120 | consumed tokens: 17443061760 | elapsed time per iteration (s): 0.44 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 2.385357E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.883 | TFLOPs: 30.74 | 7: iteration 33280/ 115203 | consumed samples: 8519680 | consumed tokens: 17448304640 | elapsed time per iteration (s): 0.42 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 2.394553E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.261 | TFLOPs: 31.97 | 7: iteration 33290/ 115203 | consumed samples: 8522240 | consumed tokens: 17453547520 | elapsed time per iteration (s): 0.42 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 2.371549E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.340 | TFLOPs: 31.76 | 7: iteration 33300/ 115203 | consumed samples: 8524800 | consumed tokens: 17458790400 | elapsed time per iteration (s): 0.42 | learning rate: 1.670E-04 | global batch size: 256 | lm loss: 2.324637E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.289 | TFLOPs: 32.18 | 7: iteration 33310/ 115203 | consumed samples: 8527360 | consumed tokens: 17464033280 | elapsed time per iteration (s): 0.44 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 2.393646E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.190 | TFLOPs: 30.44 | 7: iteration 33320/ 115203 | consumed samples: 8529920 | consumed tokens: 17469276160 | elapsed time per iteration (s): 0.43 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 2.383314E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 601.555 | TFLOPs: 31.56 | 7: iteration 33330/ 115203 | consumed samples: 8532480 | consumed tokens: 17474519040 | elapsed time per iteration (s): 0.43 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 2.352466E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.378 | TFLOPs: 31.03 | 7: iteration 33340/ 115203 | consumed samples: 8535040 | consumed tokens: 17479761920 | elapsed time per iteration (s): 0.42 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 2.360007E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.926 | TFLOPs: 32.16 | 7: iteration 33350/ 115203 | consumed samples: 8537600 | consumed tokens: 17485004800 | elapsed time per iteration (s): 0.44 | learning rate: 1.669E-04 | global batch size: 256 | lm loss: 2.321133E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.072 | TFLOPs: 30.23 | 7: iteration 33360/ 115203 | consumed samples: 8540160 | consumed tokens: 17490247680 | elapsed time per iteration (s): 0.43 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 2.347092E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.798 | TFLOPs: 31.58 | 7: iteration 33370/ 115203 | consumed samples: 8542720 | consumed tokens: 17495490560 | elapsed time per iteration (s): 0.43 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 2.352836E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.797 | TFLOPs: 31.37 | 7: iteration 33380/ 115203 | consumed samples: 8545280 | consumed tokens: 17500733440 | elapsed time per iteration (s): 0.43 | learning rate: 1.668E-04 | global batch size: 256 | 
lm loss: 2.378006E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.108 | TFLOPs: 31.33 | 7: iteration 33390/ 115203 | consumed samples: 8547840 | consumed tokens: 17505976320 | elapsed time per iteration (s): 0.42 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 2.362744E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.230 | TFLOPs: 31.97 | 7: iteration 33400/ 115203 | consumed samples: 8550400 | consumed tokens: 17511219200 | elapsed time per iteration (s): 0.42 | learning rate: 1.668E-04 | global batch size: 256 | lm loss: 2.356109E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.602 | TFLOPs: 31.62 | 7: iteration 33410/ 115203 | consumed samples: 8552960 | consumed tokens: 17516462080 | elapsed time per iteration (s): 0.43 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 2.365523E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.676 | TFLOPs: 31.46 | 7: iteration 33420/ 115203 | consumed samples: 8555520 | consumed tokens: 17521704960 | elapsed time per iteration (s): 0.42 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 2.316045E+00 | grad norm: 0.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.105 | TFLOPs: 31.85 | 7: iteration 33430/ 115203 | consumed samples: 8558080 | consumed tokens: 17526947840 | elapsed time per iteration (s): 0.42 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 2.347613E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.861 | TFLOPs: 31.95 | 7: iteration 33440/ 115203 | consumed samples: 8560640 | consumed tokens: 
17532190720 | elapsed time per iteration (s): 0.42 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 2.326099E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.478 | TFLOPs: 31.72 | 7: iteration 33450/ 115203 | consumed samples: 8563200 | consumed tokens: 17537433600 | elapsed time per iteration (s): 0.42 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 2.386543E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.301 | TFLOPs: 32.23 | 7: iteration 33460/ 115203 | consumed samples: 8565760 | consumed tokens: 17542676480 | elapsed time per iteration (s): 0.42 | learning rate: 1.667E-04 | global batch size: 256 | lm loss: 2.373695E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.477 | TFLOPs: 31.61 | 7: iteration 33470/ 115203 | consumed samples: 8568320 | consumed tokens: 17547919360 | elapsed time per iteration (s): 0.42 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 2.340462E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.546 | TFLOPs: 31.98 | 7: iteration 33480/ 115203 | consumed samples: 8570880 | consumed tokens: 17553162240 | elapsed time per iteration (s): 0.42 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 2.353104E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.972 | TFLOPs: 31.95 | 7: iteration 33490/ 115203 | consumed samples: 8573440 | consumed tokens: 17558405120 | elapsed time per iteration (s): 0.45 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 2.363393E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
568.179 | TFLOPs: 29.81 | 7: iteration 33500/ 115203 | consumed samples: 8576000 | consumed tokens: 17563648000 | elapsed time per iteration (s): 0.42 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 2.347606E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.600 | TFLOPs: 32.25 | 7: iteration 33510/ 115203 | consumed samples: 8578560 | consumed tokens: 17568890880 | elapsed time per iteration (s): 0.44 | learning rate: 1.666E-04 | global batch size: 256 | lm loss: 2.350501E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.925 | TFLOPs: 30.43 | 7: iteration 33520/ 115203 | consumed samples: 8581120 | consumed tokens: 17574133760 | elapsed time per iteration (s): 0.44 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 2.337225E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.811 | TFLOPs: 30.63 | 7: iteration 33530/ 115203 | consumed samples: 8583680 | consumed tokens: 17579376640 | elapsed time per iteration (s): 0.43 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 2.343090E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.254 | TFLOPs: 31.55 | 7: iteration 33540/ 115203 | consumed samples: 8586240 | consumed tokens: 17584619520 | elapsed time per iteration (s): 0.42 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 2.352654E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.605 | TFLOPs: 31.62 | 7: iteration 33550/ 115203 | consumed samples: 8588800 | consumed tokens: 17589862400 | elapsed time per iteration (s): 0.42 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 2.377873E+00 | grad norm: 0.197 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.157 | TFLOPs: 31.86 | 7: iteration 33560/ 115203 | consumed samples: 8591360 | consumed tokens: 17595105280 | elapsed time per iteration (s): 0.42 | learning rate: 1.665E-04 | global batch size: 256 | lm loss: 2.336660E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.139 | TFLOPs: 31.86 | 7: iteration 33570/ 115203 | consumed samples: 8593920 | consumed tokens: 17600348160 | elapsed time per iteration (s): 0.43 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 2.362193E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.959 | TFLOPs: 31.48 | 7: iteration 33580/ 115203 | consumed samples: 8596480 | consumed tokens: 17605591040 | elapsed time per iteration (s): 0.43 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 2.352534E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.934 | TFLOPs: 31.58 | 7: iteration 33590/ 115203 | consumed samples: 8599040 | consumed tokens: 17610833920 | elapsed time per iteration (s): 0.42 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 2.358193E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.862 | TFLOPs: 31.84 | 7: iteration 33600/ 115203 | consumed samples: 8601600 | consumed tokens: 17616076800 | elapsed time per iteration (s): 0.42 | learning rate: 1.664E-04 | global batch size: 256 | lm loss: 2.339917E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.721 | TFLOPs: 31.68 | 7: iteration 33610/ 115203 | consumed samples: 8604160 | consumed tokens: 17621319680 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.664E-04 | global batch size: 256 | lm loss: 2.361568E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.303 | TFLOPs: 31.55 | 7: iteration 33620/ 115203 | consumed samples: 8606720 | consumed tokens: 17626562560 | elapsed time per iteration (s): 0.44 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 2.340048E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.611 | TFLOPs: 30.62 | 7: iteration 33630/ 115203 | consumed samples: 8609280 | consumed tokens: 17631805440 | elapsed time per iteration (s): 0.42 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 2.336117E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.615 | TFLOPs: 31.72 | 7: iteration 33640/ 115203 | consumed samples: 8611840 | consumed tokens: 17637048320 | elapsed time per iteration (s): 0.42 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 2.365158E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.693 | TFLOPs: 31.73 | 7: iteration 33650/ 115203 | consumed samples: 8614400 | consumed tokens: 17642291200 | elapsed time per iteration (s): 0.42 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 2.359393E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.403 | TFLOPs: 31.71 | 7: iteration 33660/ 115203 | consumed samples: 8616960 | consumed tokens: 17647534080 | elapsed time per iteration (s): 0.42 | learning rate: 1.663E-04 | global batch size: 256 | lm loss: 2.354887E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.635 | TFLOPs: 32.25 | 7: iteration 33670/ 115203 | 
consumed samples: 8619520 | consumed tokens: 17652776960 | elapsed time per iteration (s): 0.44 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 2.331109E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.636 | TFLOPs: 30.83 | 7: iteration 33680/ 115203 | consumed samples: 8622080 | consumed tokens: 17658019840 | elapsed time per iteration (s): 0.42 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 2.342941E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.686 | TFLOPs: 31.94 | 7: iteration 33690/ 115203 | consumed samples: 8624640 | consumed tokens: 17663262720 | elapsed time per iteration (s): 0.42 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 2.341068E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.305 | TFLOPs: 31.71 | 7: iteration 33700/ 115203 | consumed samples: 8627200 | consumed tokens: 17668505600 | elapsed time per iteration (s): 0.42 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 2.371414E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.072 | TFLOPs: 31.80 | 7: iteration 33710/ 115203 | consumed samples: 8629760 | consumed tokens: 17673748480 | elapsed time per iteration (s): 0.42 | learning rate: 1.662E-04 | global batch size: 256 | lm loss: 2.387132E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.901 | TFLOPs: 31.84 | 7: iteration 33720/ 115203 | consumed samples: 8632320 | consumed tokens: 17678991360 | elapsed time per iteration (s): 0.42 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 2.342018E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 611.766 | TFLOPs: 32.10 | 7: iteration 33730/ 115203 | consumed samples: 8634880 | consumed tokens: 17684234240 | elapsed time per iteration (s): 0.42 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 2.371747E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.184 | TFLOPs: 31.96 | 7: iteration 33740/ 115203 | consumed samples: 8637440 | consumed tokens: 17689477120 | elapsed time per iteration (s): 0.42 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 2.360912E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.140 | TFLOPs: 31.70 | 7: iteration 33750/ 115203 | consumed samples: 8640000 | consumed tokens: 17694720000 | elapsed time per iteration (s): 0.42 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 2.361433E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.444 | TFLOPs: 31.92 | 7: iteration 33760/ 115203 | consumed samples: 8642560 | consumed tokens: 17699962880 | elapsed time per iteration (s): 0.42 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 2.396531E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.156 | TFLOPs: 31.80 | 7: iteration 33770/ 115203 | consumed samples: 8645120 | consumed tokens: 17705205760 | elapsed time per iteration (s): 0.43 | learning rate: 1.661E-04 | global batch size: 256 | lm loss: 2.353451E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.381 | TFLOPs: 31.50 | 7: iteration 33780/ 115203 | consumed samples: 8647680 | consumed tokens: 17710448640 | elapsed time per iteration (s): 0.43 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 
2.351580E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.994 | TFLOPs: 31.48 | 7: iteration 33790/ 115203 | consumed samples: 8650240 | consumed tokens: 17715691520 | elapsed time per iteration (s): 0.42 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 2.369110E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.408 | TFLOPs: 32.03 | 7: iteration 33800/ 115203 | consumed samples: 8652800 | consumed tokens: 17720934400 | elapsed time per iteration (s): 0.43 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 2.354189E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.010 | TFLOPs: 31.17 | 7: iteration 33810/ 115203 | consumed samples: 8655360 | consumed tokens: 17726177280 | elapsed time per iteration (s): 0.42 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 2.374955E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.085 | TFLOPs: 32.06 | 7: iteration 33820/ 115203 | consumed samples: 8657920 | consumed tokens: 17731420160 | elapsed time per iteration (s): 0.42 | learning rate: 1.660E-04 | global batch size: 256 | lm loss: 2.356673E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.766 | TFLOPs: 32.05 | 7: iteration 33830/ 115203 | consumed samples: 8660480 | consumed tokens: 17736663040 | elapsed time per iteration (s): 0.42 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 2.373030E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.345 | TFLOPs: 32.02 | 7: iteration 33840/ 115203 | consumed samples: 8663040 | consumed tokens: 17741905920 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 2.388740E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.568 | TFLOPs: 31.93 | 7: iteration 33850/ 115203 | consumed samples: 8665600 | consumed tokens: 17747148800 | elapsed time per iteration (s): 0.43 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 2.387722E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.520 | TFLOPs: 31.46 | 7: iteration 33860/ 115203 | consumed samples: 8668160 | consumed tokens: 17752391680 | elapsed time per iteration (s): 0.42 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 2.329247E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.433 | TFLOPs: 31.82 | 7: iteration 33870/ 115203 | consumed samples: 8670720 | consumed tokens: 17757634560 | elapsed time per iteration (s): 0.42 | learning rate: 1.659E-04 | global batch size: 256 | lm loss: 2.346570E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.377 | TFLOPs: 31.92 | 7: iteration 33880/ 115203 | consumed samples: 8673280 | consumed tokens: 17762877440 | elapsed time per iteration (s): 0.43 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 2.331670E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.746 | TFLOPs: 31.21 | 7: iteration 33890/ 115203 | consumed samples: 8675840 | consumed tokens: 17768120320 | elapsed time per iteration (s): 0.42 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 2.341612E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.506 | TFLOPs: 
31.72 | 7: iteration 33900/ 115203 | consumed samples: 8678400 | consumed tokens: 17773363200 | elapsed time per iteration (s): 0.42 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 2.342036E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.105 | TFLOPs: 31.85 | 7: iteration 33910/ 115203 | consumed samples: 8680960 | consumed tokens: 17778606080 | elapsed time per iteration (s): 0.42 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 2.365343E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.342 | TFLOPs: 32.23 | 7: iteration 33920/ 115203 | consumed samples: 8683520 | consumed tokens: 17783848960 | elapsed time per iteration (s): 0.42 | learning rate: 1.658E-04 | global batch size: 256 | lm loss: 2.350107E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.499 | TFLOPs: 32.03 | 7: iteration 33930/ 115203 | consumed samples: 8686080 | consumed tokens: 17789091840 | elapsed time per iteration (s): 0.43 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 2.324831E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.104 | TFLOPs: 31.28 | 7: iteration 33940/ 115203 | consumed samples: 8688640 | consumed tokens: 17794334720 | elapsed time per iteration (s): 0.42 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 2.344433E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.099 | TFLOPs: 31.80 | 7: iteration 33950/ 115203 | consumed samples: 8691200 | consumed tokens: 17799577600 | elapsed time per iteration (s): 0.42 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 2.348979E+00 | grad norm: 0.196 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.595 | TFLOPs: 31.88 | 7: iteration 33960/ 115203 | consumed samples: 8693760 | consumed tokens: 17804820480 | elapsed time per iteration (s): 0.42 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 2.351131E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.898 | TFLOPs: 32.11 | 7: iteration 33970/ 115203 | consumed samples: 8696320 | consumed tokens: 17810063360 | elapsed time per iteration (s): 0.42 | learning rate: 1.657E-04 | global batch size: 256 | lm loss: 2.324708E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.520 | TFLOPs: 31.61 | 7: iteration 33980/ 115203 | consumed samples: 8698880 | consumed tokens: 17815306240 | elapsed time per iteration (s): 0.43 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 2.346988E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.246 | TFLOPs: 31.49 | 7: iteration 33990/ 115203 | consumed samples: 8701440 | consumed tokens: 17820549120 | elapsed time per iteration (s): 0.43 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 2.354350E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.521 | TFLOPs: 31.46 | 0: [2022-11-28 17:02:17,166] [INFO] [logging.py:68:log_dist] [Rank 0] step=34000, skipped=0, lr=[0.00016560534437138965, 0.00016560534437138965, 0.00016560534437138965], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 34000/ 115203 | consumed samples: 8704000 | consumed tokens: 17825792000 | elapsed time per iteration (s): 0.43 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 2.368671E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 601.512 | TFLOPs: 31.56 |
0: steps: 34000 loss: 2.4198 iter time (s): 0.424 samples/sec: 603.910
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 34000 | lm loss value: 2.268194E+00 | lm loss PPL: 9.661935E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 34000 to checkpoints_221m
0: [2022-11-28 17:02:17,336] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step34000 is begin to save!
0: [2022-11-28 17:02:17,356] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_01-model_00-model_states.pt...
0: [2022-11-28 17:02:17,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_01-model_00-model_states.pt.
0: [2022-11-28 17:02:17,468] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_03-model_00-model_states.pt...
0: [2022-11-28 17:02:17,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_03-model_00-model_states.pt.
0: [2022-11-28 17:02:17,490] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_04-model_00-model_states.pt...
0: [2022-11-28 17:02:17,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_04-model_00-model_states.pt.
0: [2022-11-28 17:02:17,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_05-model_00-model_states.pt...
0: [2022-11-28 17:02:17,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_05-model_00-model_states.pt.
0: [2022-11-28 17:02:17,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_06-model_00-model_states.pt...
0: [2022-11-28 17:02:17,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_06-model_00-model_states.pt.
0: [2022-11-28 17:02:17,564] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_07-model_00-model_states.pt...
0: [2022-11-28 17:02:17,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_07-model_00-model_states.pt.
0: [2022-11-28 17:02:17,587] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_08-model_00-model_states.pt...
0: [2022-11-28 17:02:17,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_08-model_00-model_states.pt.
0: [2022-11-28 17:02:17,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_09-model_00-model_states.pt...
0: [2022-11-28 17:02:17,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_09-model_00-model_states.pt.
0: [2022-11-28 17:02:17,636] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_10-model_00-model_states.pt...
0: [2022-11-28 17:02:17,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_10-model_00-model_states.pt.
0: [2022-11-28 17:02:17,661] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_11-model_00-model_states.pt...
0: [2022-11-28 17:02:17,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_11-model_00-model_states.pt.
0: [2022-11-28 17:02:17,685] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_12-model_00-model_states.pt...
0: [2022-11-28 17:02:17,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_12-model_00-model_states.pt.
0: [2022-11-28 17:02:17,709] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_13-model_00-model_states.pt...
0: [2022-11-28 17:02:17,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_13-model_00-model_states.pt.
0: [2022-11-28 17:02:17,733] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_14-model_00-model_states.pt...
0: [2022-11-28 17:02:17,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_14-model_00-model_states.pt.
0: [2022-11-28 17:02:17,758] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_15-model_00-model_states.pt...
0: [2022-11-28 17:02:17,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_15-model_00-model_states.pt.
0: [2022-11-28 17:02:17,783] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_16-model_00-model_states.pt...
0: [2022-11-28 17:02:17,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_16-model_00-model_states.pt.
0: [2022-11-28 17:02:17,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_17-model_00-model_states.pt...
0: [2022-11-28 17:02:17,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_17-model_00-model_states.pt.
0: [2022-11-28 17:02:17,833] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_18-model_00-model_states.pt... 0: [2022-11-28 17:02:17,856] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_18-model_00-model_states.pt. 0: [2022-11-28 17:02:17,857] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_19-model_00-model_states.pt... 0: [2022-11-28 17:02:17,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_19-model_00-model_states.pt. 0: [2022-11-28 17:02:17,881] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_20-model_00-model_states.pt... 0: [2022-11-28 17:02:17,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_20-model_00-model_states.pt. 0: [2022-11-28 17:02:17,904] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/layer_22-model_00-model_states.pt... 0: [2022-11-28 17:02:17,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/layer_22-model_00-model_states.pt. 0: [2022-11-28 17:02:17,909] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step34000/mp_rank_00_model_states.pt 0: [2022-11-28 17:02:17,909] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/mp_rank_00_model_states.pt... 0: [2022-11-28 17:02:17,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/mp_rank_00_model_states.pt. 0: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
0: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
1: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
2: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:02:17,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step34000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:02:17,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:02:17,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 17:02:17,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
4: [2022-11-28 17:02:17,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:02:17,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 17:02:17,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 2: [2022-11-28 17:02:17,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:02:17,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 17:02:17,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2022-11-28 17:02:17,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:02:17,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:02:17,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:02:17,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 17:02:17,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 17:02:17,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
2: [2022-11-28 17:02:17,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2022-11-28 17:02:17,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:02:17,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 17:02:17,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2022-11-28 17:02:17,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:02:17,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 17:02:17,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 2: [2022-11-28 17:02:17,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:02:17,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:02:17,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 17:02:17,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
2: [2022-11-28 17:02:17,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 17:02:17,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2022-11-28 17:02:17,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:02:17,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:02:17,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 17:02:17,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 17:02:17,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2022-11-28 17:02:17,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2022-11-28 17:02:17,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:02:17,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 17:02:17,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2022-11-28 17:02:17,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 17:02:17,983] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 17:02:17,983] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2022-11-28 17:02:17,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:02:17,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 17:02:17,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2022-11-28 17:02:17,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:02:17,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 17:02:17,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2022-11-28 17:02:17,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:02:17,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 17:02:17,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 2: [2022-11-28 17:02:17,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:02:17,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:02:17,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 17:02:17,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 17:02:17,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 2: [2022-11-28 17:02:17,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 2: [2022-11-28 17:02:17,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:02:17,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 17:02:17,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2022-11-28 17:02:17,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:02:17,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 17:02:17,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 4: [2022-11-28 17:02:17,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:02:17,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 17:02:17,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2022-11-28 17:02:17,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:02:17,986] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 17:02:17,986] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 4: [2022-11-28 17:02:17,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:02:17,986] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 17:02:17,986] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2022-11-28 17:02:17,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:02:17,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 17:02:17,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 17:02:17,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 17:02:17,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 3: [2022-11-28 17:02:17,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2022-11-28 17:02:17,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:02:17,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 17:02:17,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 4: [2022-11-28 17:02:17,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:02:17,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:02:17,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:02:17,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:02:17,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:02:17,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 17:02:17,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 17:02:17,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 17:02:17,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 17:02:17,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 17:02:17,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 4: [2022-11-28 17:02:17,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 4: [2022-11-28 17:02:17,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 4: [2022-11-28 17:02:17,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 4: [2022-11-28 17:02:17,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:02:18,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:02:18,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 17:02:18,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
5: [2022-11-28 17:02:18,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 17:02:18,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 17:02:18,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 17:02:18,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2022-11-28 17:02:18,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 5: [2022-11-28 17:02:18,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-28 17:02:18,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 17:02:18,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 17:02:18,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 17:02:18,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 17:02:18,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 17:02:18,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 17:02:18,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 17:02:18,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 6: [2022-11-28 17:02:18,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 0: [2022-11-28 17:02:18,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 17:02:18,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2022-11-28 17:02:18,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:02:18,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:02:18,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:02:18,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 17:02:18,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 17:02:18,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 17:02:18,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
1: [2022-11-28 17:02:18,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2022-11-28 17:02:18,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2022-11-28 17:02:18,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:02:18,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:02:18,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 17:02:18,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 17:02:18,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2022-11-28 17:02:18,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2022-11-28 17:02:18,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:02:18,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:02:18,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:02:18,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 17:02:18,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 17:02:18,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 17:02:18,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2022-11-28 17:02:18,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 1: [2022-11-28 17:02:18,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2022-11-28 17:02:18,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:02:18,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:02:18,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:02:18,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 17:02:18,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 17:02:18,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 17:02:18,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2022-11-28 17:02:18,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2022-11-28 17:02:18,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2022-11-28 17:02:18,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:02:18,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 17:02:18,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2022-11-28 17:02:18,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:02:18,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:02:18,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 17:02:18,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2022-11-28 17:02:18,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 17:02:18,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2022-11-28 17:02:18,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:02:18,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 17:02:18,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 7: [2022-11-28 17:02:18,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:02:18,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step34000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 17:02:18,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step34000 is ready now! 
0: successfully saved checkpoint at iteration 34000 to checkpoints_221m 7: time (ms) | save-checkpoint: 806.75 7: iteration 34010/ 115203 | consumed samples: 8706560 | consumed tokens: 17831034880 | elapsed time per iteration (s): 0.52 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 2.375567E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 492.970 | TFLOPs: 25.87 | 7: iteration 34020/ 115203 | consumed samples: 8709120 | consumed tokens: 17836277760 | elapsed time per iteration (s): 0.42 | learning rate: 1.656E-04 | global batch size: 256 | lm loss: 2.360644E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.745 | TFLOPs: 31.89 | 7: iteration 34030/ 115203 | consumed samples: 8711680 | consumed tokens: 17841520640 | elapsed time per iteration (s): 0.43 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 2.350476E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.057 | TFLOPs: 31.59 | 7: iteration 34040/ 115203 | consumed samples: 8714240 | consumed tokens: 17846763520 | elapsed time per iteration (s): 0.42 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 2.359580E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.362 | TFLOPs: 32.13 | 7: iteration 34050/ 115203 | consumed samples: 8716800 | consumed tokens: 17852006400 | elapsed time per iteration (s): 0.42 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 2.348703E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.229 | TFLOPs: 32.07 | 7: iteration 34060/ 115203 | consumed samples: 8719360 | consumed tokens: 17857249280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.655E-04 | global batch size: 256 | lm loss: 2.317848E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.323 | TFLOPs: 31.97 | 7: iteration 34070/ 115203 | consumed samples: 8721920 | consumed tokens: 17862492160 | elapsed time per iteration (s): 0.43 | learning rate: 1.655E-04 | global batch size: 256 | lm loss: 2.328153E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.024 | TFLOPs: 31.38 | 7: iteration 34080/ 115203 | consumed samples: 8724480 | consumed tokens: 17867735040 | elapsed time per iteration (s): 0.42 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 2.370301E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.712 | TFLOPs: 31.89 | 7: iteration 34090/ 115203 | consumed samples: 8727040 | consumed tokens: 17872977920 | elapsed time per iteration (s): 0.42 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 2.368031E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.334 | TFLOPs: 31.76 | 7: iteration 34100/ 115203 | consumed samples: 8729600 | consumed tokens: 17878220800 | elapsed time per iteration (s): 0.42 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 2.347763E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.020 | TFLOPs: 31.85 | 7: iteration 34110/ 115203 | consumed samples: 8732160 | consumed tokens: 17883463680 | elapsed time per iteration (s): 0.42 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 2.335181E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.278 | TFLOPs: 32.23 | 7: iteration 34120/ 115203 | consumed samples: 
8734720 | consumed tokens: 17888706560 | elapsed time per iteration (s): 0.42 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 2.352159E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.651 | TFLOPs: 32.25 | 7: iteration 34130/ 115203 | consumed samples: 8737280 | consumed tokens: 17893949440 | elapsed time per iteration (s): 0.42 | learning rate: 1.654E-04 | global batch size: 256 | lm loss: 2.375080E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.988 | TFLOPs: 32.21 | 7: iteration 34140/ 115203 | consumed samples: 8739840 | consumed tokens: 17899192320 | elapsed time per iteration (s): 0.42 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 2.317279E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.942 | TFLOPs: 32.16 | 7: iteration 34150/ 115203 | consumed samples: 8742400 | consumed tokens: 17904435200 | elapsed time per iteration (s): 0.42 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 2.344111E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.454 | TFLOPs: 31.82 | 7: iteration 34160/ 115203 | consumed samples: 8744960 | consumed tokens: 17909678080 | elapsed time per iteration (s): 0.42 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 2.340030E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.294 | TFLOPs: 31.92 | 7: iteration 34170/ 115203 | consumed samples: 8747520 | consumed tokens: 17914920960 | elapsed time per iteration (s): 0.44 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 2.349299E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 588.314 | TFLOPs: 30.87 | 7: iteration 34180/ 115203 | consumed samples: 8750080 | consumed tokens: 17920163840 | elapsed time per iteration (s): 0.43 | learning rate: 1.653E-04 | global batch size: 256 | lm loss: 2.383970E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.754 | TFLOPs: 31.05 | 7: iteration 34190/ 115203 | consumed samples: 8752640 | consumed tokens: 17925406720 | elapsed time per iteration (s): 0.59 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 2.362817E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 436.313 | TFLOPs: 22.89 | 7: iteration 34200/ 115203 | consumed samples: 8755200 | consumed tokens: 17930649600 | elapsed time per iteration (s): 0.42 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 2.321576E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.096 | TFLOPs: 32.01 | 7: iteration 34210/ 115203 | consumed samples: 8757760 | consumed tokens: 17935892480 | elapsed time per iteration (s): 0.43 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 2.339892E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.900 | TFLOPs: 31.27 | 7: iteration 34220/ 115203 | consumed samples: 8760320 | consumed tokens: 17941135360 | elapsed time per iteration (s): 0.43 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 2.353006E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.838 | TFLOPs: 31.26 | 7: iteration 34230/ 115203 | consumed samples: 8762880 | consumed tokens: 17946378240 | elapsed time per iteration (s): 0.42 | learning rate: 1.652E-04 | global batch size: 256 | lm loss: 2.320861E+00 | 
grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.382 | TFLOPs: 31.61 | 7: iteration 34240/ 115203 | consumed samples: 8765440 | consumed tokens: 17951621120 | elapsed time per iteration (s): 0.43 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 2.331212E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.365 | TFLOPs: 31.50 | 7: iteration 34250/ 115203 | consumed samples: 8768000 | consumed tokens: 17956864000 | elapsed time per iteration (s): 0.43 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 2.345512E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.138 | TFLOPs: 31.44 | 7: iteration 34260/ 115203 | consumed samples: 8770560 | consumed tokens: 17962106880 | elapsed time per iteration (s): 0.42 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 2.347316E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.417 | TFLOPs: 31.98 | 7: iteration 34270/ 115203 | consumed samples: 8773120 | consumed tokens: 17967349760 | elapsed time per iteration (s): 0.43 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 2.367599E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.219 | TFLOPs: 31.54 | 7: iteration 34280/ 115203 | consumed samples: 8775680 | consumed tokens: 17972592640 | elapsed time per iteration (s): 0.42 | learning rate: 1.651E-04 | global batch size: 256 | lm loss: 2.362254E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.227 | TFLOPs: 31.86 | 7: iteration 34290/ 115203 | consumed samples: 8778240 | consumed tokens: 17977835520 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 2.374978E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.844 | TFLOPs: 31.42 | 7: iteration 34300/ 115203 | consumed samples: 8780800 | consumed tokens: 17983078400 | elapsed time per iteration (s): 0.43 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 2.371832E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.980 | TFLOPs: 31.38 | 7: iteration 34310/ 115203 | consumed samples: 8783360 | consumed tokens: 17988321280 | elapsed time per iteration (s): 0.43 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 2.349050E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.295 | TFLOPs: 31.50 | 7: iteration 34320/ 115203 | consumed samples: 8785920 | consumed tokens: 17993564160 | elapsed time per iteration (s): 0.43 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 2.394147E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.702 | TFLOPs: 30.99 | 7: iteration 34330/ 115203 | consumed samples: 8788480 | consumed tokens: 17998807040 | elapsed time per iteration (s): 0.42 | learning rate: 1.650E-04 | global batch size: 256 | lm loss: 2.371845E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.545 | TFLOPs: 31.98 | 7: iteration 34340/ 115203 | consumed samples: 8791040 | consumed tokens: 18004049920 | elapsed time per iteration (s): 0.42 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 2.325546E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.004 | TFLOPs: 31.74 | 7: 
iteration 34350/ 115203 | consumed samples: 8793600 | consumed tokens: 18009292800 | elapsed time per iteration (s): 0.42 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 2.327916E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.714 | TFLOPs: 31.99 | 7: iteration 34360/ 115203 | consumed samples: 8796160 | consumed tokens: 18014535680 | elapsed time per iteration (s): 0.42 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 2.318785E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.879 | TFLOPs: 31.79 | 7: iteration 34370/ 115203 | consumed samples: 8798720 | consumed tokens: 18019778560 | elapsed time per iteration (s): 0.43 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 2.367535E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.888 | TFLOPs: 31.16 | 7: iteration 34380/ 115203 | consumed samples: 8801280 | consumed tokens: 18025021440 | elapsed time per iteration (s): 0.43 | learning rate: 1.649E-04 | global batch size: 256 | lm loss: 2.377738E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.282 | TFLOPs: 31.60 | 7: iteration 34390/ 115203 | consumed samples: 8803840 | consumed tokens: 18030264320 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 2.325110E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.631 | TFLOPs: 31.62 | 7: iteration 34400/ 115203 | consumed samples: 8806400 | consumed tokens: 18035507200 | elapsed time per iteration (s): 0.43 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 2.375448E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 601.652 | TFLOPs: 31.57 | 7: iteration 34410/ 115203 | consumed samples: 8808960 | consumed tokens: 18040750080 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 2.369014E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.456 | TFLOPs: 32.03 | 7: iteration 34420/ 115203 | consumed samples: 8811520 | consumed tokens: 18045992960 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 2.358331E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.596 | TFLOPs: 32.09 | 7: iteration 34430/ 115203 | consumed samples: 8814080 | consumed tokens: 18051235840 | elapsed time per iteration (s): 0.42 | learning rate: 1.648E-04 | global batch size: 256 | lm loss: 2.371647E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.932 | TFLOPs: 32.11 | 7: iteration 34440/ 115203 | consumed samples: 8816640 | consumed tokens: 18056478720 | elapsed time per iteration (s): 0.43 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 2.358197E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.485 | TFLOPs: 31.19 | 7: iteration 34450/ 115203 | consumed samples: 8819200 | consumed tokens: 18061721600 | elapsed time per iteration (s): 0.43 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 2.359958E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.876 | TFLOPs: 31.05 | 7: iteration 34460/ 115203 | consumed samples: 8821760 | consumed tokens: 18066964480 | elapsed time per iteration (s): 0.42 | learning rate: 1.647E-04 | global 
batch size: 256 | lm loss: 2.323270E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.757 | TFLOPs: 31.73 | 7: iteration 34470/ 115203 | consumed samples: 8824320 | consumed tokens: 18072207360 | elapsed time per iteration (s): 0.43 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 2.323156E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.072 | TFLOPs: 31.59 | 7: iteration 34480/ 115203 | consumed samples: 8826880 | consumed tokens: 18077450240 | elapsed time per iteration (s): 0.42 | learning rate: 1.647E-04 | global batch size: 256 | lm loss: 2.322763E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.658 | TFLOPs: 31.99 | 7: iteration 34490/ 115203 | consumed samples: 8829440 | consumed tokens: 18082693120 | elapsed time per iteration (s): 0.42 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 2.347053E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.015 | TFLOPs: 31.85 | 7: iteration 34500/ 115203 | consumed samples: 8832000 | consumed tokens: 18087936000 | elapsed time per iteration (s): 0.42 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 2.306795E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.288 | TFLOPs: 31.71 | 7: iteration 34510/ 115203 | consumed samples: 8834560 | consumed tokens: 18093178880 | elapsed time per iteration (s): 0.42 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 2.321377E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.354 | TFLOPs: 31.66 | 7: iteration 34520/ 115203 | consumed samples: 8837120 | consumed 
tokens: 18098421760 | elapsed time per iteration (s): 0.43 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 2.335578E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.119 | TFLOPs: 31.59 | 7: iteration 34530/ 115203 | consumed samples: 8839680 | consumed tokens: 18103664640 | elapsed time per iteration (s): 0.43 | learning rate: 1.646E-04 | global batch size: 256 | lm loss: 2.355920E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.051 | TFLOPs: 30.96 | 7: iteration 34540/ 115203 | consumed samples: 8842240 | consumed tokens: 18108907520 | elapsed time per iteration (s): 0.43 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 2.328940E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.808 | TFLOPs: 30.95 | 7: iteration 34550/ 115203 | consumed samples: 8844800 | consumed tokens: 18114150400 | elapsed time per iteration (s): 0.42 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 2.366196E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.781 | TFLOPs: 31.68 | 7: iteration 34560/ 115203 | consumed samples: 8847360 | consumed tokens: 18119393280 | elapsed time per iteration (s): 0.42 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 2.320877E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.169 | TFLOPs: 32.12 | 7: iteration 34570/ 115203 | consumed samples: 8849920 | consumed tokens: 18124636160 | elapsed time per iteration (s): 0.43 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 2.347020E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 600.810 | TFLOPs: 31.52 | 7: iteration 34580/ 115203 | consumed samples: 8852480 | consumed tokens: 18129879040 | elapsed time per iteration (s): 0.42 | learning rate: 1.645E-04 | global batch size: 256 | lm loss: 2.368750E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.414 | TFLOPs: 32.18 | 7: iteration 34590/ 115203 | consumed samples: 8855040 | consumed tokens: 18135121920 | elapsed time per iteration (s): 0.42 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 2.315915E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.055 | TFLOPs: 31.69 | 7: iteration 34600/ 115203 | consumed samples: 8857600 | consumed tokens: 18140364800 | elapsed time per iteration (s): 0.42 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 2.340262E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.884 | TFLOPs: 31.68 | 7: iteration 34610/ 115203 | consumed samples: 8860160 | consumed tokens: 18145607680 | elapsed time per iteration (s): 0.42 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 2.291656E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.459 | TFLOPs: 31.61 | 7: iteration 34620/ 115203 | consumed samples: 8862720 | consumed tokens: 18150850560 | elapsed time per iteration (s): 0.42 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 2.330053E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.911 | TFLOPs: 31.79 | 7: iteration 34630/ 115203 | consumed samples: 8865280 | consumed tokens: 18156093440 | elapsed time per iteration (s): 0.42 | learning rate: 1.644E-04 | global batch size: 256 | lm loss: 2.338788E+00 | grad norm: 0.190 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.539 | TFLOPs: 32.09 | 7: iteration 34640/ 115203 | consumed samples: 8867840 | consumed tokens: 18161336320 | elapsed time per iteration (s): 0.43 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 2.354515E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.276 | TFLOPs: 31.55 | 7: iteration 34650/ 115203 | consumed samples: 8870400 | consumed tokens: 18166579200 | elapsed time per iteration (s): 0.42 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 2.354005E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.289 | TFLOPs: 32.07 | 7: iteration 34660/ 115203 | consumed samples: 8872960 | consumed tokens: 18171822080 | elapsed time per iteration (s): 0.42 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 2.342153E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.376 | TFLOPs: 31.71 | 7: iteration 34670/ 115203 | consumed samples: 8875520 | consumed tokens: 18177064960 | elapsed time per iteration (s): 0.43 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 2.354267E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.920 | TFLOPs: 31.58 | 7: iteration 34680/ 115203 | consumed samples: 8878080 | consumed tokens: 18182307840 | elapsed time per iteration (s): 0.43 | learning rate: 1.643E-04 | global batch size: 256 | lm loss: 2.350026E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.693 | TFLOPs: 31.20 | 7: iteration 34690/ 115203 | consumed samples: 8880640 | consumed tokens: 18187550720 | elapsed time per iteration (s): 0.43 
| learning rate: 1.643E-04 | global batch size: 256 | lm loss: 2.354987E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.576 | TFLOPs: 31.46 | 7: iteration 34700/ 115203 | consumed samples: 8883200 | consumed tokens: 18192793600 | elapsed time per iteration (s): 0.42 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 2.336233E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.085 | TFLOPs: 31.96 | 7: iteration 34710/ 115203 | consumed samples: 8885760 | consumed tokens: 18198036480 | elapsed time per iteration (s): 0.43 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 2.346495E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.791 | TFLOPs: 31.42 | 7: iteration 34720/ 115203 | consumed samples: 8888320 | consumed tokens: 18203279360 | elapsed time per iteration (s): 0.43 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 2.356875E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.037 | TFLOPs: 31.54 | 7: iteration 34730/ 115203 | consumed samples: 8890880 | consumed tokens: 18208522240 | elapsed time per iteration (s): 0.42 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 2.362925E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.212 | TFLOPs: 31.65 | 7: iteration 34740/ 115203 | consumed samples: 8893440 | consumed tokens: 18213765120 | elapsed time per iteration (s): 0.42 | learning rate: 1.642E-04 | global batch size: 256 | lm loss: 2.355591E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.722 | TFLOPs: 32.20 | 7: iteration 34750/ 115203 | 
consumed samples: 8896000 | consumed tokens: 18219008000 | elapsed time per iteration (s): 0.43 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 2.350374E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.630 | TFLOPs: 31.46 | 7: iteration 34760/ 115203 | consumed samples: 8898560 | consumed tokens: 18224250880 | elapsed time per iteration (s): 0.42 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 2.339744E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.406 | TFLOPs: 31.76 | 7: iteration 34770/ 115203 | consumed samples: 8901120 | consumed tokens: 18229493760 | elapsed time per iteration (s): 0.42 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 2.330195E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.700 | TFLOPs: 31.89 | 7: iteration 34780/ 115203 | consumed samples: 8903680 | consumed tokens: 18234736640 | elapsed time per iteration (s): 0.42 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 2.312605E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.163 | TFLOPs: 31.86 | 7: iteration 34790/ 115203 | consumed samples: 8906240 | consumed tokens: 18239979520 | elapsed time per iteration (s): 0.43 | learning rate: 1.641E-04 | global batch size: 256 | lm loss: 2.334866E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.226 | TFLOPs: 31.34 | 7: iteration 34800/ 115203 | consumed samples: 8908800 | consumed tokens: 18245222400 | elapsed time per iteration (s): 0.43 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 2.340306E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 591.387 | TFLOPs: 31.03 | 7: iteration 34810/ 115203 | consumed samples: 8911360 | consumed tokens: 18250465280 | elapsed time per iteration (s): 0.43 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 2.352236E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.743 | TFLOPs: 31.57 | 7: iteration 34820/ 115203 | consumed samples: 8913920 | consumed tokens: 18255708160 | elapsed time per iteration (s): 0.43 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 2.317342E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.284 | TFLOPs: 31.55 | 7: iteration 34830/ 115203 | consumed samples: 8916480 | consumed tokens: 18260951040 | elapsed time per iteration (s): 0.42 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 2.325230E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.925 | TFLOPs: 32.11 | 7: iteration 34840/ 115203 | consumed samples: 8919040 | consumed tokens: 18266193920 | elapsed time per iteration (s): 0.42 | learning rate: 1.640E-04 | global batch size: 256 | lm loss: 2.348400E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.767 | TFLOPs: 32.26 | 7: iteration 34850/ 115203 | consumed samples: 8921600 | consumed tokens: 18271436800 | elapsed time per iteration (s): 0.42 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 2.362696E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.515 | TFLOPs: 31.88 | 7: iteration 34860/ 115203 | consumed samples: 8924160 | consumed tokens: 18276679680 | elapsed time per iteration (s): 0.42 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 
2.343901E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.013 | TFLOPs: 32.11 | 7: iteration 34870/ 115203 | consumed samples: 8926720 | consumed tokens: 18281922560 | elapsed time per iteration (s): 0.42 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 2.391247E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.227 | TFLOPs: 32.23 | 7: iteration 34880/ 115203 | consumed samples: 8929280 | consumed tokens: 18287165440 | elapsed time per iteration (s): 0.42 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 2.321908E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.260 | TFLOPs: 31.81 | 7: iteration 34890/ 115203 | consumed samples: 8931840 | consumed tokens: 18292408320 | elapsed time per iteration (s): 0.42 | learning rate: 1.639E-04 | global batch size: 256 | lm loss: 2.370310E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.705 | TFLOPs: 32.10 | 7: iteration 34900/ 115203 | consumed samples: 8934400 | consumed tokens: 18297651200 | elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 2.364544E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.168 | TFLOPs: 31.96 | 7: iteration 34910/ 115203 | consumed samples: 8936960 | consumed tokens: 18302894080 | elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 2.359008E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.318 | TFLOPs: 31.81 | 7: iteration 34920/ 115203 | consumed samples: 8939520 | consumed tokens: 18308136960 | 
elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 2.331599E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.791 | TFLOPs: 31.78 | 7: iteration 34930/ 115203 | consumed samples: 8942080 | consumed tokens: 18313379840 | elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 2.356234E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.897 | TFLOPs: 31.69 | 7: iteration 34940/ 115203 | consumed samples: 8944640 | consumed tokens: 18318622720 | elapsed time per iteration (s): 0.42 | learning rate: 1.638E-04 | global batch size: 256 | lm loss: 2.339536E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.797 | TFLOPs: 32.00 | 7: iteration 34950/ 115203 | consumed samples: 8947200 | consumed tokens: 18323865600 | elapsed time per iteration (s): 0.43 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 2.344348E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.395 | TFLOPs: 31.55 | 7: iteration 34960/ 115203 | consumed samples: 8949760 | consumed tokens: 18329108480 | elapsed time per iteration (s): 0.43 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 2.339621E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.392 | TFLOPs: 31.13 | 7: iteration 34970/ 115203 | consumed samples: 8952320 | consumed tokens: 18334351360 | elapsed time per iteration (s): 0.43 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 2.369297E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.545 | TFLOPs: 
31.56 | 7: iteration 34980/ 115203 | consumed samples: 8954880 | consumed tokens: 18339594240 | elapsed time per iteration (s): 0.42 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 2.363187E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.993 | TFLOPs: 32.11 | 7: iteration 34990/ 115203 | consumed samples: 8957440 | consumed tokens: 18344837120 | elapsed time per iteration (s): 0.42 | learning rate: 1.637E-04 | global batch size: 256 | lm loss: 2.381845E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.147 | TFLOPs: 32.12 | 7: iteration 35000/ 115203 | consumed samples: 8960000 | consumed tokens: 18350080000 | elapsed time per iteration (s): 0.42 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 2.373376E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.605 | TFLOPs: 31.78 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 35000 | lm loss value: 2.347984E+00 | lm loss PPL: 1.046445E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 35000 to checkpoints_221m 0: [2022-11-28 17:09:23,524] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step35000 is begin to save! 0: [2022-11-28 17:09:23,573] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_01-model_00-model_states.pt... 0: [2022-11-28 17:09:23,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_01-model_00-model_states.pt. 0: [2022-11-28 17:09:23,841] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_03-model_00-model_states.pt... 
0: [2022-11-28 17:09:23,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_03-model_00-model_states.pt. 0: [2022-11-28 17:09:23,864] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_04-model_00-model_states.pt... 0: [2022-11-28 17:09:23,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_04-model_00-model_states.pt. 0: [2022-11-28 17:09:23,888] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_05-model_00-model_states.pt... 0: [2022-11-28 17:09:23,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_05-model_00-model_states.pt. 0: [2022-11-28 17:09:23,912] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_06-model_00-model_states.pt... 0: [2022-11-28 17:09:23,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_06-model_00-model_states.pt. 0: [2022-11-28 17:09:23,935] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_07-model_00-model_states.pt... 0: [2022-11-28 17:09:23,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_07-model_00-model_states.pt. 0: [2022-11-28 17:09:23,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_08-model_00-model_states.pt... 0: [2022-11-28 17:09:23,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_08-model_00-model_states.pt. 0: [2022-11-28 17:09:23,985] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_09-model_00-model_states.pt... 
0: [2022-11-28 17:09:24,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_09-model_00-model_states.pt. 0: [2022-11-28 17:09:24,010] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_10-model_00-model_states.pt... 0: [2022-11-28 17:09:24,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_10-model_00-model_states.pt. 0: [2022-11-28 17:09:24,035] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_11-model_00-model_states.pt... 0: [2022-11-28 17:09:24,059] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_11-model_00-model_states.pt. 0: [2022-11-28 17:09:24,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_12-model_00-model_states.pt... 0: [2022-11-28 17:09:24,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_12-model_00-model_states.pt. 0: [2022-11-28 17:09:24,084] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_13-model_00-model_states.pt... 0: [2022-11-28 17:09:24,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_13-model_00-model_states.pt. 0: [2022-11-28 17:09:24,109] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_14-model_00-model_states.pt... 0: [2022-11-28 17:09:24,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_14-model_00-model_states.pt. 0: [2022-11-28 17:09:24,134] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_15-model_00-model_states.pt... 
0: [2022-11-28 17:09:24,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_15-model_00-model_states.pt. 0: [2022-11-28 17:09:24,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_16-model_00-model_states.pt... 0: [2022-11-28 17:09:24,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_16-model_00-model_states.pt. 0: [2022-11-28 17:09:24,184] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_17-model_00-model_states.pt... 0: [2022-11-28 17:09:24,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_17-model_00-model_states.pt. 0: [2022-11-28 17:09:24,208] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_18-model_00-model_states.pt... 0: [2022-11-28 17:09:24,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_18-model_00-model_states.pt. 0: [2022-11-28 17:09:24,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_19-model_00-model_states.pt... 0: [2022-11-28 17:09:24,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_19-model_00-model_states.pt. 0: [2022-11-28 17:09:24,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_20-model_00-model_states.pt... 0: [2022-11-28 17:09:24,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_20-model_00-model_states.pt. 0: [2022-11-28 17:09:24,285] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/layer_22-model_00-model_states.pt... 
0: [2022-11-28 17:09:24,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/layer_22-model_00-model_states.pt. 0: [2022-11-28 17:09:24,289] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step35000/mp_rank_00_model_states.pt 0: [2022-11-28 17:09:24,289] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/mp_rank_00_model_states.pt... 0: [2022-11-28 17:09:24,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/mp_rank_00_model_states.pt. 0: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
0: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
3: [2022-11-28 17:09:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step35000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:09:24,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:09:24,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 17:09:24,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2022-11-28 17:09:24,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:09:24,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 17:09:24,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 7: [2022-11-28 17:09:24,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:09:24,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 17:09:24,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 7: [2022-11-28 17:09:24,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:09:24,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 17:09:24,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2022-11-28 17:09:24,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:09:24,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 17:09:24,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2022-11-28 17:09:24,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:09:24,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:09:24,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 17:09:24,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 17:09:24,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2022-11-28 17:09:24,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 7: [2022-11-28 17:09:24,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:09:24,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 17:09:24,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 7: [2022-11-28 17:09:24,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:09:24,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 17:09:24,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 7: [2022-11-28 17:09:24,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:09:24,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 17:09:24,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2022-11-28 17:09:24,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:09:24,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:09:24,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:09:24,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:09:24,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:09:24,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:09:24,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 17:09:24,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2022-11-28 17:09:24,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2022-11-28 17:09:24,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 17:09:24,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:09:24,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2022-11-28 17:09:24,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:09:24,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 7: [2022-11-28 17:09:24,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:09:24,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2022-11-28 17:09:24,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 7: [2022-11-28 17:09:24,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2022-11-28 17:09:24,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2022-11-28 17:09:24,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:09:24,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 1: [2022-11-28 17:09:24,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:09:24,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2022-11-28 17:09:24,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:09:24,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 3: [2022-11-28 17:09:24,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2022-11-28 17:09:24,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 17:09:24,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 1: [2022-11-28 17:09:24,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2022-11-28 17:09:24,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2022-11-28 17:09:24,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2022-11-28 17:09:24,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:09:24,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 17:09:24,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:09:24,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2022-11-28 17:09:24,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2022-11-28 17:09:24,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2022-11-28 17:09:24,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 3: [2022-11-28 17:09:24,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:09:24,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 17:09:24,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2022-11-28 17:09:24,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:09:24,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 17:09:24,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2022-11-28 17:09:24,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:09:24,372] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 17:09:24,372] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2022-11-28 17:09:24,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:09:24,373] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 17:09:24,373] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2022-11-28 17:09:24,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:09:24,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 17:09:24,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2022-11-28 17:09:24,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:09:24,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:09:24,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 17:09:24,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 17:09:24,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 17:09:24,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2022-11-28 17:09:24,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2022-11-28 17:09:24,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:09:24,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:09:24,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:09:24,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 17:09:24,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 17:09:24,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 17:09:24,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2022-11-28 17:09:24,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
4: [2022-11-28 17:09:24,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2022-11-28 17:09:24,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:09:24,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 17:09:24,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2022-11-28 17:09:24,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:09:24,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 17:09:24,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2022-11-28 17:09:24,370] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:09:24,370] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 17:09:24,370] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2022-11-28 17:09:24,370] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:09:24,370] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 17:09:24,370] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2022-11-28 17:09:24,370] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:09:24,370] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 17:09:24,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2022-11-28 17:09:24,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:09:24,372] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 17:09:24,372] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2022-11-28 17:09:24,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:09:24,372] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 17:09:24,372] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 5: [2022-11-28 17:09:24,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:09:24,373] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 17:09:24,373] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:09:24,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:09:24,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:09:24,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 17:09:24,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 17:09:24,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 2: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:09:24,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2022-11-28 17:09:24,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2022-11-28 17:09:24,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
6: [2022-11-28 17:09:24,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:09:24,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 17:09:24,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2022-11-28 17:09:24,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:09:24,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:09:24,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:09:24,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:09:24,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 17:09:24,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 17:09:24,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 17:09:24,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
6: [2022-11-28 17:09:24,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 17:09:24,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2022-11-28 17:09:24,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 6: [2022-11-28 17:09:24,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 4: [2022-11-28 17:09:24,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:09:24,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 17:09:24,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 1: [2022-11-28 17:09:24,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:09:24,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 17:09:24,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 7: [2022-11-28 17:09:24,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:09:24,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 17:09:24,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
7: [2022-11-28 17:09:24,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:09:24,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 17:09:24,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2022-11-28 17:09:24,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:09:24,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 17:09:24,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 0: [2022-11-28 17:09:24,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step35000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 17:09:24,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step35000 is ready now! 
0: successfully saved checkpoint at iteration 35000 to checkpoints_221m 7: time (ms) | save-checkpoint: 930.54 7: iteration 35010/ 115203 | consumed samples: 8962560 | consumed tokens: 18355322880 | elapsed time per iteration (s): 0.54 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 2.360138E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 476.041 | TFLOPs: 24.98 | 7: iteration 35020/ 115203 | consumed samples: 8965120 | consumed tokens: 18360565760 | elapsed time per iteration (s): 0.43 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 2.368003E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.224 | TFLOPs: 31.44 | 7: iteration 35030/ 115203 | consumed samples: 8967680 | consumed tokens: 18365808640 | elapsed time per iteration (s): 0.43 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 2.360432E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.884 | TFLOPs: 31.21 | 7: iteration 35040/ 115203 | consumed samples: 8970240 | consumed tokens: 18371051520 | elapsed time per iteration (s): 0.43 | learning rate: 1.636E-04 | global batch size: 256 | lm loss: 2.342756E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.238 | TFLOPs: 31.23 | 7: iteration 35050/ 115203 | consumed samples: 8972800 | consumed tokens: 18376294400 | elapsed time per iteration (s): 0.42 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 2.370456E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.828 | TFLOPs: 31.73 | 7: iteration 35060/ 115203 | consumed samples: 8975360 | consumed tokens: 18381537280 | elapsed time per iteration (s): 0.42 | learning rate: 
1.635E-04 | global batch size: 256 | lm loss: 2.336002E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.882 | TFLOPs: 31.74 | 7: iteration 35070/ 115203 | consumed samples: 8977920 | consumed tokens: 18386780160 | elapsed time per iteration (s): 0.43 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 2.359379E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.268 | TFLOPs: 31.44 | 7: iteration 35080/ 115203 | consumed samples: 8980480 | consumed tokens: 18392023040 | elapsed time per iteration (s): 0.42 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 2.315364E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.793 | TFLOPs: 32.20 | 7: iteration 35090/ 115203 | consumed samples: 8983040 | consumed tokens: 18397265920 | elapsed time per iteration (s): 0.42 | learning rate: 1.635E-04 | global batch size: 256 | lm loss: 2.338519E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.210 | TFLOPs: 31.91 | 7: iteration 35100/ 115203 | consumed samples: 8985600 | consumed tokens: 18402508800 | elapsed time per iteration (s): 0.43 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 2.318243E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.745 | TFLOPs: 31.47 | 7: iteration 35110/ 115203 | consumed samples: 8988160 | consumed tokens: 18407751680 | elapsed time per iteration (s): 0.42 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 2.325935E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.227 | TFLOPs: 31.86 | 7: iteration 35120/ 115203 | consumed samples: 
8990720 | consumed tokens: 18412994560 | elapsed time per iteration (s): 0.43 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 2.339534E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.338 | TFLOPs: 31.55 | 7: iteration 35130/ 115203 | consumed samples: 8993280 | consumed tokens: 18418237440 | elapsed time per iteration (s): 0.43 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 2.354243E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.368 | TFLOPs: 31.08 | 7: iteration 35140/ 115203 | consumed samples: 8995840 | consumed tokens: 18423480320 | elapsed time per iteration (s): 0.42 | learning rate: 1.634E-04 | global batch size: 256 | lm loss: 2.353922E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.430 | TFLOPs: 31.98 | 7: iteration 35150/ 115203 | consumed samples: 8998400 | consumed tokens: 18428723200 | elapsed time per iteration (s): 0.43 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 2.350478E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.434 | TFLOPs: 31.35 | 7: iteration 35160/ 115203 | consumed samples: 9000960 | consumed tokens: 18433966080 | elapsed time per iteration (s): 0.43 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 2.335568E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.099 | TFLOPs: 31.59 | 7: iteration 35170/ 115203 | consumed samples: 9003520 | consumed tokens: 18439208960 | elapsed time per iteration (s): 0.42 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 2.324225E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 611.327 | TFLOPs: 32.08 | 7: iteration 35180/ 115203 | consumed samples: 9006080 | consumed tokens: 18444451840 | elapsed time per iteration (s): 0.43 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 2.339577E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.083 | TFLOPs: 31.59 | 7: iteration 35190/ 115203 | consumed samples: 9008640 | consumed tokens: 18449694720 | elapsed time per iteration (s): 0.42 | learning rate: 1.633E-04 | global batch size: 256 | lm loss: 2.359432E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.352 | TFLOPs: 31.76 | 7: iteration 35200/ 115203 | consumed samples: 9011200 | consumed tokens: 18454937600 | elapsed time per iteration (s): 0.42 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 2.360800E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.295 | TFLOPs: 32.13 | 7: iteration 35210/ 115203 | consumed samples: 9013760 | consumed tokens: 18460180480 | elapsed time per iteration (s): 0.42 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 2.345544E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.502 | TFLOPs: 31.98 | 7: iteration 35220/ 115203 | consumed samples: 9016320 | consumed tokens: 18465423360 | elapsed time per iteration (s): 0.42 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 2.349995E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.451 | TFLOPs: 31.82 | 7: iteration 35230/ 115203 | consumed samples: 9018880 | consumed tokens: 18470666240 | elapsed time per iteration (s): 0.43 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 2.347695E+00 | 
grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.777 | TFLOPs: 31.57 | 7: iteration 35240/ 115203 | consumed samples: 9021440 | consumed tokens: 18475909120 | elapsed time per iteration (s): 0.44 | learning rate: 1.632E-04 | global batch size: 256 | lm loss: 2.334205E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.654 | TFLOPs: 30.47 | 7: iteration 35250/ 115203 | consumed samples: 9024000 | consumed tokens: 18481152000 | elapsed time per iteration (s): 0.43 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 2.345604E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.393 | TFLOPs: 31.24 | 7: iteration 35260/ 115203 | consumed samples: 9026560 | consumed tokens: 18486394880 | elapsed time per iteration (s): 0.42 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 2.325328E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.820 | TFLOPs: 31.73 | 7: iteration 35270/ 115203 | consumed samples: 9029120 | consumed tokens: 18491637760 | elapsed time per iteration (s): 0.42 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 2.334794E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.633 | TFLOPs: 31.93 | 7: iteration 35280/ 115203 | consumed samples: 9031680 | consumed tokens: 18496880640 | elapsed time per iteration (s): 0.42 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 2.362551E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.854 | TFLOPs: 32.05 | 7: iteration 35290/ 115203 | consumed samples: 9034240 | consumed tokens: 18502123520 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.631E-04 | global batch size: 256 | lm loss: 2.319375E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.932 | TFLOPs: 31.90 | 7: iteration 35300/ 115203 | consumed samples: 9036800 | consumed tokens: 18507366400 | elapsed time per iteration (s): 0.43 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 2.358554E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.642 | TFLOPs: 31.25 | 7: iteration 35310/ 115203 | consumed samples: 9039360 | consumed tokens: 18512609280 | elapsed time per iteration (s): 0.43 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 2.333949E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.956 | TFLOPs: 31.53 | 7: iteration 35320/ 115203 | consumed samples: 9041920 | consumed tokens: 18517852160 | elapsed time per iteration (s): 0.42 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 2.368186E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.751 | TFLOPs: 31.84 | 7: iteration 35330/ 115203 | consumed samples: 9044480 | consumed tokens: 18523095040 | elapsed time per iteration (s): 0.43 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 2.315723E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.676 | TFLOPs: 31.31 | 7: iteration 35340/ 115203 | consumed samples: 9047040 | consumed tokens: 18528337920 | elapsed time per iteration (s): 0.42 | learning rate: 1.630E-04 | global batch size: 256 | lm loss: 2.360401E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.293 | TFLOPs: 31.86 | 7: 
iteration 35350/ 115203 | consumed samples: 9049600 | consumed tokens: 18533580800 | elapsed time per iteration (s): 0.42 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 2.341109E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.939 | TFLOPs: 31.74 | 7: iteration 35360/ 115203 | consumed samples: 9052160 | consumed tokens: 18538823680 | elapsed time per iteration (s): 0.43 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 2.328954E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.633 | TFLOPs: 31.04 | 7: iteration 35370/ 115203 | consumed samples: 9054720 | consumed tokens: 18544066560 | elapsed time per iteration (s): 0.42 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 2.331577E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.989 | TFLOPs: 31.64 | 7: iteration 35380/ 115203 | consumed samples: 9057280 | consumed tokens: 18549309440 | elapsed time per iteration (s): 0.42 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 2.351299E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.642 | TFLOPs: 31.93 | 7: iteration 35390/ 115203 | consumed samples: 9059840 | consumed tokens: 18554552320 | elapsed time per iteration (s): 0.42 | learning rate: 1.629E-04 | global batch size: 256 | lm loss: 2.365092E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.700 | TFLOPs: 31.94 | 7: iteration 35400/ 115203 | consumed samples: 9062400 | consumed tokens: 18559795200 | elapsed time per iteration (s): 0.42 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 2.366031E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 604.331 | TFLOPs: 31.71 | 7: iteration 35410/ 115203 | consumed samples: 9064960 | consumed tokens: 18565038080 | elapsed time per iteration (s): 0.42 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 2.323930E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.250 | TFLOPs: 31.81 | 7: iteration 35420/ 115203 | consumed samples: 9067520 | consumed tokens: 18570280960 | elapsed time per iteration (s): 0.43 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 2.355256E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.784 | TFLOPs: 31.57 | 7: iteration 35430/ 115203 | consumed samples: 9070080 | consumed tokens: 18575523840 | elapsed time per iteration (s): 0.42 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 2.337491E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.794 | TFLOPs: 31.94 | 7: iteration 35440/ 115203 | consumed samples: 9072640 | consumed tokens: 18580766720 | elapsed time per iteration (s): 0.42 | learning rate: 1.628E-04 | global batch size: 256 | lm loss: 2.310965E+00 | grad norm: 0.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.221 | TFLOPs: 31.86 | 7: iteration 35450/ 115203 | consumed samples: 9075200 | consumed tokens: 18586009600 | elapsed time per iteration (s): 0.43 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 2.312786E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.251 | TFLOPs: 31.39 | 7: iteration 35460/ 115203 | consumed samples: 9077760 | consumed tokens: 18591252480 | elapsed time per iteration (s): 0.42 | learning rate: 1.627E-04 | global 
batch size: 256 | lm loss: 2.324982E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.906 | TFLOPs: 31.95 | 7: iteration 35470/ 115203 | consumed samples: 9080320 | consumed tokens: 18596495360 | elapsed time per iteration (s): 0.42 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 2.346265E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.409 | TFLOPs: 31.87 | 7: iteration 35480/ 115203 | consumed samples: 9082880 | consumed tokens: 18601738240 | elapsed time per iteration (s): 0.42 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 2.359967E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.204 | TFLOPs: 31.96 | 7: iteration 35490/ 115203 | consumed samples: 9085440 | consumed tokens: 18606981120 | elapsed time per iteration (s): 0.42 | learning rate: 1.627E-04 | global batch size: 256 | lm loss: 2.372496E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.317 | TFLOPs: 32.02 | 7: iteration 35500/ 115203 | consumed samples: 9088000 | consumed tokens: 18612224000 | elapsed time per iteration (s): 0.42 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 2.342778E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.339 | TFLOPs: 31.71 | 7: iteration 35510/ 115203 | consumed samples: 9090560 | consumed tokens: 18617466880 | elapsed time per iteration (s): 0.42 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 2.343806E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.366 | TFLOPs: 32.08 | 7: iteration 35520/ 115203 | consumed samples: 9093120 | consumed 
tokens: 18622709760 | elapsed time per iteration (s): 0.43 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 2.343421E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.743 | TFLOPs: 31.42 | 7: iteration 35530/ 115203 | consumed samples: 9095680 | consumed tokens: 18627952640 | elapsed time per iteration (s): 0.43 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 2.339243E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.863 | TFLOPs: 31.21 | 7: iteration 35540/ 115203 | consumed samples: 9098240 | consumed tokens: 18633195520 | elapsed time per iteration (s): 0.43 | learning rate: 1.626E-04 | global batch size: 256 | lm loss: 2.329144E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.173 | TFLOPs: 31.23 | 7: iteration 35550/ 115203 | consumed samples: 9100800 | consumed tokens: 18638438400 | elapsed time per iteration (s): 0.42 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 2.361805E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.642 | TFLOPs: 31.62 | 7: iteration 35560/ 115203 | consumed samples: 9103360 | consumed tokens: 18643681280 | elapsed time per iteration (s): 0.42 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 2.354555E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.922 | TFLOPs: 31.90 | 7: iteration 35570/ 115203 | consumed samples: 9105920 | consumed tokens: 18648924160 | elapsed time per iteration (s): 0.42 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 2.371658E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 610.288 | TFLOPs: 32.02 | 7: iteration 35580/ 115203 | consumed samples: 9108480 | consumed tokens: 18654167040 | elapsed time per iteration (s): 0.42 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 2.332018E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.668 | TFLOPs: 32.04 | 7: iteration 35590/ 115203 | consumed samples: 9111040 | consumed tokens: 18659409920 | elapsed time per iteration (s): 0.43 | learning rate: 1.625E-04 | global batch size: 256 | lm loss: 2.293730E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.049 | TFLOPs: 30.91 | 7: iteration 35600/ 115203 | consumed samples: 9113600 | consumed tokens: 18664652800 | elapsed time per iteration (s): 0.42 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 2.355034E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.493 | TFLOPs: 32.08 | 7: iteration 35610/ 115203 | consumed samples: 9116160 | consumed tokens: 18669895680 | elapsed time per iteration (s): 0.44 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 2.349485E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.636 | TFLOPs: 30.62 | 7: iteration 35620/ 115203 | consumed samples: 9118720 | consumed tokens: 18675138560 | elapsed time per iteration (s): 0.42 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 2.333152E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.272 | TFLOPs: 31.86 | 7: iteration 35630/ 115203 | consumed samples: 9121280 | consumed tokens: 18680381440 | elapsed time per iteration (s): 0.71 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 2.349818E+00 | grad norm: 0.218 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 360.087 | TFLOPs: 18.89 | 7: iteration 35640/ 115203 | consumed samples: 9123840 | consumed tokens: 18685624320 | elapsed time per iteration (s): 0.43 | learning rate: 1.624E-04 | global batch size: 256 | lm loss: 2.337253E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.314 | TFLOPs: 31.50 | 7: iteration 35650/ 115203 | consumed samples: 9126400 | consumed tokens: 18690867200 | elapsed time per iteration (s): 1.05 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 2.331793E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 243.255 | TFLOPs: 12.76 | 7: iteration 35660/ 115203 | consumed samples: 9128960 | consumed tokens: 18696110080 | elapsed time per iteration (s): 0.44 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 2.359922E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.365 | TFLOPs: 30.35 | 7: iteration 35670/ 115203 | consumed samples: 9131520 | consumed tokens: 18701352960 | elapsed time per iteration (s): 0.43 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 2.343217E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.001 | TFLOPs: 31.48 | 7: iteration 35680/ 115203 | consumed samples: 9134080 | consumed tokens: 18706595840 | elapsed time per iteration (s): 0.43 | learning rate: 1.623E-04 | global batch size: 256 | lm loss: 2.355911E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.744 | TFLOPs: 31.31 | 7: iteration 35690/ 115203 | consumed samples: 9136640 | consumed tokens: 18711838720 | elapsed time per iteration (s): 0.43 
| learning rate: 1.623E-04 | global batch size: 256 | lm loss: 2.311353E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.608 | TFLOPs: 31.20 | 7: iteration 35700/ 115203 | consumed samples: 9139200 | consumed tokens: 18717081600 | elapsed time per iteration (s): 0.42 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 2.329104E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.125 | TFLOPs: 31.75 | 7: iteration 35710/ 115203 | consumed samples: 9141760 | consumed tokens: 18722324480 | elapsed time per iteration (s): 0.43 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 2.358882E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.100 | TFLOPs: 30.91 | 7: iteration 35720/ 115203 | consumed samples: 9144320 | consumed tokens: 18727567360 | elapsed time per iteration (s): 0.43 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 2.306955E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.905 | TFLOPs: 31.42 | 7: iteration 35730/ 115203 | consumed samples: 9146880 | consumed tokens: 18732810240 | elapsed time per iteration (s): 0.42 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 2.367617E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.482 | TFLOPs: 31.72 | 7: iteration 35740/ 115203 | consumed samples: 9149440 | consumed tokens: 18738053120 | elapsed time per iteration (s): 0.43 | learning rate: 1.622E-04 | global batch size: 256 | lm loss: 2.353503E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.123 | TFLOPs: 31.54 | 7: iteration 35750/ 115203 | 
consumed samples: 9152000 | consumed tokens: 18743296000 | elapsed time per iteration (s): 0.43 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 2.351888E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.037 | TFLOPs: 31.59 | 7: iteration 35760/ 115203 | consumed samples: 9154560 | consumed tokens: 18748538880 | elapsed time per iteration (s): 0.43 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 2.330582E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.200 | TFLOPs: 31.02 | 7: iteration 35770/ 115203 | consumed samples: 9157120 | consumed tokens: 18753781760 | elapsed time per iteration (s): 0.42 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 2.327245E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.453 | TFLOPs: 31.71 | 7: iteration 35780/ 115203 | consumed samples: 9159680 | consumed tokens: 18759024640 | elapsed time per iteration (s): 0.44 | learning rate: 1.621E-04 | global batch size: 256 | lm loss: 2.333205E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.479 | TFLOPs: 30.67 | 7: iteration 35790/ 115203 | consumed samples: 9162240 | consumed tokens: 18764267520 | elapsed time per iteration (s): 0.43 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 2.323908E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.095 | TFLOPs: 31.43 | 7: iteration 35800/ 115203 | consumed samples: 9164800 | consumed tokens: 18769510400 | elapsed time per iteration (s): 0.43 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 2.344860E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 602.337 | TFLOPs: 31.60 | 7: iteration 35810/ 115203 | consumed samples: 9167360 | consumed tokens: 18774753280 | elapsed time per iteration (s): 0.44 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 2.349500E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.956 | TFLOPs: 30.69 | 7: iteration 35820/ 115203 | consumed samples: 9169920 | consumed tokens: 18779996160 | elapsed time per iteration (s): 0.43 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 2.332641E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.949 | TFLOPs: 31.53 | 7: iteration 35830/ 115203 | consumed samples: 9172480 | consumed tokens: 18785239040 | elapsed time per iteration (s): 0.43 | learning rate: 1.620E-04 | global batch size: 256 | lm loss: 2.311040E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.889 | TFLOPs: 31.00 | 7: iteration 35840/ 115203 | consumed samples: 9175040 | consumed tokens: 18790481920 | elapsed time per iteration (s): 0.43 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 2.315967E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.885 | TFLOPs: 31.00 | 7: iteration 35850/ 115203 | consumed samples: 9177600 | consumed tokens: 18795724800 | elapsed time per iteration (s): 0.43 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 2.357744E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.536 | TFLOPs: 30.98 | 7: iteration 35860/ 115203 | consumed samples: 9180160 | consumed tokens: 18800967680 | elapsed time per iteration (s): 0.44 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 
2.337989E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.467 | TFLOPs: 30.72 | 7: iteration 35870/ 115203 | consumed samples: 9182720 | consumed tokens: 18806210560 | elapsed time per iteration (s): 0.43 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 2.377585E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.439 | TFLOPs: 31.19 | 7: iteration 35880/ 115203 | consumed samples: 9185280 | consumed tokens: 18811453440 | elapsed time per iteration (s): 0.42 | learning rate: 1.619E-04 | global batch size: 256 | lm loss: 2.334072E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.064 | TFLOPs: 31.80 | 7: iteration 35890/ 115203 | consumed samples: 9187840 | consumed tokens: 18816696320 | elapsed time per iteration (s): 0.43 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 2.335948E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.581 | TFLOPs: 30.99 | 7: iteration 35900/ 115203 | consumed samples: 9190400 | consumed tokens: 18821939200 | elapsed time per iteration (s): 0.42 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 2.344908E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.128 | TFLOPs: 32.01 | 7: iteration 35910/ 115203 | consumed samples: 9192960 | consumed tokens: 18827182080 | elapsed time per iteration (s): 0.44 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 2.332264E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.852 | TFLOPs: 30.63 | 7: iteration 35920/ 115203 | consumed samples: 9195520 | consumed tokens: 18832424960 | 
elapsed time per iteration (s): 0.46 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 2.355947E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.928 | TFLOPs: 29.43 | 7: iteration 35930/ 115203 | consumed samples: 9198080 | consumed tokens: 18837667840 | elapsed time per iteration (s): 0.43 | learning rate: 1.618E-04 | global batch size: 256 | lm loss: 2.320573E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.505 | TFLOPs: 31.14 | 7: iteration 35940/ 115203 | consumed samples: 9200640 | consumed tokens: 18842910720 | elapsed time per iteration (s): 0.43 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 2.330370E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.647 | TFLOPs: 30.89 | 7: iteration 35950/ 115203 | consumed samples: 9203200 | consumed tokens: 18848153600 | elapsed time per iteration (s): 0.44 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 2.356862E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.122 | TFLOPs: 30.86 | 7: iteration 35960/ 115203 | consumed samples: 9205760 | consumed tokens: 18853396480 | elapsed time per iteration (s): 0.43 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 2.365739E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.448 | TFLOPs: 30.93 | 7: iteration 35970/ 115203 | consumed samples: 9208320 | consumed tokens: 18858639360 | elapsed time per iteration (s): 0.43 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 2.347638E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.621 | TFLOPs: 
30.94 |
7: iteration 35980/ 115203 | consumed samples: 9210880 | consumed tokens: 18863882240 | elapsed time per iteration (s): 0.44 | learning rate: 1.617E-04 | global batch size: 256 | lm loss: 2.323155E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.810 | TFLOPs: 30.68 |
7: iteration 35990/ 115203 | consumed samples: 9213440 | consumed tokens: 18869125120 | elapsed time per iteration (s): 0.43 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 2.327853E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.960 | TFLOPs: 31.32 |
0: [2022-11-28 17:16:40,585] [INFO] [logging.py:68:log_dist] [Rank 0] step=36000, skipped=0, lr=[0.00016162432908965068, 0.00016162432908965068, 0.00016162432908965068], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 36000/ 115203 | consumed samples: 9216000 | consumed tokens: 18874368000 | elapsed time per iteration (s): 0.43 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 2.358133E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.209 | TFLOPs: 30.97 |
0: steps: 36000 loss: 2.3901 iter time (s): 0.429 samples/sec: 597.091
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 36000 | lm loss value: 2.285959E+00 | lm loss PPL: 9.835118E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 36000 to checkpoints_221m
0: [2022-11-28 17:16:40,746] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step36000 is begin to save!
0: [2022-11-28 17:16:40,751] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_01-model_00-model_states.pt...
0: [2022-11-28 17:16:40,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_01-model_00-model_states.pt. 0: [2022-11-28 17:16:40,866] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_03-model_00-model_states.pt... 0: [2022-11-28 17:16:40,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_03-model_00-model_states.pt. 0: [2022-11-28 17:16:40,888] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_04-model_00-model_states.pt... 0: [2022-11-28 17:16:40,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_04-model_00-model_states.pt. 0: [2022-11-28 17:16:40,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_05-model_00-model_states.pt... 0: [2022-11-28 17:16:40,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_05-model_00-model_states.pt. 0: [2022-11-28 17:16:40,939] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_06-model_00-model_states.pt... 0: [2022-11-28 17:16:40,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_06-model_00-model_states.pt. 0: [2022-11-28 17:16:40,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_07-model_00-model_states.pt... 0: [2022-11-28 17:16:40,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_07-model_00-model_states.pt. 0: [2022-11-28 17:16:40,988] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_08-model_00-model_states.pt... 
0: [2022-11-28 17:16:41,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_08-model_00-model_states.pt. 0: [2022-11-28 17:16:41,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_09-model_00-model_states.pt... 0: [2022-11-28 17:16:41,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_09-model_00-model_states.pt. 0: [2022-11-28 17:16:41,038] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_10-model_00-model_states.pt... 0: [2022-11-28 17:16:41,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_10-model_00-model_states.pt. 0: [2022-11-28 17:16:41,063] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_11-model_00-model_states.pt... 0: [2022-11-28 17:16:41,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_11-model_00-model_states.pt. 0: [2022-11-28 17:16:41,089] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_12-model_00-model_states.pt... 0: [2022-11-28 17:16:41,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_12-model_00-model_states.pt. 0: [2022-11-28 17:16:41,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_13-model_00-model_states.pt... 0: [2022-11-28 17:16:41,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_13-model_00-model_states.pt. 0: [2022-11-28 17:16:41,142] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_14-model_00-model_states.pt... 
0: [2022-11-28 17:16:41,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_14-model_00-model_states.pt. 0: [2022-11-28 17:16:41,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_15-model_00-model_states.pt... 0: [2022-11-28 17:16:41,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_15-model_00-model_states.pt. 0: [2022-11-28 17:16:41,191] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_16-model_00-model_states.pt... 0: [2022-11-28 17:16:41,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_16-model_00-model_states.pt. 0: [2022-11-28 17:16:41,215] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_17-model_00-model_states.pt... 0: [2022-11-28 17:16:41,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_17-model_00-model_states.pt. 0: [2022-11-28 17:16:41,241] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_18-model_00-model_states.pt... 0: [2022-11-28 17:16:41,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_18-model_00-model_states.pt. 0: [2022-11-28 17:16:41,266] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_19-model_00-model_states.pt... 0: [2022-11-28 17:16:41,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_19-model_00-model_states.pt. 0: [2022-11-28 17:16:41,291] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_20-model_00-model_states.pt... 
0: [2022-11-28 17:16:41,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_20-model_00-model_states.pt. 0: [2022-11-28 17:16:41,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/layer_22-model_00-model_states.pt... 0: [2022-11-28 17:16:41,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/layer_22-model_00-model_states.pt. 0: [2022-11-28 17:16:41,321] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step36000/mp_rank_00_model_states.pt 0: [2022-11-28 17:16:41,321] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/mp_rank_00_model_states.pt... 0: [2022-11-28 17:16:41,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/mp_rank_00_model_states.pt. 0: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
7: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
5: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
1: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:16:41,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step36000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:16:41,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:16:41,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 17:16:41,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 2: [2022-11-28 17:16:41,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:16:41,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 17:16:41,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2022-11-28 17:16:41,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:16:41,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 17:16:41,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 
6: [2022-11-28 17:16:41,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:16:41,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 17:16:41,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2022-11-28 17:16:41,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:16:41,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 17:16:41,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2022-11-28 17:16:41,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:16:41,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 17:16:41,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 17:16:41,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2022-11-28 17:16:41,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 2: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:16:41,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:16:41,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 
2: [2022-11-28 17:16:41,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 2: [2022-11-28 17:16:41,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 2: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 2: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:16:41,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 2: [2022-11-28 17:16:41,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:16:41,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 17:16:41,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2022-11-28 17:16:41,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:16:41,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:16:41,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:16:41,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 17:16:41,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 17:16:41,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 17:16:41,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2022-11-28 17:16:41,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2022-11-28 17:16:41,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2022-11-28 17:16:41,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:16:41,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:16:41,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 17:16:41,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 17:16:41,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2022-11-28 17:16:41,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2022-11-28 17:16:41,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:16:41,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 17:16:41,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2022-11-28 17:16:41,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:16:41,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:16:41,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:16:41,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 17:16:41,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 17:16:41,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 17:16:41,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 17:16:41,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 17:16:41,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2022-11-28 17:16:41,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2022-11-28 17:16:41,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2022-11-28 17:16:41,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:16:41,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 0: [2022-11-28 17:16:41,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 2: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:16:41,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 6: [2022-11-28 17:16:41,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:16:41,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 17:16:41,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2022-11-28 17:16:41,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:16:41,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 17:16:41,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2022-11-28 17:16:41,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:16:41,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:16:41,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:16:41,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 17:16:41,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 17:16:41,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 17:16:41,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2022-11-28 17:16:41,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:16:41,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 
5: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:16:41,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:16:41,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 17:16:41,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 17:16:41,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 17:16:41,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 17:16:41,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 5: [2022-11-28 17:16:41,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2022-11-28 17:16:41,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2022-11-28 17:16:41,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 17:16:41,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2022-11-28 17:16:41,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:16:41,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:16:41,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 17:16:41,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2022-11-28 17:16:41,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:16:41,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2022-11-28 17:16:41,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 17:16:41,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2022-11-28 17:16:41,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2022-11-28 17:16:41,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:16:41,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:16:41,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 1: [2022-11-28 17:16:41,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 3: [2022-11-28 17:16:41,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:16:41,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2022-11-28 17:16:41,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2022-11-28 17:16:41,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:16:41,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2022-11-28 17:16:41,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 17:16:41,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 3: [2022-11-28 17:16:41,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2022-11-28 17:16:41,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:16:41,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:16:41,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:16:41,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 17:16:41,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 17:16:41,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 17:16:41,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2022-11-28 17:16:41,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 1: [2022-11-28 17:16:41,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: [2022-11-28 17:16:41,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 17:16:41,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:16:41,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 17:16:41,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 17:16:41,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 17:16:41,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 17:16:41,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 17:16:41,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 17:16:41,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2022-11-28 17:16:41,450] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 
4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 4: [2022-11-28 17:16:41,450] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:16:41,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 17:16:41,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 17:16:41,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 17:16:41,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 17:16:41,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 17:16:41,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 17:16:41,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2022-11-28 17:16:41,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step36000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 
7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 7: [2022-11-28 17:16:41,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step36000 is ready now! 0: successfully saved checkpoint at iteration 36000 to checkpoints_221m 7: time (ms) | save-checkpoint: 733.16 7: iteration 36010/ 115203 | consumed samples: 9218560 | consumed tokens: 18879610880 | elapsed time per iteration (s): 0.51 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 2.329959E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 498.248 | TFLOPs: 26.14 | 7: iteration 36020/ 115203 | consumed samples: 9221120 | consumed tokens: 18884853760 | elapsed time per iteration (s): 0.44 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 2.337705E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.422 | TFLOPs: 30.87 | 7: iteration 36030/ 115203 | consumed samples: 9223680 | consumed tokens: 18890096640 | elapsed time per iteration (s): 0.43 | learning rate: 1.616E-04 | global batch size: 256 | lm loss: 2.321541E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.474 | TFLOPs: 31.19 | 7: iteration 36040/ 115203 | consumed samples: 9226240 | consumed tokens: 18895339520 | elapsed time per iteration (s): 0.43 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 2.333686E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.658 | TFLOPs: 
31.57 | 7: iteration 36050/ 115203 | consumed samples: 9228800 | consumed tokens: 18900582400 | elapsed time per iteration (s): 0.43 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 2.353899E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.672 | TFLOPs: 30.99 | 7: iteration 36060/ 115203 | consumed samples: 9231360 | consumed tokens: 18905825280 | elapsed time per iteration (s): 0.43 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 2.361331E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.237 | TFLOPs: 31.28 | 7: iteration 36070/ 115203 | consumed samples: 9233920 | consumed tokens: 18911068160 | elapsed time per iteration (s): 0.43 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 2.347116E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.610 | TFLOPs: 31.25 | 7: iteration 36080/ 115203 | consumed samples: 9236480 | consumed tokens: 18916311040 | elapsed time per iteration (s): 0.43 | learning rate: 1.615E-04 | global batch size: 256 | lm loss: 2.329993E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.490 | TFLOPs: 31.09 | 7: iteration 36090/ 115203 | consumed samples: 9239040 | consumed tokens: 18921553920 | elapsed time per iteration (s): 0.42 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 2.324206E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.383 | TFLOPs: 31.71 | 7: iteration 36100/ 115203 | consumed samples: 9241600 | consumed tokens: 18926796800 | elapsed time per iteration (s): 0.43 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 2.371409E+00 | grad norm: 0.202 | num zeros: 0.0 | number 
of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.052 | TFLOPs: 31.22 | 7: iteration 36110/ 115203 | consumed samples: 9244160 | consumed tokens: 18932039680 | elapsed time per iteration (s): 0.43 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 2.349298E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.956 | TFLOPs: 31.06 | 7: iteration 36120/ 115203 | consumed samples: 9246720 | consumed tokens: 18937282560 | elapsed time per iteration (s): 0.44 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 2.336089E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.487 | TFLOPs: 30.88 | 7: iteration 36130/ 115203 | consumed samples: 9249280 | consumed tokens: 18942525440 | elapsed time per iteration (s): 0.43 | learning rate: 1.614E-04 | global batch size: 256 | lm loss: 2.317431E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.292 | TFLOPs: 31.08 | 7: iteration 36140/ 115203 | consumed samples: 9251840 | consumed tokens: 18947768320 | elapsed time per iteration (s): 0.43 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 2.322703E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.121 | TFLOPs: 31.33 | 7: iteration 36150/ 115203 | consumed samples: 9254400 | consumed tokens: 18953011200 | elapsed time per iteration (s): 0.43 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 2.375843E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.872 | TFLOPs: 30.95 | 7: iteration 36160/ 115203 | consumed samples: 9256960 | consumed tokens: 18958254080 | elapsed time per iteration (s): 0.43 | learning rate: 1.613E-04 
| global batch size: 256 | lm loss: 2.368119E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.948 | TFLOPs: 31.32 | 7: iteration 36170/ 115203 | consumed samples: 9259520 | consumed tokens: 18963496960 | elapsed time per iteration (s): 0.45 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 2.356956E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.790 | TFLOPs: 29.84 | 7: iteration 36180/ 115203 | consumed samples: 9262080 | consumed tokens: 18968739840 | elapsed time per iteration (s): 0.43 | learning rate: 1.613E-04 | global batch size: 256 | lm loss: 2.341969E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.187 | TFLOPs: 31.60 | 7: iteration 36190/ 115203 | consumed samples: 9264640 | consumed tokens: 18973982720 | elapsed time per iteration (s): 0.44 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 2.341702E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.355 | TFLOPs: 30.82 | 7: iteration 36200/ 115203 | consumed samples: 9267200 | consumed tokens: 18979225600 | elapsed time per iteration (s): 0.45 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 2.357910E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.702 | TFLOPs: 30.10 | 7: iteration 36210/ 115203 | consumed samples: 9269760 | consumed tokens: 18984468480 | elapsed time per iteration (s): 0.43 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 2.327909E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.900 | TFLOPs: 30.90 | 7: iteration 36220/ 115203 | consumed samples: 9272320 | 
consumed tokens: 18989711360 | elapsed time per iteration (s): 0.44 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 2.365486E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.517 | TFLOPs: 30.25 | 7: iteration 36230/ 115203 | consumed samples: 9274880 | consumed tokens: 18994954240 | elapsed time per iteration (s): 0.43 | learning rate: 1.612E-04 | global batch size: 256 | lm loss: 2.356573E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.818 | TFLOPs: 31.47 | 7: iteration 36240/ 115203 | consumed samples: 9277440 | consumed tokens: 19000197120 | elapsed time per iteration (s): 0.44 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 2.357858E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.202 | TFLOPs: 30.86 | 7: iteration 36250/ 115203 | consumed samples: 9280000 | consumed tokens: 19005440000 | elapsed time per iteration (s): 0.43 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 2.328699E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.325 | TFLOPs: 31.50 | 7: iteration 36260/ 115203 | consumed samples: 9282560 | consumed tokens: 19010682880 | elapsed time per iteration (s): 0.43 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 2.318930E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.513 | TFLOPs: 31.35 | 7: iteration 36270/ 115203 | consumed samples: 9285120 | consumed tokens: 19015925760 | elapsed time per iteration (s): 0.43 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 2.358304E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 588.567 | TFLOPs: 30.88 | 7: iteration 36280/ 115203 | consumed samples: 9287680 | consumed tokens: 19021168640 | elapsed time per iteration (s): 0.44 | learning rate: 1.611E-04 | global batch size: 256 | lm loss: 2.331041E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.285 | TFLOPs: 30.71 | 7: iteration 36290/ 115203 | consumed samples: 9290240 | consumed tokens: 19026411520 | elapsed time per iteration (s): 0.42 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 2.361324E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.412 | TFLOPs: 31.71 | 7: iteration 36300/ 115203 | consumed samples: 9292800 | consumed tokens: 19031654400 | elapsed time per iteration (s): 0.44 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 2.357646E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.433 | TFLOPs: 30.45 | 7: iteration 36310/ 115203 | consumed samples: 9295360 | consumed tokens: 19036897280 | elapsed time per iteration (s): 0.43 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 2.354746E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.768 | TFLOPs: 31.36 | 7: iteration 36320/ 115203 | consumed samples: 9297920 | consumed tokens: 19042140160 | elapsed time per iteration (s): 0.43 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 2.329745E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.976 | TFLOPs: 31.43 | 7: iteration 36330/ 115203 | consumed samples: 9300480 | consumed tokens: 19047383040 | elapsed time per iteration (s): 0.42 | learning rate: 1.610E-04 | global batch size: 256 | lm loss: 2.312195E+00 | grad norm: 
0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.627 | TFLOPs: 31.67 | 7: iteration 36340/ 115203 | consumed samples: 9303040 | consumed tokens: 19052625920 | elapsed time per iteration (s): 0.43 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 2.346242E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.919 | TFLOPs: 31.37 | 7: iteration 36350/ 115203 | consumed samples: 9305600 | consumed tokens: 19057868800 | elapsed time per iteration (s): 0.44 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 2.314772E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.038 | TFLOPs: 30.38 | 7: iteration 36360/ 115203 | consumed samples: 9308160 | consumed tokens: 19063111680 | elapsed time per iteration (s): 0.43 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 2.351962E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.444 | TFLOPs: 30.93 | 7: iteration 36370/ 115203 | consumed samples: 9310720 | consumed tokens: 19068354560 | elapsed time per iteration (s): 0.45 | learning rate: 1.609E-04 | global batch size: 256 | lm loss: 2.342742E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.800 | TFLOPs: 30.16 | 7: iteration 36380/ 115203 | consumed samples: 9313280 | consumed tokens: 19073597440 | elapsed time per iteration (s): 0.43 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 2.355070E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.935 | TFLOPs: 31.06 | 7: iteration 36390/ 115203 | consumed samples: 9315840 | consumed tokens: 19078840320 | elapsed time per iteration (s): 
0.44 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 2.335270E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.018 | TFLOPs: 30.59 | 7: iteration 36400/ 115203 | consumed samples: 9318400 | consumed tokens: 19084083200 | elapsed time per iteration (s): 0.44 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 2.358567E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.387 | TFLOPs: 30.87 | 7: iteration 36410/ 115203 | consumed samples: 9320960 | consumed tokens: 19089326080 | elapsed time per iteration (s): 0.45 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 2.325005E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.091 | TFLOPs: 29.75 | 7: iteration 36420/ 115203 | consumed samples: 9323520 | consumed tokens: 19094568960 | elapsed time per iteration (s): 0.43 | learning rate: 1.608E-04 | global batch size: 256 | lm loss: 2.327499E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.780 | TFLOPs: 31.36 | 7: iteration 36430/ 115203 | consumed samples: 9326080 | consumed tokens: 19099811840 | elapsed time per iteration (s): 0.44 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 2.339300E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.690 | TFLOPs: 30.73 | 7: iteration 36440/ 115203 | consumed samples: 9328640 | consumed tokens: 19105054720 | elapsed time per iteration (s): 0.44 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 2.323475E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.668 | TFLOPs: 30.73 | 7: iteration 36450/ 
115203 | consumed samples: 9331200 | consumed tokens: 19110297600 | elapsed time per iteration (s): 0.44 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 2.357111E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.367 | TFLOPs: 30.71 | 7: iteration 36460/ 115203 | consumed samples: 9333760 | consumed tokens: 19115540480 | elapsed time per iteration (s): 0.44 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 2.337275E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.624 | TFLOPs: 30.36 | 7: iteration 36470/ 115203 | consumed samples: 9336320 | consumed tokens: 19120783360 | elapsed time per iteration (s): 0.44 | learning rate: 1.607E-04 | global batch size: 256 | lm loss: 2.325914E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.237 | TFLOPs: 30.65 | 7: iteration 36480/ 115203 | consumed samples: 9338880 | consumed tokens: 19126026240 | elapsed time per iteration (s): 0.44 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 2.346809E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.398 | TFLOPs: 30.71 | 7: iteration 36490/ 115203 | consumed samples: 9341440 | consumed tokens: 19131269120 | elapsed time per iteration (s): 0.43 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 2.359882E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.472 | TFLOPs: 31.45 | 7: iteration 36500/ 115203 | consumed samples: 9344000 | consumed tokens: 19136512000 | elapsed time per iteration (s): 0.43 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 2.312730E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 594.908 | TFLOPs: 31.21 | 7: iteration 36510/ 115203 | consumed samples: 9346560 | consumed tokens: 19141754880 | elapsed time per iteration (s): 0.43 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 2.349312E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.942 | TFLOPs: 31.01 | 7: iteration 36520/ 115203 | consumed samples: 9349120 | consumed tokens: 19146997760 | elapsed time per iteration (s): 0.44 | learning rate: 1.606E-04 | global batch size: 256 | lm loss: 2.310218E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.897 | TFLOPs: 30.85 | 7: iteration 36530/ 115203 | consumed samples: 9351680 | consumed tokens: 19152240640 | elapsed time per iteration (s): 0.43 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 2.343298E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.724 | TFLOPs: 31.41 | 7: iteration 36540/ 115203 | consumed samples: 9354240 | consumed tokens: 19157483520 | elapsed time per iteration (s): 0.43 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 2.341192E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.617 | TFLOPs: 31.36 | 7: iteration 36550/ 115203 | consumed samples: 9356800 | consumed tokens: 19162726400 | elapsed time per iteration (s): 0.44 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 2.337427E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.543 | TFLOPs: 30.83 | 7: iteration 36560/ 115203 | consumed samples: 9359360 | consumed tokens: 19167969280 | elapsed time per iteration (s): 0.43 | learning rate: 1.605E-04 | global batch size: 256 | 
lm loss: 2.351599E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.117 | TFLOPs: 30.91 | 7: iteration 36570/ 115203 | consumed samples: 9361920 | consumed tokens: 19173212160 | elapsed time per iteration (s): 0.43 | learning rate: 1.605E-04 | global batch size: 256 | lm loss: 2.323741E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.089 | TFLOPs: 31.17 | 7: iteration 36580/ 115203 | consumed samples: 9364480 | consumed tokens: 19178455040 | elapsed time per iteration (s): 0.43 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 2.314010E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.518 | TFLOPs: 31.30 | 7: iteration 36590/ 115203 | consumed samples: 9367040 | consumed tokens: 19183697920 | elapsed time per iteration (s): 0.43 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 2.326813E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.930 | TFLOPs: 31.48 | 7: iteration 36600/ 115203 | consumed samples: 9369600 | consumed tokens: 19188940800 | elapsed time per iteration (s): 0.43 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 2.297440E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.957 | TFLOPs: 31.11 | 7: iteration 36610/ 115203 | consumed samples: 9372160 | consumed tokens: 19194183680 | elapsed time per iteration (s): 0.43 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 2.371195E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.220 | TFLOPs: 31.07 | 7: iteration 36620/ 115203 | consumed samples: 9374720 | consumed tokens: 
19199426560 | elapsed time per iteration (s): 0.43 | learning rate: 1.604E-04 | global batch size: 256 | lm loss: 2.341343E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.078 | TFLOPs: 31.28 | 7: iteration 36630/ 115203 | consumed samples: 9377280 | consumed tokens: 19204669440 | elapsed time per iteration (s): 0.43 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 2.290815E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.721 | TFLOPs: 31.20 | 7: iteration 36640/ 115203 | consumed samples: 9379840 | consumed tokens: 19209912320 | elapsed time per iteration (s): 0.44 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 2.339329E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.817 | TFLOPs: 30.74 | 7: iteration 36650/ 115203 | consumed samples: 9382400 | consumed tokens: 19215155200 | elapsed time per iteration (s): 0.45 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 2.319683E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.439 | TFLOPs: 29.83 | 7: iteration 36660/ 115203 | consumed samples: 9384960 | consumed tokens: 19220398080 | elapsed time per iteration (s): 0.43 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 2.338338E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.426 | TFLOPs: 31.08 | 7: iteration 36670/ 115203 | consumed samples: 9387520 | consumed tokens: 19225640960 | elapsed time per iteration (s): 0.43 | learning rate: 1.603E-04 | global batch size: 256 | lm loss: 2.353462E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
592.428 | TFLOPs: 31.08 | 7: iteration 36680/ 115203 | consumed samples: 9390080 | consumed tokens: 19230883840 | elapsed time per iteration (s): 0.43 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 2.340550E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.221 | TFLOPs: 31.60 | 7: iteration 36690/ 115203 | consumed samples: 9392640 | consumed tokens: 19236126720 | elapsed time per iteration (s): 0.43 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 2.335317E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.005 | TFLOPs: 30.90 | 7: iteration 36700/ 115203 | consumed samples: 9395200 | consumed tokens: 19241369600 | elapsed time per iteration (s): 0.43 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 2.333760E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.209 | TFLOPs: 31.18 | 7: iteration 36710/ 115203 | consumed samples: 9397760 | consumed tokens: 19246612480 | elapsed time per iteration (s): 0.43 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 2.324159E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.766 | TFLOPs: 31.26 | 7: iteration 36720/ 115203 | consumed samples: 9400320 | consumed tokens: 19251855360 | elapsed time per iteration (s): 0.44 | learning rate: 1.602E-04 | global batch size: 256 | lm loss: 2.322440E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.343 | TFLOPs: 30.87 | 7: iteration 36730/ 115203 | consumed samples: 9402880 | consumed tokens: 19257098240 | elapsed time per iteration (s): 0.44 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 2.343281E+00 | grad norm: 0.203 | num 
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.691 | TFLOPs: 30.57 | 7: iteration 36740/ 115203 | consumed samples: 9405440 | consumed tokens: 19262341120 | elapsed time per iteration (s): 0.43 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 2.354655E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.691 | TFLOPs: 31.52 | 7: iteration 36750/ 115203 | consumed samples: 9408000 | consumed tokens: 19267584000 | elapsed time per iteration (s): 0.43 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 2.327718E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.702 | TFLOPs: 30.94 | 7: iteration 36760/ 115203 | consumed samples: 9410560 | consumed tokens: 19272826880 | elapsed time per iteration (s): 0.44 | learning rate: 1.601E-04 | global batch size: 256 | lm loss: 2.340283E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.978 | TFLOPs: 30.48 | 7: iteration 36770/ 115203 | consumed samples: 9413120 | consumed tokens: 19278069760 | elapsed time per iteration (s): 0.42 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 2.330326E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.480 | TFLOPs: 31.82 | 7: iteration 36780/ 115203 | consumed samples: 9415680 | consumed tokens: 19283312640 | elapsed time per iteration (s): 0.42 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 2.363651E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.986 | TFLOPs: 31.69 | 7: iteration 36790/ 115203 | consumed samples: 9418240 | consumed tokens: 19288555520 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.600E-04 | global batch size: 256 | lm loss: 2.348940E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.960 | TFLOPs: 31.53 | 7: iteration 36800/ 115203 | consumed samples: 9420800 | consumed tokens: 19293798400 | elapsed time per iteration (s): 0.43 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 2.322046E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.152 | TFLOPs: 31.07 | 7: iteration 36810/ 115203 | consumed samples: 9423360 | consumed tokens: 19299041280 | elapsed time per iteration (s): 0.44 | learning rate: 1.600E-04 | global batch size: 256 | lm loss: 2.341455E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.317 | TFLOPs: 30.66 | 7: iteration 36820/ 115203 | consumed samples: 9425920 | consumed tokens: 19304284160 | elapsed time per iteration (s): 0.43 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 2.349144E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.959 | TFLOPs: 31.27 | 7: iteration 36830/ 115203 | consumed samples: 9428480 | consumed tokens: 19309527040 | elapsed time per iteration (s): 0.43 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 2.350191E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.964 | TFLOPs: 31.43 | 7: iteration 36840/ 115203 | consumed samples: 9431040 | consumed tokens: 19314769920 | elapsed time per iteration (s): 0.43 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 2.323641E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.497 | TFLOPs: 31.56 | 7: iteration 36850/ 115203 | 
consumed samples: 9433600 | consumed tokens: 19320012800 | elapsed time per iteration (s): 0.43 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 2.359076E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.102 | TFLOPs: 31.43 | 7: iteration 36860/ 115203 | consumed samples: 9436160 | consumed tokens: 19325255680 | elapsed time per iteration (s): 0.43 | learning rate: 1.599E-04 | global batch size: 256 | lm loss: 2.346931E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.065 | TFLOPs: 31.33 | 7: iteration 36870/ 115203 | consumed samples: 9438720 | consumed tokens: 19330498560 | elapsed time per iteration (s): 0.43 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 2.313919E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.454 | TFLOPs: 31.24 | 7: iteration 36880/ 115203 | consumed samples: 9441280 | consumed tokens: 19335741440 | elapsed time per iteration (s): 0.44 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 2.305054E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.074 | TFLOPs: 30.75 | 7: iteration 36890/ 115203 | consumed samples: 9443840 | consumed tokens: 19340984320 | elapsed time per iteration (s): 0.43 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 2.325755E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.630 | TFLOPs: 31.30 | 7: iteration 36900/ 115203 | consumed samples: 9446400 | consumed tokens: 19346227200 | elapsed time per iteration (s): 0.43 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 2.364356E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 593.442 | TFLOPs: 31.14 | 7: iteration 36910/ 115203 | consumed samples: 9448960 | consumed tokens: 19351470080 | elapsed time per iteration (s): 0.43 | learning rate: 1.598E-04 | global batch size: 256 | lm loss: 2.368352E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.787 | TFLOPs: 31.21 | 7: iteration 36920/ 115203 | consumed samples: 9451520 | consumed tokens: 19356712960 | elapsed time per iteration (s): 0.44 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 2.337348E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.536 | TFLOPs: 30.46 | 7: iteration 36930/ 115203 | consumed samples: 9454080 | consumed tokens: 19361955840 | elapsed time per iteration (s): 0.44 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 2.352454E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.180 | TFLOPs: 30.76 | 7: iteration 36940/ 115203 | consumed samples: 9456640 | consumed tokens: 19367198720 | elapsed time per iteration (s): 0.44 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 2.374347E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.861 | TFLOPs: 30.74 | 7: iteration 36950/ 115203 | consumed samples: 9459200 | consumed tokens: 19372441600 | elapsed time per iteration (s): 0.43 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 2.337159E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.926 | TFLOPs: 31.11 | 7: iteration 36960/ 115203 | consumed samples: 9461760 | consumed tokens: 19377684480 | elapsed time per iteration (s): 0.44 | learning rate: 1.597E-04 | global batch size: 256 | lm loss: 
2.317611E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.235 | TFLOPs: 30.65 | 7: iteration 36970/ 115203 | consumed samples: 9464320 | consumed tokens: 19382927360 | elapsed time per iteration (s): 0.43 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 2.357934E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.323 | TFLOPs: 30.92 | 7: iteration 36980/ 115203 | consumed samples: 9466880 | consumed tokens: 19388170240 | elapsed time per iteration (s): 0.47 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 2.349029E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 549.530 | TFLOPs: 28.83 | 7: iteration 36990/ 115203 | consumed samples: 9469440 | consumed tokens: 19393413120 | elapsed time per iteration (s): 0.43 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 2.321145E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.946 | TFLOPs: 31.48 | 7: iteration 37000/ 115203 | consumed samples: 9472000 | consumed tokens: 19398656000 | elapsed time per iteration (s): 0.43 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 2.330885E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.752 | TFLOPs: 31.21 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 37000 | lm loss value: 2.327050E+00 | lm loss PPL: 1.024767E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 37000 to checkpoints_221m 0: [2022-11-28 17:23:54,750] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint 
global_step37000 is begin to save! 0: [2022-11-28 17:23:54,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_01-model_00-model_states.pt... 0: [2022-11-28 17:23:54,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_01-model_00-model_states.pt. 0: [2022-11-28 17:23:54,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_03-model_00-model_states.pt... 0: [2022-11-28 17:23:54,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_03-model_00-model_states.pt. 0: [2022-11-28 17:23:54,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_04-model_00-model_states.pt... 0: [2022-11-28 17:23:54,915] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_04-model_00-model_states.pt. 0: [2022-11-28 17:23:54,915] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_05-model_00-model_states.pt... 0: [2022-11-28 17:23:54,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_05-model_00-model_states.pt. 0: [2022-11-28 17:23:54,938] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_06-model_00-model_states.pt... 0: [2022-11-28 17:23:54,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_06-model_00-model_states.pt. 0: [2022-11-28 17:23:54,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_07-model_00-model_states.pt... 0: [2022-11-28 17:23:54,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 17:23:54,989] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_08-model_00-model_states.pt... 0: [2022-11-28 17:23:55,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_08-model_00-model_states.pt. 0: [2022-11-28 17:23:55,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_09-model_00-model_states.pt... 0: [2022-11-28 17:23:55,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_09-model_00-model_states.pt. 0: [2022-11-28 17:23:55,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_10-model_00-model_states.pt... 0: [2022-11-28 17:23:55,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_10-model_00-model_states.pt. 0: [2022-11-28 17:23:55,063] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_11-model_00-model_states.pt... 0: [2022-11-28 17:23:55,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_11-model_00-model_states.pt. 0: [2022-11-28 17:23:55,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_12-model_00-model_states.pt... 0: [2022-11-28 17:23:55,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_12-model_00-model_states.pt. 0: [2022-11-28 17:23:55,112] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_13-model_00-model_states.pt... 0: [2022-11-28 17:23:55,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 17:23:55,136] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_14-model_00-model_states.pt... 0: [2022-11-28 17:23:55,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_14-model_00-model_states.pt. 0: [2022-11-28 17:23:55,161] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_15-model_00-model_states.pt... 0: [2022-11-28 17:23:55,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_15-model_00-model_states.pt. 0: [2022-11-28 17:23:55,185] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_16-model_00-model_states.pt... 0: [2022-11-28 17:23:55,210] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_16-model_00-model_states.pt. 0: [2022-11-28 17:23:55,210] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_17-model_00-model_states.pt... 0: [2022-11-28 17:23:55,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_17-model_00-model_states.pt. 0: [2022-11-28 17:23:55,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_18-model_00-model_states.pt... 0: [2022-11-28 17:23:55,257] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_18-model_00-model_states.pt. 0: [2022-11-28 17:23:55,257] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_19-model_00-model_states.pt... 0: [2022-11-28 17:23:55,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 17:23:55,281] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_20-model_00-model_states.pt... 0: [2022-11-28 17:23:55,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_20-model_00-model_states.pt. 0: [2022-11-28 17:23:55,306] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/layer_22-model_00-model_states.pt... 0: [2022-11-28 17:23:55,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/layer_22-model_00-model_states.pt. 0: [2022-11-28 17:23:55,311] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step37000/mp_rank_00_model_states.pt 0: [2022-11-28 17:23:55,311] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/mp_rank_00_model_states.pt... 0: [2022-11-28 17:23:55,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/mp_rank_00_model_states.pt. 0: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
0: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
6: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:23:55,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step37000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:23:55,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:23:55,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:23:55,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:23:55,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 17:23:55,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 17:23:55,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-28 17:23:55,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-28 17:23:55,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 17:23:55,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 17:23:55,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-28 17:23:55,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:23:55,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 17:23:55,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:23:55,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:23:55,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:23:55,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-28 17:23:55,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 17:23:55,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 17:23:55,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 17:23:55,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
6: [2022-11-28 17:23:55,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-28 17:23:55,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-28 17:23:55,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:23:55,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 17:23:55,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-28 17:23:55,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:23:55,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 17:23:55,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-28 17:23:55,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:23:55,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 17:23:55,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 6: [2022-11-28 17:23:55,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 17:23:55,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 17:23:55,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-28 17:23:55,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:23:55,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:23:55,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:23:55,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:23:55,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 17:23:55,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 17:23:55,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 17:23:55,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 17:23:55,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-28 17:23:55,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
0: [2022-11-28 17:23:55,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-28 17:23:55,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2022-11-28 17:23:55,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:23:55,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 17:23:55,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2022-11-28 17:23:55,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:23:55,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 17:23:55,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2022-11-28 17:23:55,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:23:55,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 17:23:55,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2022-11-28 17:23:55,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:23:55,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 17:23:55,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2022-11-28 17:23:55,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:23:55,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 17:23:55,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2022-11-28 17:23:55,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:23:55,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:23:55,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:23:55,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 17:23:55,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 17:23:55,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 17:23:55,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
2: [2022-11-28 17:23:55,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 2: [2022-11-28 17:23:55,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-28 17:23:55,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:23:55,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 17:23:55,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-28 17:23:55,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:23:55,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:23:55,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 17:23:55,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 17:23:55,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-28 17:23:55,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-28 17:23:55,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:23:55,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:23:55,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:23:55,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 17:23:55,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:23:55,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 17:23:55,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 17:23:55,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-28 17:23:55,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-28 17:23:55,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-28 17:23:55,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:23:55,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 17:23:55,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 17:23:55,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-28 17:23:55,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-28 17:23:55,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:23:55,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:23:55,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 17:23:55,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 17:23:55,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-28 17:23:55,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-28 17:23:55,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 17:23:55,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 17:23:55,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-28 17:23:55,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:23:55,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 17:23:55,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-28 17:23:55,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:23:55,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 17:23:55,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-28 17:23:55,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:23:55,413] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 17:23:55,413] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-28 17:23:55,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:23:55,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:23:55,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:23:55,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:23:55,413] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 17:23:55,413] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 17:23:55,413] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 17:23:55,413] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 17:23:55,413] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-28 17:23:55,413] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-28 17:23:55,413] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 4: [2022-11-28 17:23:55,413] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-28 17:23:55,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:23:55,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:23:55,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:23:55,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 17:23:55,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 17:23:55,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 17:23:55,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-28 17:23:55,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-28 17:23:55,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-28 17:23:55,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:23:55,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 17:23:55,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:23:55,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:23:55,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:23:55,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 17:23:55,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 17:23:55,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 17:23:55,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 17:23:55,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 17:23:55,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 17:23:55,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 7: [2022-11-28 17:23:55,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2022-11-28 17:23:55,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: 
[2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 1: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 7: [2022-11-28 17:23:55,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:23:55,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 17:23:55,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-28 17:23:55,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2022-11-28 17:23:55,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 17:23:55,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 3: [2022-11-28 17:23:55,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:23:55,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 17:23:55,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-28 17:23:55,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:23:55,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:23:55,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:23:55,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:23:55,427] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 17:23:55,427] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 17:23:55,427] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 17:23:55,427] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 17:23:55,427] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-28 17:23:55,427] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-28 17:23:55,427] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 5: [2022-11-28 17:23:55,427] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 0: [2022-11-28 17:23:55,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step37000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 17:23:55,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step37000 is ready now! 
0: successfully saved checkpoint at iteration 37000 to checkpoints_221m 7: time (ms) | save-checkpoint: 714.39 7: iteration 37010/ 115203 | consumed samples: 9474560 | consumed tokens: 19403898880 | elapsed time per iteration (s): 0.52 | learning rate: 1.596E-04 | global batch size: 256 | lm loss: 2.354590E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 489.427 | TFLOPs: 25.68 | 7: iteration 37020/ 115203 | consumed samples: 9477120 | consumed tokens: 19409141760 | elapsed time per iteration (s): 0.43 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 2.313437E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.994 | TFLOPs: 31.32 | 7: iteration 37030/ 115203 | consumed samples: 9479680 | consumed tokens: 19414384640 | elapsed time per iteration (s): 0.42 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 2.355050E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.375 | TFLOPs: 31.76 | 7: iteration 37040/ 115203 | consumed samples: 9482240 | consumed tokens: 19419627520 | elapsed time per iteration (s): 0.44 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 2.328636E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.516 | TFLOPs: 30.83 | 7: iteration 37050/ 115203 | consumed samples: 9484800 | consumed tokens: 19424870400 | elapsed time per iteration (s): 0.43 | learning rate: 1.595E-04 | global batch size: 256 | lm loss: 2.352662E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.558 | TFLOPs: 31.20 | 7: iteration 37060/ 115203 | consumed samples: 9487360 | consumed tokens: 19430113280 | elapsed time per iteration (s): 0.43 | learning rate: 
1.595E-04 | global batch size: 256 | lm loss: 2.328886E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.210 | TFLOPs: 31.18 | 7: iteration 37070/ 115203 | consumed samples: 9489920 | consumed tokens: 19435356160 | elapsed time per iteration (s): 0.43 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 2.329675E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.093 | TFLOPs: 31.17 | 7: iteration 37080/ 115203 | consumed samples: 9492480 | consumed tokens: 19440599040 | elapsed time per iteration (s): 0.43 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 2.331203E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.154 | TFLOPs: 31.17 | 7: iteration 37090/ 115203 | consumed samples: 9495040 | consumed tokens: 19445841920 | elapsed time per iteration (s): 0.43 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 2.352735E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.850 | TFLOPs: 31.32 | 7: iteration 37100/ 115203 | consumed samples: 9497600 | consumed tokens: 19451084800 | elapsed time per iteration (s): 0.43 | learning rate: 1.594E-04 | global batch size: 256 | lm loss: 2.328155E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.692 | TFLOPs: 31.15 | 7: iteration 37110/ 115203 | consumed samples: 9500160 | consumed tokens: 19456327680 | elapsed time per iteration (s): 0.44 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 2.356655E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.625 | TFLOPs: 30.57 | 7: iteration 37120/ 115203 | consumed samples: 
9502720 | consumed tokens: 19461570560 | elapsed time per iteration (s): 0.42 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 2.314663E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.975 | TFLOPs: 31.64 | 7: iteration 37130/ 115203 | consumed samples: 9505280 | consumed tokens: 19466813440 | elapsed time per iteration (s): 0.43 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 2.376685E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.806 | TFLOPs: 31.52 | 7: iteration 37140/ 115203 | consumed samples: 9507840 | consumed tokens: 19472056320 | elapsed time per iteration (s): 0.43 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 2.326557E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.452 | TFLOPs: 31.14 | 7: iteration 37150/ 115203 | consumed samples: 9510400 | consumed tokens: 19477299200 | elapsed time per iteration (s): 0.45 | learning rate: 1.593E-04 | global batch size: 256 | lm loss: 2.340443E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.378 | TFLOPs: 30.14 | 7: iteration 37160/ 115203 | consumed samples: 9512960 | consumed tokens: 19482542080 | elapsed time per iteration (s): 0.43 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 2.352288E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.592 | TFLOPs: 31.30 | 7: iteration 37170/ 115203 | consumed samples: 9515520 | consumed tokens: 19487784960 | elapsed time per iteration (s): 0.59 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 2.314921E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 430.309 | TFLOPs: 22.58 | 7: iteration 37180/ 115203 | consumed samples: 9518080 | consumed tokens: 19493027840 | elapsed time per iteration (s): 0.44 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 2.337268E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.640 | TFLOPs: 30.41 | 7: iteration 37190/ 115203 | consumed samples: 9520640 | consumed tokens: 19498270720 | elapsed time per iteration (s): 0.43 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 2.331843E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.881 | TFLOPs: 30.95 | 7: iteration 37200/ 115203 | consumed samples: 9523200 | consumed tokens: 19503513600 | elapsed time per iteration (s): 0.43 | learning rate: 1.592E-04 | global batch size: 256 | lm loss: 2.321773E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.850 | TFLOPs: 31.26 | 7: iteration 37210/ 115203 | consumed samples: 9525760 | consumed tokens: 19508756480 | elapsed time per iteration (s): 0.43 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 2.332895E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.289 | TFLOPs: 30.97 | 7: iteration 37220/ 115203 | consumed samples: 9528320 | consumed tokens: 19513999360 | elapsed time per iteration (s): 0.44 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 2.348630E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.987 | TFLOPs: 30.69 | 7: iteration 37230/ 115203 | consumed samples: 9530880 | consumed tokens: 19519242240 | elapsed time per iteration (s): 0.43 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 2.354285E+00 | 
grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.557 | TFLOPs: 31.35 | 7: iteration 37240/ 115203 | consumed samples: 9533440 | consumed tokens: 19524485120 | elapsed time per iteration (s): 0.43 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 2.308047E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.542 | TFLOPs: 31.04 | 7: iteration 37250/ 115203 | consumed samples: 9536000 | consumed tokens: 19529728000 | elapsed time per iteration (s): 0.43 | learning rate: 1.591E-04 | global batch size: 256 | lm loss: 2.345145E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.701 | TFLOPs: 30.94 | 7: iteration 37260/ 115203 | consumed samples: 9538560 | consumed tokens: 19534970880 | elapsed time per iteration (s): 0.43 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 2.320825E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.273 | TFLOPs: 31.29 | 7: iteration 37270/ 115203 | consumed samples: 9541120 | consumed tokens: 19540213760 | elapsed time per iteration (s): 0.43 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 2.362385E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.001 | TFLOPs: 31.53 | 7: iteration 37280/ 115203 | consumed samples: 9543680 | consumed tokens: 19545456640 | elapsed time per iteration (s): 0.43 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 2.345353E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.149 | TFLOPs: 31.59 | 7: iteration 37290/ 115203 | consumed samples: 9546240 | consumed tokens: 19550699520 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 2.339454E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.627 | TFLOPs: 31.36 | 7: iteration 37300/ 115203 | consumed samples: 9548800 | consumed tokens: 19555942400 | elapsed time per iteration (s): 0.43 | learning rate: 1.590E-04 | global batch size: 256 | lm loss: 2.333241E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.711 | TFLOPs: 31.31 | 7: iteration 37310/ 115203 | consumed samples: 9551360 | consumed tokens: 19561185280 | elapsed time per iteration (s): 0.44 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 2.337765E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.309 | TFLOPs: 30.29 | 7: iteration 37320/ 115203 | consumed samples: 9553920 | consumed tokens: 19566428160 | elapsed time per iteration (s): 0.42 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 2.336458E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.558 | TFLOPs: 31.88 | 7: iteration 37330/ 115203 | consumed samples: 9556480 | consumed tokens: 19571671040 | elapsed time per iteration (s): 0.43 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 2.335836E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.409 | TFLOPs: 31.14 | 7: iteration 37340/ 115203 | consumed samples: 9559040 | consumed tokens: 19576913920 | elapsed time per iteration (s): 0.43 | learning rate: 1.589E-04 | global batch size: 256 | lm loss: 2.321578E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.078 | TFLOPs: 31.43 | 7: 
iteration 37350/ 115203 | consumed samples: 9561600 | consumed tokens: 19582156800 | elapsed time per iteration (s): 0.43 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 2.337132E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.063 | TFLOPs: 31.33 | 7: iteration 37360/ 115203 | consumed samples: 9564160 | consumed tokens: 19587399680 | elapsed time per iteration (s): 0.43 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 2.347704E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.682 | TFLOPs: 31.20 | 7: iteration 37370/ 115203 | consumed samples: 9566720 | consumed tokens: 19592642560 | elapsed time per iteration (s): 0.43 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 2.323738E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.812 | TFLOPs: 31.58 | 7: iteration 37380/ 115203 | consumed samples: 9569280 | consumed tokens: 19597885440 | elapsed time per iteration (s): 0.43 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 2.373177E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.959 | TFLOPs: 30.90 | 7: iteration 37390/ 115203 | consumed samples: 9571840 | consumed tokens: 19603128320 | elapsed time per iteration (s): 0.43 | learning rate: 1.588E-04 | global batch size: 256 | lm loss: 2.324091E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.704 | TFLOPs: 31.36 | 7: iteration 37400/ 115203 | consumed samples: 9574400 | consumed tokens: 19608371200 | elapsed time per iteration (s): 0.43 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 2.325215E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 599.874 | TFLOPs: 31.47 | 7: iteration 37410/ 115203 | consumed samples: 9576960 | consumed tokens: 19613614080 | elapsed time per iteration (s): 0.44 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 2.297930E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.100 | TFLOPs: 30.86 | 7: iteration 37420/ 115203 | consumed samples: 9579520 | consumed tokens: 19618856960 | elapsed time per iteration (s): 0.44 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 2.316048E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.543 | TFLOPs: 30.83 | 7: iteration 37430/ 115203 | consumed samples: 9582080 | consumed tokens: 19624099840 | elapsed time per iteration (s): 0.45 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 2.329235E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.158 | TFLOPs: 29.97 | 7: iteration 37440/ 115203 | consumed samples: 9584640 | consumed tokens: 19629342720 | elapsed time per iteration (s): 0.42 | learning rate: 1.587E-04 | global batch size: 256 | lm loss: 2.317496E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.015 | TFLOPs: 31.74 | 7: iteration 37450/ 115203 | consumed samples: 9587200 | consumed tokens: 19634585600 | elapsed time per iteration (s): 0.45 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 2.344806E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.227 | TFLOPs: 30.08 | 7: iteration 37460/ 115203 | consumed samples: 9589760 | consumed tokens: 19639828480 | elapsed time per iteration (s): 0.42 | learning rate: 1.586E-04 | global 
batch size: 256 | lm loss: 2.343685E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.930 | TFLOPs: 31.69 | 7: iteration 37470/ 115203 | consumed samples: 9592320 | consumed tokens: 19645071360 | elapsed time per iteration (s): 0.43 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 2.359111E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.900 | TFLOPs: 31.32 | 7: iteration 37480/ 115203 | consumed samples: 9594880 | consumed tokens: 19650314240 | elapsed time per iteration (s): 0.43 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 2.333700E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.598 | TFLOPs: 31.41 | 7: iteration 37490/ 115203 | consumed samples: 9597440 | consumed tokens: 19655557120 | elapsed time per iteration (s): 0.43 | learning rate: 1.586E-04 | global batch size: 256 | lm loss: 2.343100E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.972 | TFLOPs: 31.27 | 7: iteration 37500/ 115203 | consumed samples: 9600000 | consumed tokens: 19660800000 | elapsed time per iteration (s): 0.44 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 2.354731E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.528 | TFLOPs: 30.41 | 7: iteration 37510/ 115203 | consumed samples: 9602560 | consumed tokens: 19666042880 | elapsed time per iteration (s): 0.45 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 2.344851E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.765 | TFLOPs: 30.16 | 7: iteration 37520/ 115203 | consumed samples: 9605120 | consumed 
tokens: 19671285760 | elapsed time per iteration (s): 0.44 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 2.306707E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.697 | TFLOPs: 30.57 | 7: iteration 37530/ 115203 | consumed samples: 9607680 | consumed tokens: 19676528640 | elapsed time per iteration (s): 0.42 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 2.335549E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.049 | TFLOPs: 31.64 | 7: iteration 37540/ 115203 | consumed samples: 9610240 | consumed tokens: 19681771520 | elapsed time per iteration (s): 0.42 | learning rate: 1.585E-04 | global batch size: 256 | lm loss: 2.271249E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.604 | TFLOPs: 31.78 | 7: iteration 37550/ 115203 | consumed samples: 9612800 | consumed tokens: 19687014400 | elapsed time per iteration (s): 0.44 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 2.319082E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.306 | TFLOPs: 30.82 | 7: iteration 37560/ 115203 | consumed samples: 9615360 | consumed tokens: 19692257280 | elapsed time per iteration (s): 0.43 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 2.298133E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.648 | TFLOPs: 31.10 | 7: iteration 37570/ 115203 | consumed samples: 9617920 | consumed tokens: 19697500160 | elapsed time per iteration (s): 0.44 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 2.334572E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 588.475 | TFLOPs: 30.88 | 7: iteration 37580/ 115203 | consumed samples: 9620480 | consumed tokens: 19702743040 | elapsed time per iteration (s): 0.42 | learning rate: 1.584E-04 | global batch size: 256 | lm loss: 2.366654E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.299 | TFLOPs: 32.13 | 7: iteration 37590/ 115203 | consumed samples: 9623040 | consumed tokens: 19707985920 | elapsed time per iteration (s): 0.43 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 2.316739E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.108 | TFLOPs: 31.07 | 7: iteration 37600/ 115203 | consumed samples: 9625600 | consumed tokens: 19713228800 | elapsed time per iteration (s): 0.43 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 2.356551E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.038 | TFLOPs: 31.22 | 7: iteration 37610/ 115203 | consumed samples: 9628160 | consumed tokens: 19718471680 | elapsed time per iteration (s): 0.42 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 2.337476E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.448 | TFLOPs: 31.61 | 7: iteration 37620/ 115203 | consumed samples: 9630720 | consumed tokens: 19723714560 | elapsed time per iteration (s): 0.43 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 2.359814E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.969 | TFLOPs: 31.58 | 7: iteration 37630/ 115203 | consumed samples: 9633280 | consumed tokens: 19728957440 | elapsed time per iteration (s): 0.43 | learning rate: 1.583E-04 | global batch size: 256 | lm loss: 2.316649E+00 | grad norm: 0.209 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.284 | TFLOPs: 31.23 | 7: iteration 37640/ 115203 | consumed samples: 9635840 | consumed tokens: 19734200320 | elapsed time per iteration (s): 0.42 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 2.328743E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.537 | TFLOPs: 31.82 | 7: iteration 37650/ 115203 | consumed samples: 9638400 | consumed tokens: 19739443200 | elapsed time per iteration (s): 0.43 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 2.333503E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.130 | TFLOPs: 31.12 | 7: iteration 37660/ 115203 | consumed samples: 9640960 | consumed tokens: 19744686080 | elapsed time per iteration (s): 0.42 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 2.317565E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.172 | TFLOPs: 31.91 | 7: iteration 37670/ 115203 | consumed samples: 9643520 | consumed tokens: 19749928960 | elapsed time per iteration (s): 0.43 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 2.310319E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.065 | TFLOPs: 31.43 | 7: iteration 37680/ 115203 | consumed samples: 9646080 | consumed tokens: 19755171840 | elapsed time per iteration (s): 0.44 | learning rate: 1.582E-04 | global batch size: 256 | lm loss: 2.330746E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.562 | TFLOPs: 30.62 | 7: iteration 37690/ 115203 | consumed samples: 9648640 | consumed tokens: 19760414720 | elapsed time per iteration (s): 0.42 
| learning rate: 1.581E-04 | global batch size: 256 | lm loss: 2.300236E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.726 | TFLOPs: 31.62 | 7: iteration 37700/ 115203 | consumed samples: 9651200 | consumed tokens: 19765657600 | elapsed time per iteration (s): 0.43 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 2.371665E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.514 | TFLOPs: 30.88 | 7: iteration 37710/ 115203 | consumed samples: 9653760 | consumed tokens: 19770900480 | elapsed time per iteration (s): 0.42 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 2.372409E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.270 | TFLOPs: 31.86 | 7: iteration 37720/ 115203 | consumed samples: 9656320 | consumed tokens: 19776143360 | elapsed time per iteration (s): 0.42 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 2.362470E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.835 | TFLOPs: 31.73 | 7: iteration 37730/ 115203 | consumed samples: 9658880 | consumed tokens: 19781386240 | elapsed time per iteration (s): 0.43 | learning rate: 1.581E-04 | global batch size: 256 | lm loss: 2.337729E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.138 | TFLOPs: 31.17 | 7: iteration 37740/ 115203 | consumed samples: 9661440 | consumed tokens: 19786629120 | elapsed time per iteration (s): 0.42 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 2.343015E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.573 | TFLOPs: 31.88 | 7: iteration 37750/ 115203 | 
consumed samples: 9664000 | consumed tokens: 19791872000 | elapsed time per iteration (s): 0.42 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 2.334847E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.542 | TFLOPs: 31.82 | 7: iteration 37760/ 115203 | consumed samples: 9666560 | consumed tokens: 19797114880 | elapsed time per iteration (s): 0.43 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 2.327061E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.605 | TFLOPs: 31.20 | 7: iteration 37770/ 115203 | consumed samples: 9669120 | consumed tokens: 19802357760 | elapsed time per iteration (s): 0.43 | learning rate: 1.580E-04 | global batch size: 256 | lm loss: 2.329117E+00 | grad norm: 0.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.023 | TFLOPs: 31.53 | 7: iteration 37780/ 115203 | consumed samples: 9671680 | consumed tokens: 19807600640 | elapsed time per iteration (s): 0.43 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 2.338860E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.620 | TFLOPs: 31.25 | 7: iteration 37790/ 115203 | consumed samples: 9674240 | consumed tokens: 19812843520 | elapsed time per iteration (s): 0.43 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 2.324179E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.292 | TFLOPs: 31.18 | 7: iteration 37800/ 115203 | consumed samples: 9676800 | consumed tokens: 19818086400 | elapsed time per iteration (s): 0.43 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 2.314233E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 600.323 | TFLOPs: 31.50 | 7: iteration 37810/ 115203 | consumed samples: 9679360 | consumed tokens: 19823329280 | elapsed time per iteration (s): 0.43 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 2.295039E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.816 | TFLOPs: 31.37 | 7: iteration 37820/ 115203 | consumed samples: 9681920 | consumed tokens: 19828572160 | elapsed time per iteration (s): 0.43 | learning rate: 1.579E-04 | global batch size: 256 | lm loss: 2.321715E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.146 | TFLOPs: 31.23 | 7: iteration 37830/ 115203 | consumed samples: 9684480 | consumed tokens: 19833815040 | elapsed time per iteration (s): 0.43 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 2.361864E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.855 | TFLOPs: 31.21 | 7: iteration 37840/ 115203 | consumed samples: 9687040 | consumed tokens: 19839057920 | elapsed time per iteration (s): 0.43 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 2.311094E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.791 | TFLOPs: 31.21 | 7: iteration 37850/ 115203 | consumed samples: 9689600 | consumed tokens: 19844300800 | elapsed time per iteration (s): 0.43 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 2.332676E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.219 | TFLOPs: 31.34 | 7: iteration 37860/ 115203 | consumed samples: 9692160 | consumed tokens: 19849543680 | elapsed time per iteration (s): 0.42 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 
2.353039E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.794 | TFLOPs: 31.68 | 7: iteration 37870/ 115203 | consumed samples: 9694720 | consumed tokens: 19854786560 | elapsed time per iteration (s): 0.43 | learning rate: 1.578E-04 | global batch size: 256 | lm loss: 2.341781E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.853 | TFLOPs: 31.37 | 7: iteration 37880/ 115203 | consumed samples: 9697280 | consumed tokens: 19860029440 | elapsed time per iteration (s): 0.42 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 2.338711E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.743 | TFLOPs: 32.04 | 7: iteration 37890/ 115203 | consumed samples: 9699840 | consumed tokens: 19865272320 | elapsed time per iteration (s): 0.43 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 2.347634E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.491 | TFLOPs: 31.24 | 7: iteration 37900/ 115203 | consumed samples: 9702400 | consumed tokens: 19870515200 | elapsed time per iteration (s): 0.43 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 2.312291E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.394 | TFLOPs: 31.34 | 7: iteration 37910/ 115203 | consumed samples: 9704960 | consumed tokens: 19875758080 | elapsed time per iteration (s): 0.42 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 2.324003E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.456 | TFLOPs: 31.71 | 7: iteration 37920/ 115203 | consumed samples: 9707520 | consumed tokens: 19881000960 | 
elapsed time per iteration (s): 0.44 | learning rate: 1.577E-04 | global batch size: 256 | lm loss: 2.365026E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.448 | TFLOPs: 30.77 | 7: iteration 37930/ 115203 | consumed samples: 9710080 | consumed tokens: 19886243840 | elapsed time per iteration (s): 0.43 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 2.316980E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.313 | TFLOPs: 31.50 | 7: iteration 37940/ 115203 | consumed samples: 9712640 | consumed tokens: 19891486720 | elapsed time per iteration (s): 0.43 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 2.334557E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.263 | TFLOPs: 31.44 | 7: iteration 37950/ 115203 | consumed samples: 9715200 | consumed tokens: 19896729600 | elapsed time per iteration (s): 0.44 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 2.348783E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.743 | TFLOPs: 30.63 | 7: iteration 37960/ 115203 | consumed samples: 9717760 | consumed tokens: 19901972480 | elapsed time per iteration (s): 0.43 | learning rate: 1.576E-04 | global batch size: 256 | lm loss: 2.347906E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.407 | TFLOPs: 31.14 | 7: iteration 37970/ 115203 | consumed samples: 9720320 | consumed tokens: 19907215360 | elapsed time per iteration (s): 0.43 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 2.334194E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.379 | TFLOPs: 
31.08 | 7: iteration 37980/ 115203 | consumed samples: 9722880 | consumed tokens: 19912458240 | elapsed time per iteration (s): 0.43 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 2.322163E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.346 | TFLOPs: 31.03 | 7: iteration 37990/ 115203 | consumed samples: 9725440 | consumed tokens: 19917701120 | elapsed time per iteration (s): 0.42 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 2.339559E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.319 | TFLOPs: 31.87 | 0: [2022-11-28 17:31:07,281] [INFO] [logging.py:68:log_dist] [Rank 0] step=38000, skipped=0, lr=[0.00015748667481842792, 0.00015748667481842792, 0.00015748667481842792], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 38000/ 115203 | consumed samples: 9728000 | consumed tokens: 19922944000 | elapsed time per iteration (s): 0.43 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 2.336456E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.133 | TFLOPs: 31.33 | 0: steps: 38000 loss: 2.4168 iter time (s): 0.430 samples/sec: 594.910 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 38000 | lm loss value: 2.327537E+00 | lm loss PPL: 1.025266E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 38000 to checkpoints_221m 0: [2022-11-28 17:31:07,469] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step38000 is begin to save! 0: [2022-11-28 17:31:07,481] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_01-model_00-model_states.pt... 
0: [2022-11-28 17:31:07,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_01-model_00-model_states.pt. 0: [2022-11-28 17:31:07,610] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_03-model_00-model_states.pt... 0: [2022-11-28 17:31:07,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_03-model_00-model_states.pt. 0: [2022-11-28 17:31:07,633] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_04-model_00-model_states.pt... 0: [2022-11-28 17:31:07,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_04-model_00-model_states.pt. 0: [2022-11-28 17:31:07,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_05-model_00-model_states.pt... 0: [2022-11-28 17:31:07,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_05-model_00-model_states.pt. 0: [2022-11-28 17:31:07,686] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_06-model_00-model_states.pt... 0: [2022-11-28 17:31:07,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_06-model_00-model_states.pt. 0: [2022-11-28 17:31:07,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_07-model_00-model_states.pt... 0: [2022-11-28 17:31:07,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_07-model_00-model_states.pt. 0: [2022-11-28 17:31:07,731] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_08-model_00-model_states.pt... 
0: [2022-11-28 17:31:07,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_08-model_00-model_states.pt. 0: [2022-11-28 17:31:07,757] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_09-model_00-model_states.pt... 0: [2022-11-28 17:31:07,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_09-model_00-model_states.pt. 0: [2022-11-28 17:31:07,780] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_10-model_00-model_states.pt... 0: [2022-11-28 17:31:07,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_10-model_00-model_states.pt. 0: [2022-11-28 17:31:07,805] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_11-model_00-model_states.pt... 0: [2022-11-28 17:31:07,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_11-model_00-model_states.pt. 0: [2022-11-28 17:31:07,828] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_12-model_00-model_states.pt... 0: [2022-11-28 17:31:07,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_12-model_00-model_states.pt. 0: [2022-11-28 17:31:07,853] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_13-model_00-model_states.pt... 0: [2022-11-28 17:31:07,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_13-model_00-model_states.pt. 0: [2022-11-28 17:31:07,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_14-model_00-model_states.pt... 
0: [2022-11-28 17:31:07,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_14-model_00-model_states.pt. 0: [2022-11-28 17:31:07,901] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_15-model_00-model_states.pt... 0: [2022-11-28 17:31:07,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_15-model_00-model_states.pt. 0: [2022-11-28 17:31:07,925] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_16-model_00-model_states.pt... 0: [2022-11-28 17:31:07,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_16-model_00-model_states.pt. 0: [2022-11-28 17:31:07,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_17-model_00-model_states.pt... 0: [2022-11-28 17:31:07,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_17-model_00-model_states.pt. 0: [2022-11-28 17:31:07,975] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_18-model_00-model_states.pt... 0: [2022-11-28 17:31:07,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_18-model_00-model_states.pt. 0: [2022-11-28 17:31:08,000] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_19-model_00-model_states.pt... 0: [2022-11-28 17:31:08,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_19-model_00-model_states.pt. 0: [2022-11-28 17:31:08,024] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_20-model_00-model_states.pt... 
0: [2022-11-28 17:31:08,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_20-model_00-model_states.pt. 0: [2022-11-28 17:31:08,048] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/layer_22-model_00-model_states.pt... 0: [2022-11-28 17:31:08,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/layer_22-model_00-model_states.pt. 0: [2022-11-28 17:31:08,053] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step38000/mp_rank_00_model_states.pt 0: [2022-11-28 17:31:08,053] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/mp_rank_00_model_states.pt... 0: [2022-11-28 17:31:08,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/mp_rank_00_model_states.pt. 0: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
1: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
1: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:31:08,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step38000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:31:08,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:31:08,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 17:31:08,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 4: [2022-11-28 17:31:08,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:31:08,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 17:31:08,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 4: [2022-11-28 17:31:08,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:31:08,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 17:31:08,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
1: [2022-11-28 17:31:08,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:31:08,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:31:08,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 17:31:08,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 17:31:08,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 1: [2022-11-28 17:31:08,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2022-11-28 17:31:08,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:31:08,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 17:31:08,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2022-11-28 17:31:08,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:31:08,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:31:08,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 17:31:08,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2022-11-28 17:31:08,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:31:08,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 17:31:08,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 1: [2022-11-28 17:31:08,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:31:08,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 17:31:08,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 1: [2022-11-28 17:31:08,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:31:08,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2022-11-28 17:31:08,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:31:08,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
0: [2022-11-28 17:31:08,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 17:31:08,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 1: [2022-11-28 17:31:08,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:31:08,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 17:31:08,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2022-11-28 17:31:08,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:31:08,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 17:31:08,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2022-11-28 17:31:08,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:31:08,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:31:08,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 17:31:08,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
6: [2022-11-28 17:31:08,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 17:31:08,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2022-11-28 17:31:08,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:31:08,133] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 17:31:08,133] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2022-11-28 17:31:08,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:31:08,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 17:31:08,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2022-11-28 17:31:08,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:31:08,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 17:31:08,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:31:08,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 17:31:08,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 17:31:08,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 17:31:08,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 17:31:08,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 17:31:08,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 17:31:08,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:31:08,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 2: [2022-11-28 17:31:08,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2022-11-28 17:31:08,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 2: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
6: [2022-11-28 17:31:08,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 1: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 1: [2022-11-28 17:31:08,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:31:08,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 17:31:08,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 4: [2022-11-28 17:31:08,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:31:08,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:31:08,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 17:31:08,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 17:31:08,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 4: [2022-11-28 17:31:08,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
4: [2022-11-28 17:31:08,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:31:08,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:31:08,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:31:08,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 17:31:08,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 17:31:08,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 17:31:08,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 4: [2022-11-28 17:31:08,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 4: [2022-11-28 17:31:08,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2022-11-28 17:31:08,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:31:08,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 17:31:08,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
6: [2022-11-28 17:31:08,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:31:08,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:31:08,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:31:08,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 17:31:08,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 17:31:08,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 17:31:08,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2022-11-28 17:31:08,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 6: [2022-11-28 17:31:08,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2022-11-28 17:31:08,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:31:08,143] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 17:31:08,143] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
7: [2022-11-28 17:31:08,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:31:08,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 17:31:08,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2022-11-28 17:31:08,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:31:08,144] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 17:31:08,144] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2022-11-28 17:31:08,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:31:08,145] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 17:31:08,145] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 7: [2022-11-28 17:31:08,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:31:08,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 17:31:08,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
3: [2022-11-28 17:31:08,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:31:08,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 17:31:08,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2022-11-28 17:31:08,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:31:08,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 17:31:08,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2022-11-28 17:31:08,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:31:08,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 17:31:08,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2022-11-28 17:31:08,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:31:08,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 17:31:08,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
3: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:31:08,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 17:31:08,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2022-11-28 17:31:08,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:31:08,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:31:08,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:31:08,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 17:31:08,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 17:31:08,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 17:31:08,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2022-11-28 17:31:08,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2022-11-28 17:31:08,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
3: [2022-11-28 17:31:08,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:31:08,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:31:08,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:31:08,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 17:31:08,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 17:31:08,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 17:31:08,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2022-11-28 17:31:08,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 3: [2022-11-28 17:31:08,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 0: [2022-11-28 17:31:08,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 17:31:08,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2022-11-28 17:31:08,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:31:08,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:31:08,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:31:08,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 17:31:08,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2022-11-28 17:31:08,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 17:31:08,315] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 17:31:08,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2022-11-28 17:31:08,315] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2022-11-28 17:31:08,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:31:08,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 17:31:08,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2022-11-28 17:31:08,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:31:08,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:31:08,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:31:08,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:31:08,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 17:31:08,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 17:31:08,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 17:31:08,317] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step38000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 17:31:08,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2022-11-28 17:31:08,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2022-11-28 17:31:08,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 5: [2022-11-28 17:31:08,317] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step38000 is ready now! 
0: successfully saved checkpoint at iteration 38000 to checkpoints_221m 7: time (ms) | save-checkpoint: 866.09 7: iteration 38010/ 115203 | consumed samples: 9730560 | consumed tokens: 19928186880 | elapsed time per iteration (s): 0.54 | learning rate: 1.575E-04 | global batch size: 256 | lm loss: 2.332421E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 474.249 | TFLOPs: 24.88 | 7: iteration 38020/ 115203 | consumed samples: 9733120 | consumed tokens: 19933429760 | elapsed time per iteration (s): 0.45 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 2.292516E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.138 | TFLOPs: 30.02 | 7: iteration 38030/ 115203 | consumed samples: 9735680 | consumed tokens: 19938672640 | elapsed time per iteration (s): 0.43 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 2.334094E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.004 | TFLOPs: 31.32 | 7: iteration 38040/ 115203 | consumed samples: 9738240 | consumed tokens: 19943915520 | elapsed time per iteration (s): 0.44 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 2.302380E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.825 | TFLOPs: 30.79 | 7: iteration 38050/ 115203 | consumed samples: 9740800 | consumed tokens: 19949158400 | elapsed time per iteration (s): 0.43 | learning rate: 1.574E-04 | global batch size: 256 | lm loss: 2.340958E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.660 | TFLOPs: 31.04 | 7: iteration 38060/ 115203 | consumed samples: 9743360 | consumed tokens: 19954401280 | elapsed time per iteration (s): 0.43 | learning rate: 
1.574E-04 | global batch size: 256 | lm loss: 2.334647E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.623 | TFLOPs: 31.51 | 7: iteration 38070/ 115203 | consumed samples: 9745920 | consumed tokens: 19959644160 | elapsed time per iteration (s): 0.44 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 2.320088E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.454 | TFLOPs: 30.77 | 7: iteration 38080/ 115203 | consumed samples: 9748480 | consumed tokens: 19964887040 | elapsed time per iteration (s): 0.43 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 2.317841E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.620 | TFLOPs: 31.51 | 7: iteration 38090/ 115203 | consumed samples: 9751040 | consumed tokens: 19970129920 | elapsed time per iteration (s): 0.43 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 2.318447E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.404 | TFLOPs: 31.34 | 7: iteration 38100/ 115203 | consumed samples: 9753600 | consumed tokens: 19975372800 | elapsed time per iteration (s): 0.43 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 2.325136E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.054 | TFLOPs: 31.48 | 7: iteration 38110/ 115203 | consumed samples: 9756160 | consumed tokens: 19980615680 | elapsed time per iteration (s): 0.43 | learning rate: 1.573E-04 | global batch size: 256 | lm loss: 2.340234E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.015 | TFLOPs: 30.96 | 7: iteration 38120/ 115203 | consumed samples: 
9758720 | consumed tokens: 19985858560 | elapsed time per iteration (s): 0.43 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 2.344059E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.178 | TFLOPs: 31.12 | 7: iteration 38130/ 115203 | consumed samples: 9761280 | consumed tokens: 19991101440 | elapsed time per iteration (s): 0.42 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 2.352657E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.271 | TFLOPs: 31.71 | 7: iteration 38140/ 115203 | consumed samples: 9763840 | consumed tokens: 19996344320 | elapsed time per iteration (s): 0.43 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 2.357111E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.704 | TFLOPs: 31.36 | 7: iteration 38150/ 115203 | consumed samples: 9766400 | consumed tokens: 20001587200 | elapsed time per iteration (s): 0.53 | learning rate: 1.572E-04 | global batch size: 256 | lm loss: 2.343920E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 479.538 | TFLOPs: 25.16 | 7: iteration 38160/ 115203 | consumed samples: 9768960 | consumed tokens: 20006830080 | elapsed time per iteration (s): 0.42 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 2.342501E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.432 | TFLOPs: 31.61 | 7: iteration 38170/ 115203 | consumed samples: 9771520 | consumed tokens: 20012072960 | elapsed time per iteration (s): 0.42 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 2.371786E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 603.114 | TFLOPs: 31.64 | 7: iteration 38180/ 115203 | consumed samples: 9774080 | consumed tokens: 20017315840 | elapsed time per iteration (s): 0.43 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 2.340808E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.025 | TFLOPs: 31.22 | 7: iteration 38190/ 115203 | consumed samples: 9776640 | consumed tokens: 20022558720 | elapsed time per iteration (s): 0.44 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 2.347832E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.151 | TFLOPs: 30.49 | 7: iteration 38200/ 115203 | consumed samples: 9779200 | consumed tokens: 20027801600 | elapsed time per iteration (s): 0.43 | learning rate: 1.571E-04 | global batch size: 256 | lm loss: 2.332121E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.122 | TFLOPs: 31.12 | 7: iteration 38210/ 115203 | consumed samples: 9781760 | consumed tokens: 20033044480 | elapsed time per iteration (s): 0.43 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 2.358639E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.063 | TFLOPs: 31.12 | 7: iteration 38220/ 115203 | consumed samples: 9784320 | consumed tokens: 20038287360 | elapsed time per iteration (s): 0.43 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 2.320197E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.207 | TFLOPs: 31.44 | 7: iteration 38230/ 115203 | consumed samples: 9786880 | consumed tokens: 20043530240 | elapsed time per iteration (s): 0.43 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 2.331879E+00 | 
grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.392 | TFLOPs: 31.24 | 7: iteration 38240/ 115203 | consumed samples: 9789440 | consumed tokens: 20048773120 | elapsed time per iteration (s): 0.43 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 2.381765E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.849 | TFLOPs: 31.32 | 7: iteration 38250/ 115203 | consumed samples: 9792000 | consumed tokens: 20054016000 | elapsed time per iteration (s): 0.43 | learning rate: 1.570E-04 | global batch size: 256 | lm loss: 2.351455E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.125 | TFLOPs: 31.23 | 7: iteration 38260/ 115203 | consumed samples: 9794560 | consumed tokens: 20059258880 | elapsed time per iteration (s): 0.43 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 2.310111E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.801 | TFLOPs: 31.37 | 7: iteration 38270/ 115203 | consumed samples: 9797120 | consumed tokens: 20064501760 | elapsed time per iteration (s): 0.43 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 2.346266E+00 | grad norm: 0.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.427 | TFLOPs: 31.14 | 7: iteration 38280/ 115203 | consumed samples: 9799680 | consumed tokens: 20069744640 | elapsed time per iteration (s): 0.44 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 2.327074E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.246 | TFLOPs: 30.71 | 7: iteration 38290/ 115203 | consumed samples: 9802240 | consumed tokens: 20074987520 | elapsed time per 
iteration (s): 0.44 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 2.339079E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.017 | TFLOPs: 30.64 | 7: iteration 38300/ 115203 | consumed samples: 9804800 | consumed tokens: 20080230400 | elapsed time per iteration (s): 0.43 | learning rate: 1.569E-04 | global batch size: 256 | lm loss: 2.328884E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.372 | TFLOPs: 31.50 | 7: iteration 38310/ 115203 | consumed samples: 9807360 | consumed tokens: 20085473280 | elapsed time per iteration (s): 0.43 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 2.345103E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.951 | TFLOPs: 31.48 | 7: iteration 38320/ 115203 | consumed samples: 9809920 | consumed tokens: 20090716160 | elapsed time per iteration (s): 0.42 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 2.321703E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.988 | TFLOPs: 31.69 | 7: iteration 38330/ 115203 | consumed samples: 9812480 | consumed tokens: 20095959040 | elapsed time per iteration (s): 0.44 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 2.291964E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.357 | TFLOPs: 30.66 | 7: iteration 38340/ 115203 | consumed samples: 9815040 | consumed tokens: 20101201920 | elapsed time per iteration (s): 0.43 | learning rate: 1.568E-04 | global batch size: 256 | lm loss: 2.326925E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.465 | TFLOPs: 31.24 | 7: 
iteration 38350/ 115203 | consumed samples: 9817600 | consumed tokens: 20106444800 | elapsed time per iteration (s): 0.43 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 2.347639E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.631 | TFLOPs: 31.25 | 7: iteration 38360/ 115203 | consumed samples: 9820160 | consumed tokens: 20111687680 | elapsed time per iteration (s): 0.44 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 2.325485E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.094 | TFLOPs: 30.23 | 7: iteration 38370/ 115203 | consumed samples: 9822720 | consumed tokens: 20116930560 | elapsed time per iteration (s): 0.43 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 2.314570E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.480 | TFLOPs: 31.30 | 7: iteration 38380/ 115203 | consumed samples: 9825280 | consumed tokens: 20122173440 | elapsed time per iteration (s): 0.44 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 2.343408E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.936 | TFLOPs: 30.74 | 7: iteration 38390/ 115203 | consumed samples: 9827840 | consumed tokens: 20127416320 | elapsed time per iteration (s): 0.43 | learning rate: 1.567E-04 | global batch size: 256 | lm loss: 2.344895E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.507 | TFLOPs: 31.56 | 7: iteration 38400/ 115203 | consumed samples: 9830400 | consumed tokens: 20132659200 | elapsed time per iteration (s): 0.43 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 2.334280E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 593.321 | TFLOPs: 31.13 | 7: iteration 38410/ 115203 | consumed samples: 9832960 | consumed tokens: 20137902080 | elapsed time per iteration (s): 0.42 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 2.323960E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.995 | TFLOPs: 31.69 | 7: iteration 38420/ 115203 | consumed samples: 9835520 | consumed tokens: 20143144960 | elapsed time per iteration (s): 0.42 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 2.359276E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.976 | TFLOPs: 31.79 | 7: iteration 38430/ 115203 | consumed samples: 9838080 | consumed tokens: 20148387840 | elapsed time per iteration (s): 0.44 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 2.329575E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.258 | TFLOPs: 30.81 | 7: iteration 38440/ 115203 | consumed samples: 9840640 | consumed tokens: 20153630720 | elapsed time per iteration (s): 0.43 | learning rate: 1.566E-04 | global batch size: 256 | lm loss: 2.339060E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.397 | TFLOPs: 31.34 | 7: iteration 38450/ 115203 | consumed samples: 9843200 | consumed tokens: 20158873600 | elapsed time per iteration (s): 0.43 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 2.340336E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.953 | TFLOPs: 31.06 | 7: iteration 38460/ 115203 | consumed samples: 9845760 | consumed tokens: 20164116480 | elapsed time per iteration (s): 0.43 | learning rate: 1.565E-04 | global 
batch size: 256 | lm loss: 2.318765E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.069 | TFLOPs: 31.27 | 7: iteration 38470/ 115203 | consumed samples: 9848320 | consumed tokens: 20169359360 | elapsed time per iteration (s): 0.42 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 2.300920E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.903 | TFLOPs: 31.63 | 7: iteration 38480/ 115203 | consumed samples: 9850880 | consumed tokens: 20174602240 | elapsed time per iteration (s): 0.43 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 2.343858E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.816 | TFLOPs: 31.10 | 7: iteration 38490/ 115203 | consumed samples: 9853440 | consumed tokens: 20179845120 | elapsed time per iteration (s): 0.43 | learning rate: 1.565E-04 | global batch size: 256 | lm loss: 2.335889E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.138 | TFLOPs: 31.17 | 7: iteration 38500/ 115203 | consumed samples: 9856000 | consumed tokens: 20185088000 | elapsed time per iteration (s): 0.43 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 2.338044E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.214 | TFLOPs: 31.02 | 7: iteration 38510/ 115203 | consumed samples: 9858560 | consumed tokens: 20190330880 | elapsed time per iteration (s): 0.44 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 2.335880E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.411 | TFLOPs: 30.72 | 7: iteration 38520/ 115203 | consumed samples: 9861120 | consumed 
tokens: 20195573760 | elapsed time per iteration (s): 0.43 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 2.356512E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.059 | TFLOPs: 31.59 | 7: iteration 38530/ 115203 | consumed samples: 9863680 | consumed tokens: 20200816640 | elapsed time per iteration (s): 0.43 | learning rate: 1.564E-04 | global batch size: 256 | lm loss: 2.368120E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.701 | TFLOPs: 31.26 | 7: iteration 38540/ 115203 | consumed samples: 9866240 | consumed tokens: 20206059520 | elapsed time per iteration (s): 0.43 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 2.307592E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.802 | TFLOPs: 31.21 | 7: iteration 38550/ 115203 | consumed samples: 9868800 | consumed tokens: 20211302400 | elapsed time per iteration (s): 0.44 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 2.343116E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.856 | TFLOPs: 30.53 | 7: iteration 38560/ 115203 | consumed samples: 9871360 | consumed tokens: 20216545280 | elapsed time per iteration (s): 0.44 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 2.308518E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.941 | TFLOPs: 30.22 | 7: iteration 38570/ 115203 | consumed samples: 9873920 | consumed tokens: 20221788160 | elapsed time per iteration (s): 0.42 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 2.339073E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 604.631 | TFLOPs: 31.72 | 7: iteration 38580/ 115203 | consumed samples: 9876480 | consumed tokens: 20227031040 | elapsed time per iteration (s): 0.42 | learning rate: 1.563E-04 | global batch size: 256 | lm loss: 2.302034E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.115 | TFLOPs: 31.70 | 7: iteration 38590/ 115203 | consumed samples: 9879040 | consumed tokens: 20232273920 | elapsed time per iteration (s): 0.43 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 2.335581E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.906 | TFLOPs: 31.32 | 7: iteration 38600/ 115203 | consumed samples: 9881600 | consumed tokens: 20237516800 | elapsed time per iteration (s): 0.42 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 2.337672E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.998 | TFLOPs: 31.95 | 7: iteration 38610/ 115203 | consumed samples: 9884160 | consumed tokens: 20242759680 | elapsed time per iteration (s): 0.42 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 2.356174E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.082 | TFLOPs: 31.64 | 7: iteration 38620/ 115203 | consumed samples: 9886720 | consumed tokens: 20248002560 | elapsed time per iteration (s): 0.43 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 2.299119E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.561 | TFLOPs: 31.14 | 7: iteration 38630/ 115203 | consumed samples: 9889280 | consumed tokens: 20253245440 | elapsed time per iteration (s): 0.43 | learning rate: 1.562E-04 | global batch size: 256 | lm loss: 2.327228E+00 | grad norm: 0.207 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.619 | TFLOPs: 31.15 | 7: iteration 38640/ 115203 | consumed samples: 9891840 | consumed tokens: 20258488320 | elapsed time per iteration (s): 0.43 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 2.333541E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.811 | TFLOPs: 31.21 | 7: iteration 38650/ 115203 | consumed samples: 9894400 | consumed tokens: 20263731200 | elapsed time per iteration (s): 0.44 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 2.326284E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.885 | TFLOPs: 30.85 | 7: iteration 38660/ 115203 | consumed samples: 9896960 | consumed tokens: 20268974080 | elapsed time per iteration (s): 0.43 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 2.323553E+00 | grad norm: 0.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.650 | TFLOPs: 31.25 | 7: iteration 38670/ 115203 | consumed samples: 9899520 | consumed tokens: 20274216960 | elapsed time per iteration (s): 0.43 | learning rate: 1.561E-04 | global batch size: 256 | lm loss: 2.330247E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.345 | TFLOPs: 31.50 | 7: iteration 38680/ 115203 | consumed samples: 9902080 | consumed tokens: 20279459840 | elapsed time per iteration (s): 0.42 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 2.347015E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.299 | TFLOPs: 31.86 | 7: iteration 38690/ 115203 | consumed samples: 9904640 | consumed tokens: 20284702720 | elapsed time per iteration (s): 0.42 
| learning rate: 1.560E-04 | global batch size: 256 | lm loss: 2.344749E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.956 | TFLOPs: 31.85 | 7: iteration 38700/ 115203 | consumed samples: 9907200 | consumed tokens: 20289945600 | elapsed time per iteration (s): 0.44 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 2.326249E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.953 | TFLOPs: 30.69 | 7: iteration 38710/ 115203 | consumed samples: 9909760 | consumed tokens: 20295188480 | elapsed time per iteration (s): 0.44 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 2.341164E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.555 | TFLOPs: 30.25 | 7: iteration 38720/ 115203 | consumed samples: 9912320 | consumed tokens: 20300431360 | elapsed time per iteration (s): 0.44 | learning rate: 1.560E-04 | global batch size: 256 | lm loss: 2.362234E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.861 | TFLOPs: 30.42 | 7: iteration 38730/ 115203 | consumed samples: 9914880 | consumed tokens: 20305674240 | elapsed time per iteration (s): 0.42 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 2.317794E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.416 | TFLOPs: 32.03 | 7: iteration 38740/ 115203 | consumed samples: 9917440 | consumed tokens: 20310917120 | elapsed time per iteration (s): 0.43 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 2.294979E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.070 | TFLOPs: 31.06 | 7: iteration 38750/ 115203 | 
consumed samples: 9920000 | consumed tokens: 20316160000 | elapsed time per iteration (s): 0.43 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 2.322045E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.018 | TFLOPs: 31.53 | 7: iteration 38760/ 115203 | consumed samples: 9922560 | consumed tokens: 20321402880 | elapsed time per iteration (s): 0.44 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 2.313952E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.619 | TFLOPs: 30.57 | 7: iteration 38770/ 115203 | consumed samples: 9925120 | consumed tokens: 20326645760 | elapsed time per iteration (s): 0.43 | learning rate: 1.559E-04 | global batch size: 256 | lm loss: 2.306356E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.888 | TFLOPs: 31.06 | 7: iteration 38780/ 115203 | consumed samples: 9927680 | consumed tokens: 20331888640 | elapsed time per iteration (s): 0.42 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 2.318526E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.772 | TFLOPs: 31.68 | 7: iteration 38790/ 115203 | consumed samples: 9930240 | consumed tokens: 20337131520 | elapsed time per iteration (s): 0.43 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 2.334386E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.794 | TFLOPs: 31.00 | 7: iteration 38800/ 115203 | consumed samples: 9932800 | consumed tokens: 20342374400 | elapsed time per iteration (s): 0.42 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 2.329580E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 604.317 | TFLOPs: 31.71 | 7: iteration 38810/ 115203 | consumed samples: 9935360 | consumed tokens: 20347617280 | elapsed time per iteration (s): 0.43 | learning rate: 1.558E-04 | global batch size: 256 | lm loss: 2.324181E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.615 | TFLOPs: 30.99 | 7: iteration 38820/ 115203 | consumed samples: 9937920 | consumed tokens: 20352860160 | elapsed time per iteration (s): 0.44 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 2.337985E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.675 | TFLOPs: 30.83 | 7: iteration 38830/ 115203 | consumed samples: 9940480 | consumed tokens: 20358103040 | elapsed time per iteration (s): 0.43 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 2.341128E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.771 | TFLOPs: 31.52 | 7: iteration 38840/ 115203 | consumed samples: 9943040 | consumed tokens: 20363345920 | elapsed time per iteration (s): 0.43 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 2.323806E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.518 | TFLOPs: 31.09 | 7: iteration 38850/ 115203 | consumed samples: 9945600 | consumed tokens: 20368588800 | elapsed time per iteration (s): 0.43 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 2.312000E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.366 | TFLOPs: 31.55 | 7: iteration 38860/ 115203 | consumed samples: 9948160 | consumed tokens: 20373831680 | elapsed time per iteration (s): 0.43 | learning rate: 1.557E-04 | global batch size: 256 | lm loss: 
2.320116E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.652 | TFLOPs: 31.36 | 7: iteration 38870/ 115203 | consumed samples: 9950720 | consumed tokens: 20379074560 | elapsed time per iteration (s): 0.43 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 2.351780E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.970 | TFLOPs: 31.11 | 7: iteration 38880/ 115203 | consumed samples: 9953280 | consumed tokens: 20384317440 | elapsed time per iteration (s): 0.43 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 2.348622E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.227 | TFLOPs: 31.44 | 7: iteration 38890/ 115203 | consumed samples: 9955840 | consumed tokens: 20389560320 | elapsed time per iteration (s): 0.43 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 2.330704E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.421 | TFLOPs: 31.14 | 7: iteration 38900/ 115203 | consumed samples: 9958400 | consumed tokens: 20394803200 | elapsed time per iteration (s): 0.43 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 2.324283E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.028 | TFLOPs: 31.01 | 7: iteration 38910/ 115203 | consumed samples: 9960960 | consumed tokens: 20400046080 | elapsed time per iteration (s): 0.45 | learning rate: 1.556E-04 | global batch size: 256 | lm loss: 2.326755E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.091 | TFLOPs: 29.70 | 7: iteration 38920/ 115203 | consumed samples: 9963520 | consumed tokens: 20405288960 | 
elapsed time per iteration (s): 0.43 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 2.335676E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.639 | TFLOPs: 31.41 | 7: iteration 38930/ 115203 | consumed samples: 9966080 | consumed tokens: 20410531840 | elapsed time per iteration (s): 0.42 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 2.334517E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.851 | TFLOPs: 31.74 | 7: iteration 38940/ 115203 | consumed samples: 9968640 | consumed tokens: 20415774720 | elapsed time per iteration (s): 0.43 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 2.326144E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.851 | TFLOPs: 31.47 | 7: iteration 38950/ 115203 | consumed samples: 9971200 | consumed tokens: 20421017600 | elapsed time per iteration (s): 0.43 | learning rate: 1.555E-04 | global batch size: 256 | lm loss: 2.332934E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.400 | TFLOPs: 31.55 | 7: iteration 38960/ 115203 | consumed samples: 9973760 | consumed tokens: 20426260480 | elapsed time per iteration (s): 0.43 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 2.326429E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.874 | TFLOPs: 31.11 | 7: iteration 38970/ 115203 | consumed samples: 9976320 | consumed tokens: 20431503360 | elapsed time per iteration (s): 0.43 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 2.341743E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.415 | TFLOPs: 
31.19 | 7: iteration 38980/ 115203 | consumed samples: 9978880 | consumed tokens: 20436746240 | elapsed time per iteration (s): 0.44 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 2.327079E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.593 | TFLOPs: 30.67 | 7: iteration 38990/ 115203 | consumed samples: 9981440 | consumed tokens: 20441989120 | elapsed time per iteration (s): 0.43 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 2.317716E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.555 | TFLOPs: 31.56 | 7: iteration 39000/ 115203 | consumed samples: 9984000 | consumed tokens: 20447232000 | elapsed time per iteration (s): 0.43 | learning rate: 1.554E-04 | global batch size: 256 | lm loss: 2.327861E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.957 | TFLOPs: 31.06 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 39000 | lm loss value: 2.231519E+00 | lm loss PPL: 9.313999E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 39000 to checkpoints_221m 0: [2022-11-28 17:38:20,254] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step39000 is begin to save! 0: [2022-11-28 17:38:20,262] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_01-model_00-model_states.pt... 0: [2022-11-28 17:38:20,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_01-model_00-model_states.pt. 0: [2022-11-28 17:38:20,366] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_03-model_00-model_states.pt... 
0: [2022-11-28 17:38:20,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_03-model_00-model_states.pt. 0: [2022-11-28 17:38:20,387] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_04-model_00-model_states.pt... 0: [2022-11-28 17:38:20,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_04-model_00-model_states.pt. 0: [2022-11-28 17:38:20,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_05-model_00-model_states.pt... 0: [2022-11-28 17:38:20,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_05-model_00-model_states.pt. 0: [2022-11-28 17:38:20,436] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_06-model_00-model_states.pt... 0: [2022-11-28 17:38:20,458] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_06-model_00-model_states.pt. 0: [2022-11-28 17:38:20,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_07-model_00-model_states.pt... 0: [2022-11-28 17:38:20,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_07-model_00-model_states.pt. 0: [2022-11-28 17:38:20,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_08-model_00-model_states.pt... 0: [2022-11-28 17:38:20,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_08-model_00-model_states.pt. 0: [2022-11-28 17:38:20,507] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_09-model_00-model_states.pt... 
0: [2022-11-28 17:38:20,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_09-model_00-model_states.pt. 0: [2022-11-28 17:38:20,530] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_10-model_00-model_states.pt... 0: [2022-11-28 17:38:20,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_10-model_00-model_states.pt. 0: [2022-11-28 17:38:20,554] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_11-model_00-model_states.pt... 0: [2022-11-28 17:38:20,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_11-model_00-model_states.pt. 0: [2022-11-28 17:38:20,578] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_12-model_00-model_states.pt... 0: [2022-11-28 17:38:20,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_12-model_00-model_states.pt. 0: [2022-11-28 17:38:20,602] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_13-model_00-model_states.pt... 0: [2022-11-28 17:38:20,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_13-model_00-model_states.pt. 0: [2022-11-28 17:38:20,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_14-model_00-model_states.pt... 0: [2022-11-28 17:38:20,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_14-model_00-model_states.pt. 0: [2022-11-28 17:38:20,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_15-model_00-model_states.pt... 
0: [2022-11-28 17:38:20,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_15-model_00-model_states.pt. 0: [2022-11-28 17:38:20,674] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_16-model_00-model_states.pt... 0: [2022-11-28 17:38:20,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_16-model_00-model_states.pt. 0: [2022-11-28 17:38:20,697] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_17-model_00-model_states.pt... 0: [2022-11-28 17:38:20,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_17-model_00-model_states.pt. 0: [2022-11-28 17:38:20,721] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_18-model_00-model_states.pt... 0: [2022-11-28 17:38:20,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_18-model_00-model_states.pt. 0: [2022-11-28 17:38:20,745] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_19-model_00-model_states.pt... 0: [2022-11-28 17:38:20,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_19-model_00-model_states.pt. 0: [2022-11-28 17:38:20,769] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_20-model_00-model_states.pt... 0: [2022-11-28 17:38:20,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_20-model_00-model_states.pt. 0: [2022-11-28 17:38:20,792] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/layer_22-model_00-model_states.pt... 
0: [2022-11-28 17:38:20,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/layer_22-model_00-model_states.pt. 0: [2022-11-28 17:38:20,797] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step39000/mp_rank_00_model_states.pt 0: [2022-11-28 17:38:20,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/mp_rank_00_model_states.pt... 0: [2022-11-28 17:38:20,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/mp_rank_00_model_states.pt. 0: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
2: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
5: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
0: [2022-11-28 17:38:20,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step39000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:38:20,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:38:20,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 17:38:20,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2022-11-28 17:38:20,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:38:20,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 17:38:20,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 4: [2022-11-28 17:38:20,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:38:20,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:38:20,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 17:38:20,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 17:38:20,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 
4: [2022-11-28 17:38:20,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2022-11-28 17:38:20,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:38:20,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:38:20,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 17:38:20,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 17:38:20,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2022-11-28 17:38:20,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 4: [2022-11-28 17:38:20,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:38:20,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:38:20,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 17:38:20,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 17:38:20,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 
4: [2022-11-28 17:38:20,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 4: [2022-11-28 17:38:20,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:38:20,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:38:20,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 17:38:20,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 17:38:20,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 4: [2022-11-28 17:38:20,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 4: [2022-11-28 17:38:20,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:38:20,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 17:38:20,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2022-11-28 17:38:20,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:38:20,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:38:20,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 17:38:20,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 17:38:20,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2022-11-28 17:38:20,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2022-11-28 17:38:20,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:38:20,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 17:38:20,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2022-11-28 17:38:20,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:38:20,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 17:38:20,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2022-11-28 17:38:20,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2022-11-28 17:38:20,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 17:38:20,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2022-11-28 17:38:20,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:38:20,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 17:38:20,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2022-11-28 17:38:20,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:38:20,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 17:38:20,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2022-11-28 17:38:20,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:38:20,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 17:38:20,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2022-11-28 17:38:20,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:38:20,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 17:38:20,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:38:20,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2022-11-28 17:38:20,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 17:38:20,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2022-11-28 17:38:20,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:38:20,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 17:38:20,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2022-11-28 17:38:20,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:38:20,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 17:38:20,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2022-11-28 17:38:20,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 17:38:20,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:38:20,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 17:38:20,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 17:38:20,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2022-11-28 17:38:20,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2022-11-28 17:38:20,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:38:20,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 17:38:20,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 1: [2022-11-28 17:38:20,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:38:20,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 17:38:20,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2022-11-28 17:38:20,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:38:20,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 17:38:20,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:38:20,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2022-11-28 17:38:20,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 17:38:20,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2022-11-28 17:38:20,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:38:20,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 17:38:20,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 7: [2022-11-28 17:38:20,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:38:20,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 17:38:20,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2022-11-28 17:38:20,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:38:20,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 17:38:20,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2022-11-28 17:38:20,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:38:20,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 17:38:20,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2022-11-28 17:38:20,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:38:20,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 17:38:20,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2022-11-28 17:38:20,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:38:20,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:38:20,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 17:38:20,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 17:38:20,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2022-11-28 17:38:20,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2022-11-28 17:38:20,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:38:20,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:38:20,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 17:38:20,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:38:20,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 17:38:20,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 5: [2022-11-28 17:38:20,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 17:38:20,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 
5: [2022-11-28 17:38:20,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2022-11-28 17:38:20,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:38:20,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 17:38:20,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2022-11-28 17:38:20,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:38:20,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:38:20,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:38:20,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-28 17:38:20,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 17:38:20,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 17:38:20,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 17:38:20,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 17:38:20,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2022-11-28 17:38:20,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2022-11-28 17:38:20,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2022-11-28 17:38:20,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 6: [2022-11-28 17:38:20,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:38:20,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 17:38:20,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2022-11-28 17:38:20,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2022-11-28 17:38:20,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:38:20,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:38:20,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 17:38:20,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 17:38:20,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2022-11-28 17:38:20,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 3: [2022-11-28 17:38:20,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:38:20,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:38:20,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:38:20,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:38:20,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:38:20,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 17:38:20,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:38:20,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:38:20,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 17:38:20,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 17:38:20,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 17:38:20,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 17:38:20,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 17:38:20,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 17:38:20,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 17:38:20,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 17:38:20,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 
3: [2022-11-28 17:38:20,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 3: [2022-11-28 17:38:20,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 3: [2022-11-28 17:38:20,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 3: [2022-11-28 17:38:20,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 3: [2022-11-28 17:38:20,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 3: [2022-11-28 17:38:20,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 3: [2022-11-28 17:38:20,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 0: [2022-11-28 17:38:20,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 17:38:20,934] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:38:21,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 17:38:21,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:38:21,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 17:38:21,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 17:38:21,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 
2: [2022-11-28 17:38:21,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 17:38:21,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2022-11-28 17:38:21,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 2: [2022-11-28 17:38:21,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:38:21,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step39000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 17:38:21,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step39000 is ready now! 
0: successfully saved checkpoint at iteration 39000 to checkpoints_221m 7: time (ms) | save-checkpoint: 785.93 7: iteration 39010/ 115203 | consumed samples: 9986560 | consumed tokens: 20452474880 | elapsed time per iteration (s): 0.52 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 2.375748E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 490.949 | TFLOPs: 25.76 | 7: iteration 39020/ 115203 | consumed samples: 9989120 | consumed tokens: 20457717760 | elapsed time per iteration (s): 0.43 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 2.358460E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.084 | TFLOPs: 30.96 | 7: iteration 39030/ 115203 | consumed samples: 9991680 | consumed tokens: 20462960640 | elapsed time per iteration (s): 0.43 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 2.293962E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.924 | TFLOPs: 31.27 | 7: iteration 39040/ 115203 | consumed samples: 9994240 | consumed tokens: 20468203520 | elapsed time per iteration (s): 0.43 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 2.344968E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.330 | TFLOPs: 31.45 | 7: iteration 39050/ 115203 | consumed samples: 9996800 | consumed tokens: 20473446400 | elapsed time per iteration (s): 0.44 | learning rate: 1.553E-04 | global batch size: 256 | lm loss: 2.320206E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.387 | TFLOPs: 30.77 | 7: iteration 39060/ 115203 | consumed samples: 9999360 | consumed tokens: 20478689280 | elapsed time per iteration (s): 0.43 | learning rate: 
1.552E-04 | global batch size: 256 | lm loss: 2.338462E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.785 | TFLOPs: 31.36 | 7: iteration 39070/ 115203 | consumed samples: 10001920 | consumed tokens: 20483932160 | elapsed time per iteration (s): 0.44 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 2.292823E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.893 | TFLOPs: 30.85 | 7: iteration 39080/ 115203 | consumed samples: 10004480 | consumed tokens: 20489175040 | elapsed time per iteration (s): 0.43 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 2.349933E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.431 | TFLOPs: 31.08 | 7: iteration 39090/ 115203 | consumed samples: 10007040 | consumed tokens: 20494417920 | elapsed time per iteration (s): 0.44 | learning rate: 1.552E-04 | global batch size: 256 | lm loss: 2.336908E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.133 | TFLOPs: 30.44 | 7: iteration 39100/ 115203 | consumed samples: 10009600 | consumed tokens: 20499660800 | elapsed time per iteration (s): 0.43 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 2.318753E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.691 | TFLOPs: 31.41 | 7: iteration 39110/ 115203 | consumed samples: 10012160 | consumed tokens: 20504903680 | elapsed time per iteration (s): 0.44 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 2.309866E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.465 | TFLOPs: 30.56 | 7: iteration 39120/ 115203 | consumed 
samples: 10014720 | consumed tokens: 20510146560 | elapsed time per iteration (s): 0.43 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 2.341918E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.493 | TFLOPs: 31.09 | 7: iteration 39130/ 115203 | consumed samples: 10017280 | consumed tokens: 20515389440 | elapsed time per iteration (s): 0.42 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 2.337033E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.663 | TFLOPs: 31.99 | 7: iteration 39140/ 115203 | consumed samples: 10019840 | consumed tokens: 20520632320 | elapsed time per iteration (s): 0.43 | learning rate: 1.551E-04 | global batch size: 256 | lm loss: 2.318098E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.827 | TFLOPs: 31.00 | 7: iteration 39150/ 115203 | consumed samples: 10022400 | consumed tokens: 20525875200 | elapsed time per iteration (s): 0.42 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 2.324956E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.978 | TFLOPs: 32.06 | 7: iteration 39160/ 115203 | consumed samples: 10024960 | consumed tokens: 20531118080 | elapsed time per iteration (s): 0.43 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 2.320219E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.594 | TFLOPs: 31.25 | 7: iteration 39170/ 115203 | consumed samples: 10027520 | consumed tokens: 20536360960 | elapsed time per iteration (s): 0.43 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 2.335737E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 594.424 | TFLOPs: 31.19 | 7: iteration 39180/ 115203 | consumed samples: 10030080 | consumed tokens: 20541603840 | elapsed time per iteration (s): 0.43 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 2.349811E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.005 | TFLOPs: 31.32 | 7: iteration 39190/ 115203 | consumed samples: 10032640 | consumed tokens: 20546846720 | elapsed time per iteration (s): 0.43 | learning rate: 1.550E-04 | global batch size: 256 | lm loss: 2.309093E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.285 | TFLOPs: 31.39 | 7: iteration 39200/ 115203 | consumed samples: 10035200 | consumed tokens: 20552089600 | elapsed time per iteration (s): 0.45 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 2.325960E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.021 | TFLOPs: 29.91 | 7: iteration 39210/ 115203 | consumed samples: 10037760 | consumed tokens: 20557332480 | elapsed time per iteration (s): 0.43 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 2.312632E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.848 | TFLOPs: 31.21 | 7: iteration 39220/ 115203 | consumed samples: 10040320 | consumed tokens: 20562575360 | elapsed time per iteration (s): 0.43 | learning rate: 1.549E-04 | global batch size: 256 | lm loss: 2.334777E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.360 | TFLOPs: 30.92 | 7: iteration 39230/ 115203 | consumed samples: 10042880 | consumed tokens: 20567818240 | elapsed time per iteration (s): 0.43 | learning rate: 1.549E-04 | global batch size: 256 | lm 
loss: 2.308599E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.114 | TFLOPs: 31.38 | 7: iteration 39240/ 115203 | consumed samples: 10045440 | consumed tokens: 20573061120 | elapsed time per iteration (s): 0.44 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 2.322411E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.673 | TFLOPs: 30.78 | 7: iteration 39250/ 115203 | consumed samples: 10048000 | consumed tokens: 20578304000 | elapsed time per iteration (s): 0.44 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 2.290497E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.266 | TFLOPs: 30.76 | 7: iteration 39260/ 115203 | consumed samples: 10050560 | consumed tokens: 20583546880 | elapsed time per iteration (s): 0.43 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 2.312685E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.420 | TFLOPs: 31.35 | 7: iteration 39270/ 115203 | consumed samples: 10053120 | consumed tokens: 20588789760 | elapsed time per iteration (s): 0.42 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 2.320507E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.111 | TFLOPs: 31.85 | 7: iteration 39280/ 115203 | consumed samples: 10055680 | consumed tokens: 20594032640 | elapsed time per iteration (s): 0.43 | learning rate: 1.548E-04 | global batch size: 256 | lm loss: 2.338939E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.966 | TFLOPs: 31.22 | 7: iteration 39290/ 115203 | consumed samples: 10058240 | consumed tokens: 
20599275520 | elapsed time per iteration (s): 0.43 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 2.335555E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.817 | TFLOPs: 31.47 | 7: iteration 39300/ 115203 | consumed samples: 10060800 | consumed tokens: 20604518400 | elapsed time per iteration (s): 0.43 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 2.351556E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.556 | TFLOPs: 31.56 | 7: iteration 39310/ 115203 | consumed samples: 10063360 | consumed tokens: 20609761280 | elapsed time per iteration (s): 0.43 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 2.372604E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.710 | TFLOPs: 31.26 | 7: iteration 39320/ 115203 | consumed samples: 10065920 | consumed tokens: 20615004160 | elapsed time per iteration (s): 0.44 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 2.296659E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.011 | TFLOPs: 30.85 | 7: iteration 39330/ 115203 | consumed samples: 10068480 | consumed tokens: 20620247040 | elapsed time per iteration (s): 0.44 | learning rate: 1.547E-04 | global batch size: 256 | lm loss: 2.338026E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.894 | TFLOPs: 30.53 | 7: iteration 39340/ 115203 | consumed samples: 10071040 | consumed tokens: 20625489920 | elapsed time per iteration (s): 0.43 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 2.332019E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
592.547 | TFLOPs: 31.09 | 7: iteration 39350/ 115203 | consumed samples: 10073600 | consumed tokens: 20630732800 | elapsed time per iteration (s): 0.43 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 2.321526E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.158 | TFLOPs: 31.38 | 7: iteration 39360/ 115203 | consumed samples: 10076160 | consumed tokens: 20635975680 | elapsed time per iteration (s): 0.43 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 2.332255E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.773 | TFLOPs: 31.36 | 7: iteration 39370/ 115203 | consumed samples: 10078720 | consumed tokens: 20641218560 | elapsed time per iteration (s): 0.43 | learning rate: 1.546E-04 | global batch size: 256 | lm loss: 2.304156E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.288 | TFLOPs: 31.55 | 7: iteration 39380/ 115203 | consumed samples: 10081280 | consumed tokens: 20646461440 | elapsed time per iteration (s): 0.42 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 2.340360E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.390 | TFLOPs: 31.66 | 7: iteration 39390/ 115203 | consumed samples: 10083840 | consumed tokens: 20651704320 | elapsed time per iteration (s): 0.42 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 2.360512E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.141 | TFLOPs: 31.86 | 7: iteration 39400/ 115203 | consumed samples: 10086400 | consumed tokens: 20656947200 | elapsed time per iteration (s): 0.43 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 2.306067E+00 | grad norm: 0.210 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.191 | TFLOPs: 31.60 | 7: iteration 39410/ 115203 | consumed samples: 10088960 | consumed tokens: 20662190080 | elapsed time per iteration (s): 0.43 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 2.282251E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.648 | TFLOPs: 30.94 | 7: iteration 39420/ 115203 | consumed samples: 10091520 | consumed tokens: 20667432960 | elapsed time per iteration (s): 0.42 | learning rate: 1.545E-04 | global batch size: 256 | lm loss: 2.320647E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.466 | TFLOPs: 31.82 | 7: iteration 39430/ 115203 | consumed samples: 10094080 | consumed tokens: 20672675840 | elapsed time per iteration (s): 0.43 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 2.342086E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.424 | TFLOPs: 30.93 | 7: iteration 39440/ 115203 | consumed samples: 10096640 | consumed tokens: 20677918720 | elapsed time per iteration (s): 0.43 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 2.331297E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.901 | TFLOPs: 31.16 | 7: iteration 39450/ 115203 | consumed samples: 10099200 | consumed tokens: 20683161600 | elapsed time per iteration (s): 0.43 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 2.342269E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.080 | TFLOPs: 31.54 | 7: iteration 39460/ 115203 | consumed samples: 10101760 | consumed tokens: 20688404480 | elapsed time per iteration (s): 
0.43 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 2.354594E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.864 | TFLOPs: 31.47 | 7: iteration 39470/ 115203 | consumed samples: 10104320 | consumed tokens: 20693647360 | elapsed time per iteration (s): 0.44 | learning rate: 1.544E-04 | global batch size: 256 | lm loss: 2.333088E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.092 | TFLOPs: 30.59 | 7: iteration 39480/ 115203 | consumed samples: 10106880 | consumed tokens: 20698890240 | elapsed time per iteration (s): 0.43 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 2.342554E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.589 | TFLOPs: 31.25 | 7: iteration 39490/ 115203 | consumed samples: 10109440 | consumed tokens: 20704133120 | elapsed time per iteration (s): 0.43 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 2.350190E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.447 | TFLOPs: 31.08 | 7: iteration 39500/ 115203 | consumed samples: 10112000 | consumed tokens: 20709376000 | elapsed time per iteration (s): 0.43 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 2.339496E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.705 | TFLOPs: 31.26 | 7: iteration 39510/ 115203 | consumed samples: 10114560 | consumed tokens: 20714618880 | elapsed time per iteration (s): 0.42 | learning rate: 1.543E-04 | global batch size: 256 | lm loss: 2.320762E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.791 | TFLOPs: 31.68 | 7: iteration 39520/ 
115203 | consumed samples: 10117120 | consumed tokens: 20719861760 | elapsed time per iteration (s): 0.42 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 2.307574E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.677 | TFLOPs: 31.78 | 7: iteration 39530/ 115203 | consumed samples: 10119680 | consumed tokens: 20725104640 | elapsed time per iteration (s): 0.43 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 2.326679E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.604 | TFLOPs: 31.30 | 7: iteration 39540/ 115203 | consumed samples: 10122240 | consumed tokens: 20730347520 | elapsed time per iteration (s): 0.43 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 2.323773E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.236 | TFLOPs: 31.34 | 7: iteration 39550/ 115203 | consumed samples: 10124800 | consumed tokens: 20735590400 | elapsed time per iteration (s): 0.44 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 2.312086E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.236 | TFLOPs: 30.55 | 7: iteration 39560/ 115203 | consumed samples: 10127360 | consumed tokens: 20740833280 | elapsed time per iteration (s): 0.43 | learning rate: 1.542E-04 | global batch size: 256 | lm loss: 2.303336E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.331 | TFLOPs: 31.29 | 7: iteration 39570/ 115203 | consumed samples: 10129920 | consumed tokens: 20746076160 | elapsed time per iteration (s): 0.43 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 2.315197E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 595.273 | TFLOPs: 31.23 | 7: iteration 39580/ 115203 | consumed samples: 10132480 | consumed tokens: 20751319040 | elapsed time per iteration (s): 0.42 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 2.322231E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.791 | TFLOPs: 31.78 | 7: iteration 39590/ 115203 | consumed samples: 10135040 | consumed tokens: 20756561920 | elapsed time per iteration (s): 0.45 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 2.304129E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.232 | TFLOPs: 30.08 | 7: iteration 39600/ 115203 | consumed samples: 10137600 | consumed tokens: 20761804800 | elapsed time per iteration (s): 0.43 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 2.334036E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.133 | TFLOPs: 31.44 | 7: iteration 39610/ 115203 | consumed samples: 10140160 | consumed tokens: 20767047680 | elapsed time per iteration (s): 0.43 | learning rate: 1.541E-04 | global batch size: 256 | lm loss: 2.337933E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.101 | TFLOPs: 31.12 | 7: iteration 39620/ 115203 | consumed samples: 10142720 | consumed tokens: 20772290560 | elapsed time per iteration (s): 0.43 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 2.347336E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.535 | TFLOPs: 30.88 | 7: iteration 39630/ 115203 | consumed samples: 10145280 | consumed tokens: 20777533440 | elapsed time per iteration (s): 0.43 | learning rate: 1.540E-04 | global batch 
size: 256 | lm loss: 2.338152E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.034 | TFLOPs: 30.91 | 7: iteration 39640/ 115203 | consumed samples: 10147840 | consumed tokens: 20782776320 | elapsed time per iteration (s): 0.43 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 2.323785E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.731 | TFLOPs: 31.20 | 7: iteration 39650/ 115203 | consumed samples: 10150400 | consumed tokens: 20788019200 | elapsed time per iteration (s): 0.43 | learning rate: 1.540E-04 | global batch size: 256 | lm loss: 2.314497E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.645 | TFLOPs: 31.57 | 7: iteration 39660/ 115203 | consumed samples: 10152960 | consumed tokens: 20793262080 | elapsed time per iteration (s): 0.43 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 2.366622E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.578 | TFLOPs: 31.56 | 7: iteration 39670/ 115203 | consumed samples: 10155520 | consumed tokens: 20798504960 | elapsed time per iteration (s): 0.42 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 2.310332E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.104 | TFLOPs: 31.80 | 7: iteration 39680/ 115203 | consumed samples: 10158080 | consumed tokens: 20803747840 | elapsed time per iteration (s): 0.44 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 2.314229E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.413 | TFLOPs: 30.35 | 7: iteration 39690/ 115203 | consumed samples: 10160640 | consumed 
tokens: 20808990720 | elapsed time per iteration (s): 0.43 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 2.311946E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.439 | TFLOPs: 30.98 | 7: iteration 39700/ 115203 | consumed samples: 10163200 | consumed tokens: 20814233600 | elapsed time per iteration (s): 0.43 | learning rate: 1.539E-04 | global batch size: 256 | lm loss: 2.356472E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.800 | TFLOPs: 30.89 | 7: iteration 39710/ 115203 | consumed samples: 10165760 | consumed tokens: 20819476480 | elapsed time per iteration (s): 0.42 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.346099E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.580 | TFLOPs: 31.72 | 7: iteration 39720/ 115203 | consumed samples: 10168320 | consumed tokens: 20824719360 | elapsed time per iteration (s): 0.42 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.348963E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.300 | TFLOPs: 31.71 | 7: iteration 39730/ 115203 | consumed samples: 10170880 | consumed tokens: 20829962240 | elapsed time per iteration (s): 0.43 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.320551E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.766 | TFLOPs: 31.26 | 7: iteration 39740/ 115203 | consumed samples: 10173440 | consumed tokens: 20835205120 | elapsed time per iteration (s): 0.43 | learning rate: 1.538E-04 | global batch size: 256 | lm loss: 2.328208E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 595.422 | TFLOPs: 31.24 | 7: iteration 39750/ 115203 | consumed samples: 10176000 | consumed tokens: 20840448000 | elapsed time per iteration (s): 0.42 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.301941E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.130 | TFLOPs: 31.91 | 7: iteration 39760/ 115203 | consumed samples: 10178560 | consumed tokens: 20845690880 | elapsed time per iteration (s): 0.44 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.322412E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.396 | TFLOPs: 30.24 | 7: iteration 39770/ 115203 | consumed samples: 10181120 | consumed tokens: 20850933760 | elapsed time per iteration (s): 0.43 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.313988E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.632 | TFLOPs: 30.88 | 7: iteration 39780/ 115203 | consumed samples: 10183680 | consumed tokens: 20856176640 | elapsed time per iteration (s): 0.42 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.331592E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.715 | TFLOPs: 31.73 | 7: iteration 39790/ 115203 | consumed samples: 10186240 | consumed tokens: 20861419520 | elapsed time per iteration (s): 0.43 | learning rate: 1.537E-04 | global batch size: 256 | lm loss: 2.344351E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.008 | TFLOPs: 31.11 | 7: iteration 39800/ 115203 | consumed samples: 10188800 | consumed tokens: 20866662400 | elapsed time per iteration (s): 0.43 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 2.317011E+00 | grad norm: 
0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.721 | TFLOPs: 31.47 | 7: iteration 39810/ 115203 | consumed samples: 10191360 | consumed tokens: 20871905280 | elapsed time per iteration (s): 0.44 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 2.328431E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.103 | TFLOPs: 30.49 | 7: iteration 39820/ 115203 | consumed samples: 10193920 | consumed tokens: 20877148160 | elapsed time per iteration (s): 0.42 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 2.318471E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.016 | TFLOPs: 31.64 | 7: iteration 39830/ 115203 | consumed samples: 10196480 | consumed tokens: 20882391040 | elapsed time per iteration (s): 0.43 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 2.322855E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.560 | TFLOPs: 31.35 | 7: iteration 39840/ 115203 | consumed samples: 10199040 | consumed tokens: 20887633920 | elapsed time per iteration (s): 0.43 | learning rate: 1.536E-04 | global batch size: 256 | lm loss: 2.345312E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.634 | TFLOPs: 31.20 | 7: iteration 39850/ 115203 | consumed samples: 10201600 | consumed tokens: 20892876800 | elapsed time per iteration (s): 0.43 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 2.331677E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.492 | TFLOPs: 31.35 | 7: iteration 39860/ 115203 | consumed samples: 10204160 | consumed tokens: 20898119680 | elapsed time per 
iteration (s): 0.45 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 2.312825E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.364 | TFLOPs: 30.14 | 7: iteration 39870/ 115203 | consumed samples: 10206720 | consumed tokens: 20903362560 | elapsed time per iteration (s): 0.43 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 2.328402E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.388 | TFLOPs: 31.50 | 7: iteration 39880/ 115203 | consumed samples: 10209280 | consumed tokens: 20908605440 | elapsed time per iteration (s): 0.43 | learning rate: 1.535E-04 | global batch size: 256 | lm loss: 2.330769E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.878 | TFLOPs: 31.53 | 7: iteration 39890/ 115203 | consumed samples: 10211840 | consumed tokens: 20913848320 | elapsed time per iteration (s): 0.43 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.291256E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.979 | TFLOPs: 31.37 | 7: iteration 39900/ 115203 | consumed samples: 10214400 | consumed tokens: 20919091200 | elapsed time per iteration (s): 0.43 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.297109E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.027 | TFLOPs: 30.96 | 7: iteration 39910/ 115203 | consumed samples: 10216960 | consumed tokens: 20924334080 | elapsed time per iteration (s): 0.43 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.311089E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.991 | TFLOPs: 30.96 | 7: 
iteration 39920/ 115203 | consumed samples: 10219520 | consumed tokens: 20929576960 | elapsed time per iteration (s): 0.43 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.341297E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.847 | TFLOPs: 31.47 | 7: iteration 39930/ 115203 | consumed samples: 10222080 | consumed tokens: 20934819840 | elapsed time per iteration (s): 0.43 | learning rate: 1.534E-04 | global batch size: 256 | lm loss: 2.343177E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.335 | TFLOPs: 31.34 | 7: iteration 39940/ 115203 | consumed samples: 10224640 | consumed tokens: 20940062720 | elapsed time per iteration (s): 0.45 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 2.301932E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.653 | TFLOPs: 30.05 | 7: iteration 39950/ 115203 | consumed samples: 10227200 | consumed tokens: 20945305600 | elapsed time per iteration (s): 0.43 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 2.325897E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.079 | TFLOPs: 31.33 | 7: iteration 39960/ 115203 | consumed samples: 10229760 | consumed tokens: 20950548480 | elapsed time per iteration (s): 0.42 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 2.313698E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.128 | TFLOPs: 31.65 | 7: iteration 39970/ 115203 | consumed samples: 10232320 | consumed tokens: 20955791360 | elapsed time per iteration (s): 0.44 | learning rate: 1.533E-04 | global batch size: 256 | lm loss: 2.300059E+00 | grad norm: 0.214 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.638 | TFLOPs: 30.26 | 7: iteration 39980/ 115203 | consumed samples: 10234880 | consumed tokens: 20961034240 | elapsed time per iteration (s): 0.42 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 2.308182E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.591 | TFLOPs: 31.67 | 7: iteration 39990/ 115203 | consumed samples: 10237440 | consumed tokens: 20966277120 | elapsed time per iteration (s): 0.43 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 2.357360E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.826 | TFLOPs: 31.16 | 0: [2022-11-28 17:45:31,635] [INFO] [logging.py:68:log_dist] [Rank 0] step=40000, skipped=0, lr=[0.0001532049360643911, 0.0001532049360643911, 0.0001532049360643911], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 0: steps: 40000 loss: 2.2278 iter time (s): 0.429 samples/sec: 596.110 7: iteration 40000/ 115203 | consumed samples: 10240000 | consumed tokens: 20971520000 | elapsed time per iteration (s): 0.43 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 2.315532E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.974 | TFLOPs: 30.90 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 40000 | lm loss value: 2.229773E+00 | lm loss PPL: 9.297751E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 40000 to checkpoints_221m 0: [2022-11-28 17:45:31,810] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step40000 is begin to save! 
0: [2022-11-28 17:45:31,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_01-model_00-model_states.pt... 0: [2022-11-28 17:45:31,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_01-model_00-model_states.pt. 0: [2022-11-28 17:45:31,971] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_03-model_00-model_states.pt... 0: [2022-11-28 17:45:32,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_03-model_00-model_states.pt. 0: [2022-11-28 17:45:32,000] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_04-model_00-model_states.pt... 0: [2022-11-28 17:45:32,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_04-model_00-model_states.pt. 0: [2022-11-28 17:45:32,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_05-model_00-model_states.pt... 0: [2022-11-28 17:45:32,061] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_05-model_00-model_states.pt. 0: [2022-11-28 17:45:32,062] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_06-model_00-model_states.pt... 0: [2022-11-28 17:45:32,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_06-model_00-model_states.pt. 0: [2022-11-28 17:45:32,096] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_07-model_00-model_states.pt... 0: [2022-11-28 17:45:32,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 17:45:32,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_08-model_00-model_states.pt... 0: [2022-11-28 17:45:32,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_08-model_00-model_states.pt. 0: [2022-11-28 17:45:32,160] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_09-model_00-model_states.pt... 0: [2022-11-28 17:45:32,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_09-model_00-model_states.pt. 0: [2022-11-28 17:45:32,182] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_10-model_00-model_states.pt... 0: [2022-11-28 17:45:32,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_10-model_00-model_states.pt. 0: [2022-11-28 17:45:32,206] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_11-model_00-model_states.pt... 0: [2022-11-28 17:45:32,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_11-model_00-model_states.pt. 0: [2022-11-28 17:45:32,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_12-model_00-model_states.pt... 0: [2022-11-28 17:45:32,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_12-model_00-model_states.pt. 0: [2022-11-28 17:45:32,252] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_13-model_00-model_states.pt... 0: [2022-11-28 17:45:32,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 17:45:32,274] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_14-model_00-model_states.pt... 0: [2022-11-28 17:45:32,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_14-model_00-model_states.pt. 0: [2022-11-28 17:45:32,298] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_15-model_00-model_states.pt... 0: [2022-11-28 17:45:32,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_15-model_00-model_states.pt. 0: [2022-11-28 17:45:32,321] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_16-model_00-model_states.pt... 0: [2022-11-28 17:45:32,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_16-model_00-model_states.pt. 0: [2022-11-28 17:45:32,343] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_17-model_00-model_states.pt... 0: [2022-11-28 17:45:32,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_17-model_00-model_states.pt. 0: [2022-11-28 17:45:32,366] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_18-model_00-model_states.pt... 0: [2022-11-28 17:45:32,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_18-model_00-model_states.pt. 0: [2022-11-28 17:45:32,389] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_19-model_00-model_states.pt... 0: [2022-11-28 17:45:32,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 17:45:32,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_20-model_00-model_states.pt... 0: [2022-11-28 17:45:32,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_20-model_00-model_states.pt. 0: [2022-11-28 17:45:32,435] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/layer_22-model_00-model_states.pt... 0: [2022-11-28 17:45:32,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/layer_22-model_00-model_states.pt. 0: [2022-11-28 17:45:32,439] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step40000/mp_rank_00_model_states.pt 0: [2022-11-28 17:45:32,439] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/mp_rank_00_model_states.pt... 0: [2022-11-28 17:45:32,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/mp_rank_00_model_states.pt. 0: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
7: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
5: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
1: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:45:32,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:45:32,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:45:32,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 17:45:32,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2022-11-28 17:45:32,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:45:32,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 17:45:32,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2022-11-28 17:45:32,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:45:32,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 17:45:32,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2022-11-28 17:45:32,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:45:32,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 17:45:32,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2022-11-28 17:45:32,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:45:32,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:45:32,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 2: [2022-11-28 17:45:32,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2022-11-28 17:45:32,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2022-11-28 17:45:32,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2022-11-28 17:45:32,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:45:32,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 17:45:32,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2022-11-28 17:45:32,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:45:32,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 17:45:32,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2022-11-28 17:45:32,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:45:32,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 17:45:32,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2022-11-28 17:45:32,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:45:32,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 17:45:32,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2022-11-28 17:45:32,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:45:32,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:45:32,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 17:45:32,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 17:45:32,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2022-11-28 17:45:32,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2022-11-28 17:45:32,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:45:32,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:45:32,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:45:32,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
7: [2022-11-28 17:45:32,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 17:45:32,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 17:45:32,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2022-11-28 17:45:32,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:45:32,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:45:32,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 17:45:32,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 1: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2022-11-28 17:45:32,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:45:32,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2022-11-28 17:45:32,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 17:45:32,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:45:32,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 17:45:32,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 2: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:45:32,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2022-11-28 17:45:32,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:45:32,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 0: [2022-11-28 17:45:32,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2022-11-28 17:45:32,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 2: [2022-11-28 17:45:32,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2022-11-28 17:45:32,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2022-11-28 17:45:32,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:45:32,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 17:45:32,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2022-11-28 17:45:32,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:45:32,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 17:45:32,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2022-11-28 17:45:32,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:45:32,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 17:45:32,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2022-11-28 17:45:32,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:45:32,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 17:45:32,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2022-11-28 17:45:32,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:45:32,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 17:45:32,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2022-11-28 17:45:32,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:45:32,519] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 17:45:32,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:45:32,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2022-11-28 17:45:32,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:45:32,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2022-11-28 17:45:32,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 17:45:32,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 17:45:32,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 17:45:32,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2022-11-28 17:45:32,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:45:32,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:45:32,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:45:32,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
4: [2022-11-28 17:45:32,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2022-11-28 17:45:32,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2022-11-28 17:45:32,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2022-11-28 17:45:32,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 4: [2022-11-28 17:45:32,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2022-11-28 17:45:32,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 2: [2022-11-28 17:45:32,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2022-11-28 17:45:32,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:45:32,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 17:45:32,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2022-11-28 17:45:32,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:45:32,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 17:45:32,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2022-11-28 17:45:32,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:45:32,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 17:45:32,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: [2022-11-28 17:45:32,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:45:32,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 1: [2022-11-28 17:45:32,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:45:32,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2022-11-28 17:45:32,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 17:45:32,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 4: [2022-11-28 17:45:32,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:45:32,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 17:45:32,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 1: [2022-11-28 17:45:32,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:45:32,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 17:45:32,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2022-11-28 17:45:32,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:45:32,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 7: [2022-11-28 17:45:32,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:45:32,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 17:45:32,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2022-11-28 17:45:32,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:45:32,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 17:45:32,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2022-11-28 17:45:32,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:45:32,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 17:45:32,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 7: [2022-11-28 17:45:32,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:45:32,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 6: [2022-11-28 17:45:32,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:45:32,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2022-11-28 17:45:32,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 17:45:32,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2022-11-28 17:45:32,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 17:45:32,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 17:45:32,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2022-11-28 17:45:32,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:45:32,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 17:45:32,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2022-11-28 17:45:32,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:45:32,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 17:45:32,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2022-11-28 17:45:32,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:45:32,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 17:45:32,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2022-11-28 17:45:32,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 17:45:32,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:45:32,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 17:45:32,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 17:45:32,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2022-11-28 17:45:32,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2022-11-28 17:45:32,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:45:32,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 5: [2022-11-28 17:45:32,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 17:45:32,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:45:32,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:45:32,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:45:32,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:45:32,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2022-11-28 17:45:32,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 17:45:32,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 17:45:32,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 17:45:32,531] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 17:45:32,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2022-11-28 17:45:32,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2022-11-28 17:45:32,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 3: [2022-11-28 17:45:32,531] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 6: [2022-11-28 17:45:32,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:45:32,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 17:45:32,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 
0: [2022-11-28 17:45:32,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step40000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 17:45:32,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step40000 is ready now! 0: successfully saved checkpoint at iteration 40000 to checkpoints_221m 7: time (ms) | save-checkpoint: 768.35 7: iteration 40010/ 115203 | consumed samples: 10242560 | consumed tokens: 20976762880 | elapsed time per iteration (s): 0.53 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 2.355779E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 487.345 | TFLOPs: 25.57 | 7: iteration 40020/ 115203 | consumed samples: 10245120 | consumed tokens: 20982005760 | elapsed time per iteration (s): 0.43 | learning rate: 1.532E-04 | global batch size: 256 | lm loss: 2.351567E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.510 | TFLOPs: 30.93 | 7: iteration 40030/ 115203 | consumed samples: 10247680 | consumed tokens: 20987248640 | elapsed time per iteration (s): 0.43 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 2.349711E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.314 | TFLOPs: 31.29 | 7: iteration 40040/ 115203 | consumed samples: 10250240 | consumed tokens: 20992491520 | elapsed time per iteration (s): 0.42 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 2.329058E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.219 | TFLOPs: 32.07 | 7: iteration 40050/ 115203 | consumed samples: 10252800 | consumed tokens: 20997734400 | elapsed time per iteration (s): 0.44 | learning rate: 1.531E-04 | global batch size: 256 | 
lm loss: 2.339634E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.878 | TFLOPs: 30.58 | 7: iteration 40060/ 115203 | consumed samples: 10255360 | consumed tokens: 21002977280 | elapsed time per iteration (s): 0.43 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 2.323429E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.477 | TFLOPs: 31.14 | 7: iteration 40070/ 115203 | consumed samples: 10257920 | consumed tokens: 21008220160 | elapsed time per iteration (s): 0.43 | learning rate: 1.531E-04 | global batch size: 256 | lm loss: 2.315316E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.152 | TFLOPs: 31.44 | 7: iteration 40080/ 115203 | consumed samples: 10260480 | consumed tokens: 21013463040 | elapsed time per iteration (s): 0.43 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 2.303966E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.031 | TFLOPs: 31.27 | 7: iteration 40090/ 115203 | consumed samples: 10263040 | consumed tokens: 21018705920 | elapsed time per iteration (s): 0.43 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 2.366742E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.589 | TFLOPs: 31.20 | 7: iteration 40100/ 115203 | consumed samples: 10265600 | consumed tokens: 21023948800 | elapsed time per iteration (s): 0.43 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 2.310934E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.711 | TFLOPs: 31.31 | 7: iteration 40110/ 115203 | consumed samples: 10268160 | consumed tokens: 
21029191680 | elapsed time per iteration (s): 0.43 | learning rate: 1.530E-04 | global batch size: 256 | lm loss: 2.338753E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.770 | TFLOPs: 31.42 | 7: iteration 40120/ 115203 | consumed samples: 10270720 | consumed tokens: 21034434560 | elapsed time per iteration (s): 0.42 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.317567E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.700 | TFLOPs: 31.62 | 7: iteration 40130/ 115203 | consumed samples: 10273280 | consumed tokens: 21039677440 | elapsed time per iteration (s): 0.43 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.313341E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.223 | TFLOPs: 31.18 | 7: iteration 40140/ 115203 | consumed samples: 10275840 | consumed tokens: 21044920320 | elapsed time per iteration (s): 0.52 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.306658E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 491.361 | TFLOPs: 25.78 | 7: iteration 40150/ 115203 | consumed samples: 10278400 | consumed tokens: 21050163200 | elapsed time per iteration (s): 0.47 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.314047E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 543.123 | TFLOPs: 28.50 | 7: iteration 40160/ 115203 | consumed samples: 10280960 | consumed tokens: 21055406080 | elapsed time per iteration (s): 0.43 | learning rate: 1.529E-04 | global batch size: 256 | lm loss: 2.311295E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
599.390 | TFLOPs: 31.45 | 7: iteration 40170/ 115203 | consumed samples: 10283520 | consumed tokens: 21060648960 | elapsed time per iteration (s): 0.43 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 2.307056E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.936 | TFLOPs: 31.32 | 7: iteration 40180/ 115203 | consumed samples: 10286080 | consumed tokens: 21065891840 | elapsed time per iteration (s): 0.42 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 2.348322E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.433 | TFLOPs: 31.82 | 7: iteration 40190/ 115203 | consumed samples: 10288640 | consumed tokens: 21071134720 | elapsed time per iteration (s): 0.44 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 2.321407E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.837 | TFLOPs: 30.48 | 7: iteration 40200/ 115203 | consumed samples: 10291200 | consumed tokens: 21076377600 | elapsed time per iteration (s): 0.43 | learning rate: 1.528E-04 | global batch size: 256 | lm loss: 2.337004E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.130 | TFLOPs: 31.17 | 7: iteration 40210/ 115203 | consumed samples: 10293760 | consumed tokens: 21081620480 | elapsed time per iteration (s): 0.43 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 2.358582E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.904 | TFLOPs: 31.21 | 7: iteration 40220/ 115203 | consumed samples: 10296320 | consumed tokens: 21086863360 | elapsed time per iteration (s): 0.43 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 2.308514E+00 | grad norm: 0.199 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.726 | TFLOPs: 31.57 | 7: iteration 40230/ 115203 | consumed samples: 10298880 | consumed tokens: 21092106240 | elapsed time per iteration (s): 0.43 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 2.304317E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.355 | TFLOPs: 31.45 | 7: iteration 40240/ 115203 | consumed samples: 10301440 | consumed tokens: 21097349120 | elapsed time per iteration (s): 0.46 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 2.331887E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.979 | TFLOPs: 29.28 | 7: iteration 40250/ 115203 | consumed samples: 10304000 | consumed tokens: 21102592000 | elapsed time per iteration (s): 0.43 | learning rate: 1.527E-04 | global batch size: 256 | lm loss: 2.400749E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.496 | TFLOPs: 31.09 | 7: iteration 40260/ 115203 | consumed samples: 10306560 | consumed tokens: 21107834880 | elapsed time per iteration (s): 0.43 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 2.335465E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.746 | TFLOPs: 31.15 | 7: iteration 40270/ 115203 | consumed samples: 10309120 | consumed tokens: 21113077760 | elapsed time per iteration (s): 0.44 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 2.334062E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.816 | TFLOPs: 30.74 | 7: iteration 40280/ 115203 | consumed samples: 10311680 | consumed tokens: 21118320640 | elapsed time per iteration (s): 
0.43 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 2.328754E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.620 | TFLOPs: 31.46 | 7: iteration 40290/ 115203 | consumed samples: 10314240 | consumed tokens: 21123563520 | elapsed time per iteration (s): 0.43 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 2.335229E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.726 | TFLOPs: 31.41 | 7: iteration 40300/ 115203 | consumed samples: 10316800 | consumed tokens: 21128806400 | elapsed time per iteration (s): 0.44 | learning rate: 1.526E-04 | global batch size: 256 | lm loss: 2.325624E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.367 | TFLOPs: 30.71 | 7: iteration 40310/ 115203 | consumed samples: 10319360 | consumed tokens: 21134049280 | elapsed time per iteration (s): 0.42 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 2.334283E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.172 | TFLOPs: 31.75 | 7: iteration 40320/ 115203 | consumed samples: 10321920 | consumed tokens: 21139292160 | elapsed time per iteration (s): 0.42 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 2.347297E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.179 | TFLOPs: 31.86 | 7: iteration 40330/ 115203 | consumed samples: 10324480 | consumed tokens: 21144535040 | elapsed time per iteration (s): 0.43 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 2.350668E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.485 | TFLOPs: 30.98 | 7: iteration 40340/ 
115203 | consumed samples: 10327040 | consumed tokens: 21149777920 | elapsed time per iteration (s): 0.44 | learning rate: 1.525E-04 | global batch size: 256 | lm loss: 2.316842E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.160 | TFLOPs: 30.60 | 7: iteration 40350/ 115203 | consumed samples: 10329600 | consumed tokens: 21155020800 | elapsed time per iteration (s): 0.43 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.319324E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.102 | TFLOPs: 31.49 | 7: iteration 40360/ 115203 | consumed samples: 10332160 | consumed tokens: 21160263680 | elapsed time per iteration (s): 0.43 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.328231E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.298 | TFLOPs: 31.60 | 7: iteration 40370/ 115203 | consumed samples: 10334720 | consumed tokens: 21165506560 | elapsed time per iteration (s): 0.43 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.337779E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.045 | TFLOPs: 31.27 | 7: iteration 40380/ 115203 | consumed samples: 10337280 | consumed tokens: 21170749440 | elapsed time per iteration (s): 0.43 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.303737E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.817 | TFLOPs: 31.21 | 7: iteration 40390/ 115203 | consumed samples: 10339840 | consumed tokens: 21175992320 | elapsed time per iteration (s): 0.42 | learning rate: 1.524E-04 | global batch size: 256 | lm loss: 2.333889E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 603.094 | TFLOPs: 31.64 | 7: iteration 40400/ 115203 | consumed samples: 10342400 | consumed tokens: 21181235200 | elapsed time per iteration (s): 0.44 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 2.330518E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.396 | TFLOPs: 30.71 | 7: iteration 40410/ 115203 | consumed samples: 10344960 | consumed tokens: 21186478080 | elapsed time per iteration (s): 0.43 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 2.331568E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.379 | TFLOPs: 31.45 | 7: iteration 40420/ 115203 | consumed samples: 10347520 | consumed tokens: 21191720960 | elapsed time per iteration (s): 0.43 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 2.341853E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.568 | TFLOPs: 31.35 | 7: iteration 40430/ 115203 | consumed samples: 10350080 | consumed tokens: 21196963840 | elapsed time per iteration (s): 0.43 | learning rate: 1.523E-04 | global batch size: 256 | lm loss: 2.347251E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.217 | TFLOPs: 31.60 | 7: iteration 40440/ 115203 | consumed samples: 10352640 | consumed tokens: 21202206720 | elapsed time per iteration (s): 0.43 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 2.318779E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.733 | TFLOPs: 31.31 | 7: iteration 40450/ 115203 | consumed samples: 10355200 | consumed tokens: 21207449600 | elapsed time per iteration (s): 0.43 | learning rate: 1.522E-04 | global batch 
size: 256 | lm loss: 2.279484E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.690 | TFLOPs: 31.15 | 7: iteration 40460/ 115203 | consumed samples: 10357760 | consumed tokens: 21212692480 | elapsed time per iteration (s): 0.43 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 2.315791E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.961 | TFLOPs: 31.27 | 7: iteration 40470/ 115203 | consumed samples: 10360320 | consumed tokens: 21217935360 | elapsed time per iteration (s): 0.43 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 2.329661E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.125 | TFLOPs: 31.02 | 7: iteration 40480/ 115203 | consumed samples: 10362880 | consumed tokens: 21223178240 | elapsed time per iteration (s): 0.43 | learning rate: 1.522E-04 | global batch size: 256 | lm loss: 2.371999E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.226 | TFLOPs: 31.28 | 7: iteration 40490/ 115203 | consumed samples: 10365440 | consumed tokens: 21228421120 | elapsed time per iteration (s): 0.43 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 2.314670E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.962 | TFLOPs: 31.58 | 7: iteration 40500/ 115203 | consumed samples: 10368000 | consumed tokens: 21233664000 | elapsed time per iteration (s): 0.43 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 2.332788E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.573 | TFLOPs: 31.51 | 7: iteration 40510/ 115203 | consumed samples: 10370560 | consumed 
tokens: 21238906880 | elapsed time per iteration (s): 0.43 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 2.289516E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.812 | TFLOPs: 31.52 | 7: iteration 40520/ 115203 | consumed samples: 10373120 | consumed tokens: 21244149760 | elapsed time per iteration (s): 0.43 | learning rate: 1.521E-04 | global batch size: 256 | lm loss: 2.352790E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.973 | TFLOPs: 31.37 | 7: iteration 40530/ 115203 | consumed samples: 10375680 | consumed tokens: 21249392640 | elapsed time per iteration (s): 0.42 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 2.367988E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.976 | TFLOPs: 32.00 | 7: iteration 40540/ 115203 | consumed samples: 10378240 | consumed tokens: 21254635520 | elapsed time per iteration (s): 0.42 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 2.342616E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.155 | TFLOPs: 31.65 | 7: iteration 40550/ 115203 | consumed samples: 10380800 | consumed tokens: 21259878400 | elapsed time per iteration (s): 0.42 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 2.332159E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.889 | TFLOPs: 31.95 | 7: iteration 40560/ 115203 | consumed samples: 10383360 | consumed tokens: 21265121280 | elapsed time per iteration (s): 0.43 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 2.346125E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 594.317 | TFLOPs: 31.18 | 7: iteration 40570/ 115203 | consumed samples: 10385920 | consumed tokens: 21270364160 | elapsed time per iteration (s): 0.43 | learning rate: 1.520E-04 | global batch size: 256 | lm loss: 2.343196E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.177 | TFLOPs: 31.07 | 7: iteration 40580/ 115203 | consumed samples: 10388480 | consumed tokens: 21275607040 | elapsed time per iteration (s): 0.43 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 2.323876E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.063 | TFLOPs: 31.59 | 7: iteration 40590/ 115203 | consumed samples: 10391040 | consumed tokens: 21280849920 | elapsed time per iteration (s): 0.43 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 2.311656E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.089 | TFLOPs: 30.91 | 7: iteration 40600/ 115203 | consumed samples: 10393600 | consumed tokens: 21286092800 | elapsed time per iteration (s): 0.43 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 2.331309E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.409 | TFLOPs: 31.40 | 7: iteration 40610/ 115203 | consumed samples: 10396160 | consumed tokens: 21291335680 | elapsed time per iteration (s): 0.43 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 2.333218E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.225 | TFLOPs: 31.07 | 7: iteration 40620/ 115203 | consumed samples: 10398720 | consumed tokens: 21296578560 | elapsed time per iteration (s): 0.42 | learning rate: 1.519E-04 | global batch size: 256 | lm loss: 2.317044E+00 | grad norm: 
0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.832 | TFLOPs: 31.89 | 7: iteration 40630/ 115203 | consumed samples: 10401280 | consumed tokens: 21301821440 | elapsed time per iteration (s): 0.43 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 2.340550E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.849 | TFLOPs: 31.32 | 7: iteration 40640/ 115203 | consumed samples: 10403840 | consumed tokens: 21307064320 | elapsed time per iteration (s): 0.43 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 2.320635E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.110 | TFLOPs: 31.22 | 7: iteration 40650/ 115203 | consumed samples: 10406400 | consumed tokens: 21312307200 | elapsed time per iteration (s): 0.43 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 2.351935E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.289 | TFLOPs: 31.34 | 7: iteration 40660/ 115203 | consumed samples: 10408960 | consumed tokens: 21317550080 | elapsed time per iteration (s): 0.42 | learning rate: 1.518E-04 | global batch size: 256 | lm loss: 2.323988E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.293 | TFLOPs: 31.71 | 7: iteration 40670/ 115203 | consumed samples: 10411520 | consumed tokens: 21322792960 | elapsed time per iteration (s): 0.42 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 2.323950E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.883 | TFLOPs: 31.63 | 7: iteration 40680/ 115203 | consumed samples: 10414080 | consumed tokens: 21328035840 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 2.305492E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.952 | TFLOPs: 31.27 | 7: iteration 40690/ 115203 | consumed samples: 10416640 | consumed tokens: 21333278720 | elapsed time per iteration (s): 0.42 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 2.313279E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.767 | TFLOPs: 31.63 | 7: iteration 40700/ 115203 | consumed samples: 10419200 | consumed tokens: 21338521600 | elapsed time per iteration (s): 0.42 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 2.318397E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.212 | TFLOPs: 31.70 | 7: iteration 40710/ 115203 | consumed samples: 10421760 | consumed tokens: 21343764480 | elapsed time per iteration (s): 0.43 | learning rate: 1.517E-04 | global batch size: 256 | lm loss: 2.325440E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.532 | TFLOPs: 31.30 | 7: iteration 40720/ 115203 | consumed samples: 10424320 | consumed tokens: 21349007360 | elapsed time per iteration (s): 0.43 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 2.311081E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.343 | TFLOPs: 31.60 | 7: iteration 40730/ 115203 | consumed samples: 10426880 | consumed tokens: 21354250240 | elapsed time per iteration (s): 0.43 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 2.327023E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.636 | TFLOPs: 31.46 | 7: 
iteration 40740/ 115203 | consumed samples: 10429440 | consumed tokens: 21359493120 | elapsed time per iteration (s): 0.42 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 2.332589E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.434 | TFLOPs: 31.71 | 7: iteration 40750/ 115203 | consumed samples: 10432000 | consumed tokens: 21364736000 | elapsed time per iteration (s): 0.42 | learning rate: 1.516E-04 | global batch size: 256 | lm loss: 2.311791E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.673 | TFLOPs: 31.83 | 7: iteration 40760/ 115203 | consumed samples: 10434560 | consumed tokens: 21369978880 | elapsed time per iteration (s): 0.42 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 2.301942E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.886 | TFLOPs: 32.05 | 7: iteration 40770/ 115203 | consumed samples: 10437120 | consumed tokens: 21375221760 | elapsed time per iteration (s): 0.43 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 2.282721E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.433 | TFLOPs: 31.40 | 7: iteration 40780/ 115203 | consumed samples: 10439680 | consumed tokens: 21380464640 | elapsed time per iteration (s): 0.43 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 2.320504E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.888 | TFLOPs: 31.58 | 7: iteration 40790/ 115203 | consumed samples: 10442240 | consumed tokens: 21385707520 | elapsed time per iteration (s): 0.44 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 2.362055E+00 | grad norm: 0.209 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.942 | TFLOPs: 30.32 | 7: iteration 40800/ 115203 | consumed samples: 10444800 | consumed tokens: 21390950400 | elapsed time per iteration (s): 0.43 | learning rate: 1.515E-04 | global batch size: 256 | lm loss: 2.319764E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.534 | TFLOPs: 31.14 | 7: iteration 40810/ 115203 | consumed samples: 10447360 | consumed tokens: 21396193280 | elapsed time per iteration (s): 0.45 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 2.319041E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.979 | TFLOPs: 29.96 | 7: iteration 40820/ 115203 | consumed samples: 10449920 | consumed tokens: 21401436160 | elapsed time per iteration (s): 0.42 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 2.311737E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.440 | TFLOPs: 31.92 | 7: iteration 40830/ 115203 | consumed samples: 10452480 | consumed tokens: 21406679040 | elapsed time per iteration (s): 0.45 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 2.350713E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.673 | TFLOPs: 29.68 | 7: iteration 40840/ 115203 | consumed samples: 10455040 | consumed tokens: 21411921920 | elapsed time per iteration (s): 0.43 | learning rate: 1.514E-04 | global batch size: 256 | lm loss: 2.310241E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.319 | TFLOPs: 31.45 | 7: iteration 40850/ 115203 | consumed samples: 10457600 | consumed tokens: 21417164800 | elapsed time per iteration (s): 0.43 | learning rate: 
1.513E-04 | global batch size: 256 | lm loss: 2.329735E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.709 | TFLOPs: 31.36 | 7: iteration 40860/ 115203 | consumed samples: 10460160 | consumed tokens: 21422407680 | elapsed time per iteration (s): 0.42 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 2.308823E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.624 | TFLOPs: 31.78 | 7: iteration 40870/ 115203 | consumed samples: 10462720 | consumed tokens: 21427650560 | elapsed time per iteration (s): 0.43 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 2.323121E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.012 | TFLOPs: 31.38 | 7: iteration 40880/ 115203 | consumed samples: 10465280 | consumed tokens: 21432893440 | elapsed time per iteration (s): 0.43 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 2.341356E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.817 | TFLOPs: 31.05 | 7: iteration 40890/ 115203 | consumed samples: 10467840 | consumed tokens: 21438136320 | elapsed time per iteration (s): 0.42 | learning rate: 1.513E-04 | global batch size: 256 | lm loss: 2.326670E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.001 | TFLOPs: 31.80 | 7: iteration 40900/ 115203 | consumed samples: 10470400 | consumed tokens: 21443379200 | elapsed time per iteration (s): 0.42 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 2.313457E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.816 | TFLOPs: 31.68 | 7: iteration 40910/ 115203 | consumed 
samples: 10472960 | consumed tokens: 21448622080 | elapsed time per iteration (s): 0.43 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 2.309937E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.246 | TFLOPs: 31.55 | 7: iteration 40920/ 115203 | consumed samples: 10475520 | consumed tokens: 21453864960 | elapsed time per iteration (s): 0.42 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 2.351234E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.667 | TFLOPs: 31.78 | 7: iteration 40930/ 115203 | consumed samples: 10478080 | consumed tokens: 21459107840 | elapsed time per iteration (s): 0.43 | learning rate: 1.512E-04 | global batch size: 256 | lm loss: 2.318992E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.856 | TFLOPs: 31.16 | 7: iteration 40940/ 115203 | consumed samples: 10480640 | consumed tokens: 21464350720 | elapsed time per iteration (s): 0.42 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.329046E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.380 | TFLOPs: 31.61 | 7: iteration 40950/ 115203 | consumed samples: 10483200 | consumed tokens: 21469593600 | elapsed time per iteration (s): 0.43 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.346578E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.918 | TFLOPs: 31.27 | 7: iteration 40960/ 115203 | consumed samples: 10485760 | consumed tokens: 21474836480 | elapsed time per iteration (s): 0.43 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.350511E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 594.646 | TFLOPs: 31.20 |
7: iteration 40970/ 115203 | consumed samples: 10488320 | consumed tokens: 21480079360 | elapsed time per iteration (s): 0.43 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.312799E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.129 | TFLOPs: 31.49 |
7: iteration 40980/ 115203 | consumed samples: 10490880 | consumed tokens: 21485322240 | elapsed time per iteration (s): 0.43 | learning rate: 1.511E-04 | global batch size: 256 | lm loss: 2.295563E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.999 | TFLOPs: 31.48 |
7: iteration 40990/ 115203 | consumed samples: 10493440 | consumed tokens: 21490565120 | elapsed time per iteration (s): 0.42 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.308840E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.483 | TFLOPs: 31.61 |
7: iteration 41000/ 115203 | consumed samples: 10496000 | consumed tokens: 21495808000 | elapsed time per iteration (s): 0.43 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.346504E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.822 | TFLOPs: 31.05 |
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 41000 | lm loss value: 2.203928E+00 | lm loss PPL: 9.060531E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 41000 to checkpoints_221m
0: [2022-11-28 17:52:43,016] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step41000 is begin to save!
0: [2022-11-28 17:52:43,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_01-model_00-model_states.pt... 0: [2022-11-28 17:52:43,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_01-model_00-model_states.pt. 0: [2022-11-28 17:52:43,189] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_03-model_00-model_states.pt... 0: [2022-11-28 17:52:43,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_03-model_00-model_states.pt. 0: [2022-11-28 17:52:43,218] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_04-model_00-model_states.pt... 0: [2022-11-28 17:52:43,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_04-model_00-model_states.pt. 0: [2022-11-28 17:52:43,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_05-model_00-model_states.pt... 0: [2022-11-28 17:52:43,279] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_05-model_00-model_states.pt. 0: [2022-11-28 17:52:43,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_06-model_00-model_states.pt... 0: [2022-11-28 17:52:43,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_06-model_00-model_states.pt. 0: [2022-11-28 17:52:43,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_07-model_00-model_states.pt... 0: [2022-11-28 17:52:43,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 17:52:43,330] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_08-model_00-model_states.pt... 0: [2022-11-28 17:52:43,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_08-model_00-model_states.pt. 0: [2022-11-28 17:52:43,352] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_09-model_00-model_states.pt... 0: [2022-11-28 17:52:43,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_09-model_00-model_states.pt. 0: [2022-11-28 17:52:43,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_10-model_00-model_states.pt... 0: [2022-11-28 17:52:43,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_10-model_00-model_states.pt. 0: [2022-11-28 17:52:43,397] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_11-model_00-model_states.pt... 0: [2022-11-28 17:52:43,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_11-model_00-model_states.pt. 0: [2022-11-28 17:52:43,419] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_12-model_00-model_states.pt... 0: [2022-11-28 17:52:43,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_12-model_00-model_states.pt. 0: [2022-11-28 17:52:43,442] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_13-model_00-model_states.pt... 0: [2022-11-28 17:52:43,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 17:52:43,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_14-model_00-model_states.pt... 0: [2022-11-28 17:52:43,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_14-model_00-model_states.pt. 0: [2022-11-28 17:52:43,486] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_15-model_00-model_states.pt... 0: [2022-11-28 17:52:43,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_15-model_00-model_states.pt. 0: [2022-11-28 17:52:43,511] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_16-model_00-model_states.pt... 0: [2022-11-28 17:52:43,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_16-model_00-model_states.pt. 0: [2022-11-28 17:52:43,533] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_17-model_00-model_states.pt... 0: [2022-11-28 17:52:43,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_17-model_00-model_states.pt. 0: [2022-11-28 17:52:43,554] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_18-model_00-model_states.pt... 0: [2022-11-28 17:52:43,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_18-model_00-model_states.pt. 0: [2022-11-28 17:52:43,577] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_19-model_00-model_states.pt... 0: [2022-11-28 17:52:43,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 17:52:43,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_20-model_00-model_states.pt... 0: [2022-11-28 17:52:43,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_20-model_00-model_states.pt. 0: [2022-11-28 17:52:43,623] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/layer_22-model_00-model_states.pt... 0: [2022-11-28 17:52:43,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/layer_22-model_00-model_states.pt. 0: [2022-11-28 17:52:43,628] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step41000/mp_rank_00_model_states.pt 0: [2022-11-28 17:52:43,628] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/mp_rank_00_model_states.pt... 0: [2022-11-28 17:52:43,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/mp_rank_00_model_states.pt. 0: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
3: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
1: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
6: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:52:43,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step41000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:52:43,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:52:43,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 17:52:43,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:52:43,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 2: [2022-11-28 17:52:43,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 17:52:43,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 2: [2022-11-28 17:52:43,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:52:43,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 17:52:43,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 2: [2022-11-28 17:52:43,700] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:52:43,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 17:52:43,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 2: [2022-11-28 17:52:43,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:52:43,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 17:52:43,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 2: [2022-11-28 17:52:43,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:52:43,704] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 17:52:43,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 7: [2022-11-28 17:52:43,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:52:43,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 17:52:43,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 7: [2022-11-28 17:52:43,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:52:43,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:52:43,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 17:52:43,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 17:52:43,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 7: [2022-11-28 17:52:43,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:52:43,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 7: [2022-11-28 17:52:43,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 17:52:43,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 7: [2022-11-28 17:52:43,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:52:43,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:52:43,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:52:43,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 17:52:43,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 17:52:43,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 17:52:43,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 7: [2022-11-28 17:52:43,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 7: [2022-11-28 17:52:43,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 2: [2022-11-28 17:52:43,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:52:43,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 17:52:43,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 2: [2022-11-28 17:52:43,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:52:43,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:52:43,713] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 5: [2022-11-28 17:52:43,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 17:52:43,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 5: [2022-11-28 17:52:43,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:52:43,713] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 5: [2022-11-28 17:52:43,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 17:52:43,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 5: [2022-11-28 17:52:43,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:52:43,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 17:52:43,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 5: [2022-11-28 17:52:43,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:52:43,702] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 17:52:43,702] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 5: [2022-11-28 17:52:43,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:52:43,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 17:52:43,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 5: [2022-11-28 17:52:43,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:52:43,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 17:52:43,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2022-11-28 17:52:43,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:52:43,713] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 17:52:43,713] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 0: [2022-11-28 17:52:43,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 17:52:43,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:52:43,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:52:43,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:52:43,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 17:52:43,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 17:52:43,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 17:52:43,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 0: [2022-11-28 17:52:43,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 17:52:43,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 0: [2022-11-28 17:52:43,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 7: [2022-11-28 17:52:43,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:52:43,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 
7: [2022-11-28 17:52:43,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 17:52:43,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:52:43,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 17:52:43,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 17:52:43,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 17:52:43,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 17:52:43,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 17:52:43,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 1: [2022-11-28 17:52:43,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 
1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 1: [2022-11-28 17:52:43,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 1: [2022-11-28 17:52:43,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:52:43,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 17:52:43,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 5: [2022-11-28 17:52:43,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:52:43,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:52:43,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:52:43,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 17:52:43,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 17:52:43,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 17:52:43,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 
5: [2022-11-28 17:52:43,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2022-11-28 17:52:43,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:52:43,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2022-11-28 17:52:43,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 17:52:43,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2022-11-28 17:52:43,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:52:43,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 17:52:43,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 0: [2022-11-28 17:52:43,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:52:43,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 17:52:43,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2022-11-28 17:52:43,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2022-11-28 17:52:43,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 17:52:43,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2022-11-28 17:52:43,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:52:43,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 17:52:43,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2022-11-28 17:52:43,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:52:43,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:52:43,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 17:52:43,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 3: [2022-11-28 17:52:43,733] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 17:52:43,733] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 0: [2022-11-28 17:52:43,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 17:52:43,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 17:52:43,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 0: [2022-11-28 17:52:43,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:52:43,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:52:43,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 17:52:43,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 0: [2022-11-28 17:52:43,757] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 17:52:43,757] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:52:43,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:52:43,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 17:52:43,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 17:52:43,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 17:52:43,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 17:52:43,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 
6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 6: [2022-11-28 17:52:43,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:52:43,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 17:52:43,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 6: [2022-11-28 17:52:43,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:52:43,918] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 17:52:43,918] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:52:44,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 17:52:44,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 17:52:44,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 17:52:44,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 17:52:44,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 17:52:44,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 17:52:44,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 
4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 4: [2022-11-28 17:52:44,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 4: [2022-11-28 17:52:44,138] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:52:44,138] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step41000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 17:52:44,138] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step41000 is ready now! 
0: successfully saved checkpoint at iteration 41000 to checkpoints_221m 7: time (ms) | save-checkpoint: 1127.25 7: iteration 41010/ 115203 | consumed samples: 10498560 | consumed tokens: 21501050880 | elapsed time per iteration (s): 0.55 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.324344E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 464.376 | TFLOPs: 24.37 | 7: iteration 41020/ 115203 | consumed samples: 10501120 | consumed tokens: 21506293760 | elapsed time per iteration (s): 0.42 | learning rate: 1.510E-04 | global batch size: 256 | lm loss: 2.350916E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.061 | TFLOPs: 32.01 | 7: iteration 41030/ 115203 | consumed samples: 10503680 | consumed tokens: 21511536640 | elapsed time per iteration (s): 0.44 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 2.324207E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.502 | TFLOPs: 30.41 | 7: iteration 41040/ 115203 | consumed samples: 10506240 | consumed tokens: 21516779520 | elapsed time per iteration (s): 0.42 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 2.343915E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.386 | TFLOPs: 31.97 | 7: iteration 41050/ 115203 | consumed samples: 10508800 | consumed tokens: 21522022400 | elapsed time per iteration (s): 0.43 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 2.337404E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.281 | TFLOPs: 31.55 | 7: iteration 41060/ 115203 | consumed samples: 10511360 | consumed tokens: 21527265280 | elapsed time per iteration (s): 0.42 | learning 
rate: 1.509E-04 | global batch size: 256 | lm loss: 2.287769E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.672 | TFLOPs: 31.67 | 7: iteration 41070/ 115203 | consumed samples: 10513920 | consumed tokens: 21532508160 | elapsed time per iteration (s): 0.42 | learning rate: 1.509E-04 | global batch size: 256 | lm loss: 2.351446E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.990 | TFLOPs: 31.74 | 7: iteration 41080/ 115203 | consumed samples: 10516480 | consumed tokens: 21537751040 | elapsed time per iteration (s): 0.43 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 2.327419E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.319 | TFLOPs: 31.50 | 7: iteration 41090/ 115203 | consumed samples: 10519040 | consumed tokens: 21542993920 | elapsed time per iteration (s): 0.43 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 2.296995E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.641 | TFLOPs: 31.41 | 7: iteration 41100/ 115203 | consumed samples: 10521600 | consumed tokens: 21548236800 | elapsed time per iteration (s): 0.43 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 2.303687E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.386 | TFLOPs: 31.45 | 7: iteration 41110/ 115203 | consumed samples: 10524160 | consumed tokens: 21553479680 | elapsed time per iteration (s): 0.43 | learning rate: 1.508E-04 | global batch size: 256 | lm loss: 2.314411E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.749 | TFLOPs: 31.15 | 7: iteration 41120/ 115203 | 
consumed samples: 10526720 | consumed tokens: 21558722560 | elapsed time per iteration (s): 0.42 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.330106E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.008 | TFLOPs: 31.74 | 7: iteration 41130/ 115203 | consumed samples: 10529280 | consumed tokens: 21563965440 | elapsed time per iteration (s): 0.43 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.351799E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.213 | TFLOPs: 31.23 | 7: iteration 41140/ 115203 | consumed samples: 10531840 | consumed tokens: 21569208320 | elapsed time per iteration (s): 0.43 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.319308E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.564 | TFLOPs: 31.25 | 7: iteration 41150/ 115203 | consumed samples: 10534400 | consumed tokens: 21574451200 | elapsed time per iteration (s): 0.42 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.344780E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.542 | TFLOPs: 31.88 | 7: iteration 41160/ 115203 | consumed samples: 10536960 | consumed tokens: 21579694080 | elapsed time per iteration (s): 0.43 | learning rate: 1.507E-04 | global batch size: 256 | lm loss: 2.340037E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.344 | TFLOPs: 30.92 | 7: iteration 41170/ 115203 | consumed samples: 10539520 | consumed tokens: 21584936960 | elapsed time per iteration (s): 0.43 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.305854E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 598.191 | TFLOPs: 31.39 | 7: iteration 41180/ 115203 | consumed samples: 10542080 | consumed tokens: 21590179840 | elapsed time per iteration (s): 0.43 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.274730E+00 | grad norm: 0.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.941 | TFLOPs: 31.11 | 7: iteration 41190/ 115203 | consumed samples: 10544640 | consumed tokens: 21595422720 | elapsed time per iteration (s): 0.42 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.332303E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.325 | TFLOPs: 31.87 | 7: iteration 41200/ 115203 | consumed samples: 10547200 | consumed tokens: 21600665600 | elapsed time per iteration (s): 0.43 | learning rate: 1.506E-04 | global batch size: 256 | lm loss: 2.289672E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.519 | TFLOPs: 31.56 | 7: iteration 41210/ 115203 | consumed samples: 10549760 | consumed tokens: 21605908480 | elapsed time per iteration (s): 0.43 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 2.329064E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.898 | TFLOPs: 31.37 | 7: iteration 41220/ 115203 | consumed samples: 10552320 | consumed tokens: 21611151360 | elapsed time per iteration (s): 0.42 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 2.330775E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.448 | TFLOPs: 31.71 | 7: iteration 41230/ 115203 | consumed samples: 10554880 | consumed tokens: 21616394240 | elapsed time per iteration (s): 0.43 | learning rate: 1.505E-04 | global batch size: 
256 | lm loss: 2.331796E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.470 | TFLOPs: 31.35 | 7: iteration 41240/ 115203 | consumed samples: 10557440 | consumed tokens: 21621637120 | elapsed time per iteration (s): 0.43 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 2.312856E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.137 | TFLOPs: 31.23 | 7: iteration 41250/ 115203 | consumed samples: 10560000 | consumed tokens: 21626880000 | elapsed time per iteration (s): 0.43 | learning rate: 1.505E-04 | global batch size: 256 | lm loss: 2.314128E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.097 | TFLOPs: 31.43 | 7: iteration 41260/ 115203 | consumed samples: 10562560 | consumed tokens: 21632122880 | elapsed time per iteration (s): 0.42 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.320158E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.805 | TFLOPs: 31.63 | 7: iteration 41270/ 115203 | consumed samples: 10565120 | consumed tokens: 21637365760 | elapsed time per iteration (s): 0.44 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.316552E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.224 | TFLOPs: 30.23 | 7: iteration 41280/ 115203 | consumed samples: 10567680 | consumed tokens: 21642608640 | elapsed time per iteration (s): 0.43 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.342192E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.018 | TFLOPs: 31.53 | 7: iteration 41290/ 115203 | consumed samples: 10570240 | consumed 
tokens: 21647851520 | elapsed time per iteration (s): 0.43 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.312121E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.516 | TFLOPs: 31.09 | 7: iteration 41300/ 115203 | consumed samples: 10572800 | consumed tokens: 21653094400 | elapsed time per iteration (s): 0.43 | learning rate: 1.504E-04 | global batch size: 256 | lm loss: 2.324663E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.874 | TFLOPs: 31.21 | 7: iteration 41310/ 115203 | consumed samples: 10575360 | consumed tokens: 21658337280 | elapsed time per iteration (s): 0.43 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 2.325445E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.168 | TFLOPs: 31.59 | 7: iteration 41320/ 115203 | consumed samples: 10577920 | consumed tokens: 21663580160 | elapsed time per iteration (s): 0.43 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 2.350493E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.340 | TFLOPs: 31.18 | 7: iteration 41330/ 115203 | consumed samples: 10580480 | consumed tokens: 21668823040 | elapsed time per iteration (s): 0.42 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 2.324616E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.401 | TFLOPs: 31.71 | 7: iteration 41340/ 115203 | consumed samples: 10583040 | consumed tokens: 21674065920 | elapsed time per iteration (s): 0.43 | learning rate: 1.503E-04 | global batch size: 256 | lm loss: 2.335505E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 589.060 | TFLOPs: 30.91 | 7: iteration 41350/ 115203 | consumed samples: 10585600 | consumed tokens: 21679308800 | elapsed time per iteration (s): 0.43 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.359808E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.449 | TFLOPs: 31.19 | 7: iteration 41360/ 115203 | consumed samples: 10588160 | consumed tokens: 21684551680 | elapsed time per iteration (s): 0.45 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.293713E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.679 | TFLOPs: 29.79 | 7: iteration 41370/ 115203 | consumed samples: 10590720 | consumed tokens: 21689794560 | elapsed time per iteration (s): 0.43 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.305987E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.012 | TFLOPs: 31.32 | 7: iteration 41380/ 115203 | consumed samples: 10593280 | consumed tokens: 21695037440 | elapsed time per iteration (s): 0.44 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.322219E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.072 | TFLOPs: 30.38 | 7: iteration 41390/ 115203 | consumed samples: 10595840 | consumed tokens: 21700280320 | elapsed time per iteration (s): 0.43 | learning rate: 1.502E-04 | global batch size: 256 | lm loss: 2.311192E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.595 | TFLOPs: 31.14 | 7: iteration 41400/ 115203 | consumed samples: 10598400 | consumed tokens: 21705523200 | elapsed time per iteration (s): 0.43 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.313236E+00 | grad norm: 
0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.853 | TFLOPs: 31.53 | 7: iteration 41410/ 115203 | consumed samples: 10600960 | consumed tokens: 21710766080 | elapsed time per iteration (s): 0.43 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.304378E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.166 | TFLOPs: 31.12 | 7: iteration 41420/ 115203 | consumed samples: 10603520 | consumed tokens: 21716008960 | elapsed time per iteration (s): 0.42 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.309079E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.361 | TFLOPs: 31.81 | 7: iteration 41430/ 115203 | consumed samples: 10606080 | consumed tokens: 21721251840 | elapsed time per iteration (s): 0.42 | learning rate: 1.501E-04 | global batch size: 256 | lm loss: 2.315220E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.745 | TFLOPs: 31.99 | 7: iteration 41440/ 115203 | consumed samples: 10608640 | consumed tokens: 21726494720 | elapsed time per iteration (s): 0.44 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 2.306499E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.139 | TFLOPs: 30.60 | 7: iteration 41450/ 115203 | consumed samples: 10611200 | consumed tokens: 21731737600 | elapsed time per iteration (s): 0.43 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 2.328647E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.356 | TFLOPs: 31.50 | 7: iteration 41460/ 115203 | consumed samples: 10613760 | consumed tokens: 21736980480 | elapsed time per 
iteration (s): 0.44 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 2.291882E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.068 | TFLOPs: 30.80 | 7: iteration 41470/ 115203 | consumed samples: 10616320 | consumed tokens: 21742223360 | elapsed time per iteration (s): 0.43 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 2.312832E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.891 | TFLOPs: 31.00 | 7: iteration 41480/ 115203 | consumed samples: 10618880 | consumed tokens: 21747466240 | elapsed time per iteration (s): 0.43 | learning rate: 1.500E-04 | global batch size: 256 | lm loss: 2.318331E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.408 | TFLOPs: 31.19 | 7: iteration 41490/ 115203 | consumed samples: 10621440 | consumed tokens: 21752709120 | elapsed time per iteration (s): 0.42 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 2.345802E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.684 | TFLOPs: 31.67 | 7: iteration 41500/ 115203 | consumed samples: 10624000 | consumed tokens: 21757952000 | elapsed time per iteration (s): 0.42 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 2.341444E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.874 | TFLOPs: 31.74 | 7: iteration 41510/ 115203 | consumed samples: 10626560 | consumed tokens: 21763194880 | elapsed time per iteration (s): 0.44 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 2.350660E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.264 | TFLOPs: 30.24 | 7: 
iteration 41520/ 115203 | consumed samples: 10629120 | consumed tokens: 21768437760 | elapsed time per iteration (s): 0.44 | learning rate: 1.499E-04 | global batch size: 256 | lm loss: 2.343101E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.752 | TFLOPs: 30.63 | 7: iteration 41530/ 115203 | consumed samples: 10631680 | consumed tokens: 21773680640 | elapsed time per iteration (s): 0.43 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.322634E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.506 | TFLOPs: 31.09 | 7: iteration 41540/ 115203 | consumed samples: 10634240 | consumed tokens: 21778923520 | elapsed time per iteration (s): 0.42 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.377250E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.460 | TFLOPs: 31.98 | 7: iteration 41550/ 115203 | consumed samples: 10636800 | consumed tokens: 21784166400 | elapsed time per iteration (s): 0.42 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.330958E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.319 | TFLOPs: 31.76 | 7: iteration 41560/ 115203 | consumed samples: 10639360 | consumed tokens: 21789409280 | elapsed time per iteration (s): 0.42 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.319095E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.606 | TFLOPs: 31.78 | 7: iteration 41570/ 115203 | consumed samples: 10641920 | consumed tokens: 21794652160 | elapsed time per iteration (s): 0.43 | learning rate: 1.498E-04 | global batch size: 256 | lm loss: 2.301735E+00 | grad norm: 0.202 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.918 | TFLOPs: 31.48 | 7: iteration 41580/ 115203 | consumed samples: 10644480 | consumed tokens: 21799895040 | elapsed time per iteration (s): 0.43 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 2.304613E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.844 | TFLOPs: 31.53 | 7: iteration 41590/ 115203 | consumed samples: 10647040 | consumed tokens: 21805137920 | elapsed time per iteration (s): 0.43 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 2.352324E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.569 | TFLOPs: 31.30 | 7: iteration 41600/ 115203 | consumed samples: 10649600 | consumed tokens: 21810380800 | elapsed time per iteration (s): 0.43 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 2.305332E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.969 | TFLOPs: 30.90 | 7: iteration 41610/ 115203 | consumed samples: 10652160 | consumed tokens: 21815623680 | elapsed time per iteration (s): 0.42 | learning rate: 1.497E-04 | global batch size: 256 | lm loss: 2.354770E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.635 | TFLOPs: 31.83 | 7: iteration 41620/ 115203 | consumed samples: 10654720 | consumed tokens: 21820866560 | elapsed time per iteration (s): 0.42 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.285676E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.695 | TFLOPs: 31.73 | 7: iteration 41630/ 115203 | consumed samples: 10657280 | consumed tokens: 21826109440 | elapsed time per iteration (s): 0.43 | learning rate: 
1.496E-04 | global batch size: 256 | lm loss: 2.304320E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.559 | TFLOPs: 31.04 | 7: iteration 41640/ 115203 | consumed samples: 10659840 | consumed tokens: 21831352320 | elapsed time per iteration (s): 0.44 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.343116E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.828 | TFLOPs: 30.27 | 7: iteration 41650/ 115203 | consumed samples: 10662400 | consumed tokens: 21836595200 | elapsed time per iteration (s): 0.42 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.333208E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.241 | TFLOPs: 32.07 | 7: iteration 41660/ 115203 | consumed samples: 10664960 | consumed tokens: 21841838080 | elapsed time per iteration (s): 0.43 | learning rate: 1.496E-04 | global batch size: 256 | lm loss: 2.350529E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.734 | TFLOPs: 31.20 | 7: iteration 41670/ 115203 | consumed samples: 10667520 | consumed tokens: 21847080960 | elapsed time per iteration (s): 0.44 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 2.368043E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.670 | TFLOPs: 30.73 | 7: iteration 41680/ 115203 | consumed samples: 10670080 | consumed tokens: 21852323840 | elapsed time per iteration (s): 0.43 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 2.306461E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.296 | TFLOPs: 31.18 | 7: iteration 41690/ 115203 | consumed 
samples: 10672640 | consumed tokens: 21857566720 | elapsed time per iteration (s): 0.43 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 2.344155E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.015 | TFLOPs: 31.59 | 7: iteration 41700/ 115203 | consumed samples: 10675200 | consumed tokens: 21862809600 | elapsed time per iteration (s): 0.42 | learning rate: 1.495E-04 | global batch size: 256 | lm loss: 2.332896E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.100 | TFLOPs: 31.96 | 7: iteration 41710/ 115203 | consumed samples: 10677760 | consumed tokens: 21868052480 | elapsed time per iteration (s): 0.44 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.322951E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.155 | TFLOPs: 30.86 | 7: iteration 41720/ 115203 | consumed samples: 10680320 | consumed tokens: 21873295360 | elapsed time per iteration (s): 0.43 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.323141E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.204 | TFLOPs: 31.44 | 7: iteration 41730/ 115203 | consumed samples: 10682880 | consumed tokens: 21878538240 | elapsed time per iteration (s): 0.42 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.358857E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.554 | TFLOPs: 31.62 | 7: iteration 41740/ 115203 | consumed samples: 10685440 | consumed tokens: 21883781120 | elapsed time per iteration (s): 0.43 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.312212E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 597.036 | TFLOPs: 31.33 | 7: iteration 41750/ 115203 | consumed samples: 10688000 | consumed tokens: 21889024000 | elapsed time per iteration (s): 0.42 | learning rate: 1.494E-04 | global batch size: 256 | lm loss: 2.323147E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.463 | TFLOPs: 31.61 | 7: iteration 41760/ 115203 | consumed samples: 10690560 | consumed tokens: 21894266880 | elapsed time per iteration (s): 0.43 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 2.347540E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.521 | TFLOPs: 31.35 | 7: iteration 41770/ 115203 | consumed samples: 10693120 | consumed tokens: 21899509760 | elapsed time per iteration (s): 0.42 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 2.336671E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.500 | TFLOPs: 31.82 | 7: iteration 41780/ 115203 | consumed samples: 10695680 | consumed tokens: 21904752640 | elapsed time per iteration (s): 0.43 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 2.338698E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.086 | TFLOPs: 31.28 | 7: iteration 41790/ 115203 | consumed samples: 10698240 | consumed tokens: 21909995520 | elapsed time per iteration (s): 0.43 | learning rate: 1.493E-04 | global batch size: 256 | lm loss: 2.306824E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.688 | TFLOPs: 31.52 | 7: iteration 41800/ 115203 | consumed samples: 10700800 | consumed tokens: 21915238400 | elapsed time per iteration (s): 0.43 | learning rate: 1.492E-04 | global batch size: 256 | lm 
loss: 2.305852E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.309 | TFLOPs: 31.39 | 7: iteration 41810/ 115203 | consumed samples: 10703360 | consumed tokens: 21920481280 | elapsed time per iteration (s): 0.42 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 2.341718E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.373 | TFLOPs: 31.71 | 7: iteration 41820/ 115203 | consumed samples: 10705920 | consumed tokens: 21925724160 | elapsed time per iteration (s): 0.43 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 2.305812E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.830 | TFLOPs: 31.37 | 7: iteration 41830/ 115203 | consumed samples: 10708480 | consumed tokens: 21930967040 | elapsed time per iteration (s): 0.43 | learning rate: 1.492E-04 | global batch size: 256 | lm loss: 2.313660E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.356 | TFLOPs: 31.29 | 7: iteration 41840/ 115203 | consumed samples: 10711040 | consumed tokens: 21936209920 | elapsed time per iteration (s): 0.45 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.314744E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.050 | TFLOPs: 30.12 | 7: iteration 41850/ 115203 | consumed samples: 10713600 | consumed tokens: 21941452800 | elapsed time per iteration (s): 0.43 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.310497E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.886 | TFLOPs: 30.95 | 7: iteration 41860/ 115203 | consumed samples: 10716160 | consumed tokens: 
21946695680 | elapsed time per iteration (s): 0.45 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.336703E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.850 | TFLOPs: 29.64 | 7: iteration 41870/ 115203 | consumed samples: 10718720 | consumed tokens: 21951938560 | elapsed time per iteration (s): 0.43 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.325490E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.317 | TFLOPs: 31.45 | 7: iteration 41880/ 115203 | consumed samples: 10721280 | consumed tokens: 21957181440 | elapsed time per iteration (s): 0.42 | learning rate: 1.491E-04 | global batch size: 256 | lm loss: 2.322599E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.007 | TFLOPs: 32.01 | 7: iteration 41890/ 115203 | consumed samples: 10723840 | consumed tokens: 21962424320 | elapsed time per iteration (s): 0.43 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.362062E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.769 | TFLOPs: 31.47 | 7: iteration 41900/ 115203 | consumed samples: 10726400 | consumed tokens: 21967667200 | elapsed time per iteration (s): 0.43 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.323308E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.596 | TFLOPs: 31.25 | 7: iteration 41910/ 115203 | consumed samples: 10728960 | consumed tokens: 21972910080 | elapsed time per iteration (s): 0.42 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.328969E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
610.081 | TFLOPs: 32.01 | 7: iteration 41920/ 115203 | consumed samples: 10731520 | consumed tokens: 21978152960 | elapsed time per iteration (s): 0.42 | learning rate: 1.490E-04 | global batch size: 256 | lm loss: 2.328302E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.225 | TFLOPs: 32.17 | 7: iteration 41930/ 115203 | consumed samples: 10734080 | consumed tokens: 21983395840 | elapsed time per iteration (s): 0.43 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.314338E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.423 | TFLOPs: 31.35 | 7: iteration 41940/ 115203 | consumed samples: 10736640 | consumed tokens: 21988638720 | elapsed time per iteration (s): 0.44 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.313032E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.883 | TFLOPs: 30.79 | 7: iteration 41950/ 115203 | consumed samples: 10739200 | consumed tokens: 21993881600 | elapsed time per iteration (s): 0.42 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.331175E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.589 | TFLOPs: 31.88 | 7: iteration 41960/ 115203 | consumed samples: 10741760 | consumed tokens: 21999124480 | elapsed time per iteration (s): 0.42 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.355577E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.156 | TFLOPs: 32.01 | 7: iteration 41970/ 115203 | consumed samples: 10744320 | consumed tokens: 22004367360 | elapsed time per iteration (s): 0.43 | learning rate: 1.489E-04 | global batch size: 256 | lm loss: 2.302489E+00 | grad norm: 0.207 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.602 | TFLOPs: 31.41 | 7: iteration 41980/ 115203 | consumed samples: 10746880 | consumed tokens: 22009610240 | elapsed time per iteration (s): 0.44 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 2.331828E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.644 | TFLOPs: 30.57 | 7: iteration 41990/ 115203 | consumed samples: 10749440 | consumed tokens: 22014853120 | elapsed time per iteration (s): 0.43 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 2.322785E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.727 | TFLOPs: 31.05 | 0: [2022-11-28 17:59:52,963] [INFO] [logging.py:68:log_dist] [Rank 0] step=42000, skipped=0, lr=[0.0001487921045166041, 0.0001487921045166041, 0.0001487921045166041], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 42000/ 115203 | consumed samples: 10752000 | consumed tokens: 22020096000 | elapsed time per iteration (s): 0.42 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 2.311565E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.763 | TFLOPs: 31.63 | 0: steps: 42000 loss: 2.3227 iter time (s): 0.428 samples/sec: 598.749 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 42000 | lm loss value: 2.266246E+00 | lm loss PPL: 9.643136E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 42000 to checkpoints_221m 0: [2022-11-28 17:59:53,121] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step42000 is begin to save! 
0: [2022-11-28 17:59:53,124] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_01-model_00-model_states.pt... 0: [2022-11-28 17:59:53,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_01-model_00-model_states.pt. 0: [2022-11-28 17:59:53,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_03-model_00-model_states.pt... 0: [2022-11-28 17:59:53,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_03-model_00-model_states.pt. 0: [2022-11-28 17:59:53,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_04-model_00-model_states.pt... 0: [2022-11-28 17:59:53,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_04-model_00-model_states.pt. 0: [2022-11-28 17:59:53,272] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_05-model_00-model_states.pt... 0: [2022-11-28 17:59:53,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_05-model_00-model_states.pt. 0: [2022-11-28 17:59:53,295] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_06-model_00-model_states.pt... 0: [2022-11-28 17:59:53,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_06-model_00-model_states.pt. 0: [2022-11-28 17:59:53,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_07-model_00-model_states.pt... 0: [2022-11-28 17:59:53,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 17:59:53,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_08-model_00-model_states.pt... 0: [2022-11-28 17:59:53,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_08-model_00-model_states.pt. 0: [2022-11-28 17:59:53,363] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_09-model_00-model_states.pt... 0: [2022-11-28 17:59:53,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_09-model_00-model_states.pt. 0: [2022-11-28 17:59:53,387] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_10-model_00-model_states.pt... 0: [2022-11-28 17:59:53,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_10-model_00-model_states.pt. 0: [2022-11-28 17:59:53,410] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_11-model_00-model_states.pt... 0: [2022-11-28 17:59:53,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_11-model_00-model_states.pt. 0: [2022-11-28 17:59:53,432] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_12-model_00-model_states.pt... 0: [2022-11-28 17:59:53,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_12-model_00-model_states.pt. 0: [2022-11-28 17:59:53,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_13-model_00-model_states.pt... 0: [2022-11-28 17:59:53,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 17:59:53,477] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_14-model_00-model_states.pt... 0: [2022-11-28 17:59:53,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_14-model_00-model_states.pt. 0: [2022-11-28 17:59:53,500] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_15-model_00-model_states.pt... 0: [2022-11-28 17:59:53,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_15-model_00-model_states.pt. 0: [2022-11-28 17:59:53,523] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_16-model_00-model_states.pt... 0: [2022-11-28 17:59:53,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_16-model_00-model_states.pt. 0: [2022-11-28 17:59:53,546] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_17-model_00-model_states.pt... 0: [2022-11-28 17:59:53,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_17-model_00-model_states.pt. 0: [2022-11-28 17:59:53,569] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_18-model_00-model_states.pt... 0: [2022-11-28 17:59:53,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_18-model_00-model_states.pt. 0: [2022-11-28 17:59:53,594] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_19-model_00-model_states.pt... 0: [2022-11-28 17:59:53,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 17:59:53,616] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_20-model_00-model_states.pt... 0: [2022-11-28 17:59:53,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_20-model_00-model_states.pt. 0: [2022-11-28 17:59:53,639] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/layer_22-model_00-model_states.pt... 0: [2022-11-28 17:59:53,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/layer_22-model_00-model_states.pt. 0: [2022-11-28 17:59:53,644] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step42000/mp_rank_00_model_states.pt 0: [2022-11-28 17:59:53,644] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/mp_rank_00_model_states.pt... 0: [2022-11-28 17:59:53,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/mp_rank_00_model_states.pt. 0: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
0: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
1: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
6: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 7: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 4: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 6: [2022-11-28 17:59:53,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step42000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 2: [2022-11-28 17:59:53,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:59:53,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 17:59:53,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2022-11-28 17:59:53,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:59:53,712] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 17:59:53,712] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2022-11-28 17:59:53,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:59:53,712] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 17:59:53,712] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2022-11-28 17:59:53,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:59:53,713] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 17:59:53,713] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2022-11-28 17:59:53,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:59:53,713] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 17:59:53,713] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2022-11-28 17:59:53,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:59:53,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 17:59:53,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2022-11-28 17:59:53,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:59:53,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:59:53,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2022-11-28 17:59:53,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2022-11-28 17:59:53,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2022-11-28 17:59:53,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2022-11-28 17:59:53,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:59:53,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:59:53,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 17:59:53,714] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 17:59:53,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2022-11-28 17:59:53,714] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2022-11-28 17:59:53,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:59:53,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:59:53,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 17:59:53,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 17:59:53,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2022-11-28 17:59:53,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2022-11-28 17:59:53,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:59:53,715] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 17:59:53,715] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2022-11-28 17:59:53,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:59:53,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2022-11-28 17:59:53,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 17:59:53,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:59:53,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:59:53,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 17:59:53,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:59:53,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2022-11-28 17:59:53,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 6: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:59:53,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 17:59:53,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2022-11-28 17:59:53,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:59:53,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2022-11-28 17:59:53,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:59:53,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2022-11-28 17:59:53,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 17:59:53,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2022-11-28 17:59:53,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:59:53,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 17:59:53,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:59:53,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:59:53,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:59:53,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2022-11-28 17:59:53,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:59:53,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 17:59:53,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2022-11-28 17:59:53,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:59:53,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 1: [2022-11-28 17:59:53,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 4: [2022-11-28 17:59:53,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2022-11-28 17:59:53,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2022-11-28 17:59:53,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:59:53,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2022-11-28 17:59:53,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 17:59:53,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2022-11-28 17:59:53,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:59:53,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 17:59:53,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2022-11-28 17:59:53,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-28 17:59:53,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 17:59:53,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2022-11-28 17:59:53,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:59:53,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 17:59:53,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2022-11-28 17:59:53,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:59:53,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 17:59:53,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 6: [2022-11-28 17:59:53,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 17:59:53,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 17:59:53,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2022-11-28 17:59:53,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2022-11-28 17:59:53,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:59:53,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 17:59:53,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 17:59:53,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2022-11-28 17:59:53,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:59:53,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 2: [2022-11-28 17:59:53,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2022-11-28 17:59:53,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2022-11-28 17:59:53,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 4: [2022-11-28 17:59:53,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:59:53,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
4: [2022-11-28 17:59:53,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2022-11-28 17:59:53,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 17:59:53,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 3: [2022-11-28 17:59:53,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:59:53,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2022-11-28 17:59:53,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 17:59:53,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:59:53,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
5: [2022-11-28 17:59:53,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 17:59:53,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 17:59:53,721] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 17:59:53,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 3: [2022-11-28 17:59:53,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2022-11-28 17:59:53,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:59:53,721] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 5: [2022-11-28 17:59:53,719] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 17:59:53,719] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2022-11-28 17:59:53,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:59:53,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:59:53,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2022-11-28 17:59:53,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 17:59:53,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 17:59:53,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2022-11-28 17:59:53,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 17:59:53,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2022-11-28 17:59:53,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 1: [2022-11-28 17:59:53,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 17:59:53,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 17:59:53,724] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2022-11-28 17:59:53,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 17:59:53,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 17:59:53,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 17:59:53,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 17:59:53,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 7: [2022-11-28 17:59:53,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2022-11-28 17:59:53,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:59:53,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 17:59:53,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2022-11-28 17:59:53,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 17:59:53,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 17:59:53,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2022-11-28 17:59:53,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:59:53,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
3: [2022-11-28 17:59:53,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 17:59:53,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 17:59:53,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:59:53,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:59:53,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 17:59:53,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 3: [2022-11-28 17:59:53,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:59:53,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 17:59:53,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 17:59:53,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 
3: [2022-11-28 17:59:53,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 17:59:53,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 17:59:53,728] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 17:59:53,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 3: [2022-11-28 17:59:53,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 3: [2022-11-28 17:59:53,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 3: [2022-11-28 17:59:53,728] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 0: [2022-11-28 17:59:53,776] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step42000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 17:59:53,776] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step42000 is ready now! 
0: successfully saved checkpoint at iteration 42000 to checkpoints_221m
7: time (ms) | save-checkpoint: 661.51
7: iteration 42010/ 115203 | consumed samples: 10754560 | consumed tokens: 22025338880 | elapsed time per iteration (s): 0.50 | learning rate: 1.488E-04 | global batch size: 256 | lm loss: 2.336147E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 508.019 | TFLOPs: 26.65 |
7: iteration 42020/ 115203 | consumed samples: 10757120 | consumed tokens: 22030581760 | elapsed time per iteration (s): 0.44 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.328686E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.482 | TFLOPs: 30.82 |
7: iteration 42030/ 115203 | consumed samples: 10759680 | consumed tokens: 22035824640 | elapsed time per iteration (s): 0.42 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.331348E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.665 | TFLOPs: 31.62 |
7: iteration 42040/ 115203 | consumed samples: 10762240 | consumed tokens: 22041067520 | elapsed time per iteration (s): 0.44 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.333936E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.295 | TFLOPs: 30.45 |
7: iteration 42050/ 115203 | consumed samples: 10764800 | consumed tokens: 22046310400 | elapsed time per iteration (s): 0.43 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.343466E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.570 | TFLOPs: 31.56 |
7: iteration 42060/ 115203 | consumed samples: 10767360 | consumed tokens: 22051553280 | elapsed time per iteration (s): 0.43 | learning rate: 1.487E-04 | global batch size: 256 | lm loss: 2.325826E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.619 | TFLOPs: 31.41 |
7: iteration 42070/ 115203 | consumed samples: 10769920 | consumed tokens: 22056796160 | elapsed time per iteration (s): 0.42 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 2.324747E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.207 | TFLOPs: 31.70 |
7: iteration 42080/ 115203 | consumed samples: 10772480 | consumed tokens: 22062039040 | elapsed time per iteration (s): 0.44 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 2.330351E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.552 | TFLOPs: 30.83 |
7: iteration 42090/ 115203 | consumed samples: 10775040 | consumed tokens: 22067281920 | elapsed time per iteration (s): 0.43 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 2.337721E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.211 | TFLOPs: 31.44 |
7: iteration 42100/ 115203 | consumed samples: 10777600 | consumed tokens: 22072524800 | elapsed time per iteration (s): 0.42 | learning rate: 1.486E-04 | global batch size: 256 | lm loss: 2.330384E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.281 | TFLOPs: 31.71 |
7: iteration 42110/ 115203 | consumed samples: 10780160 | consumed tokens: 22077767680 | elapsed time per iteration (s): 0.43 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 2.330306E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.009 | TFLOPs: 31.59 |
7: iteration 42120/ 115203 | consumed samples: 10782720 | consumed tokens: 22083010560 | elapsed time per iteration (s): 0.56 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 2.327567E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 453.748 | TFLOPs: 23.81 |
7: iteration 42130/ 115203 | consumed samples: 10785280 | consumed tokens: 22088253440 | elapsed time per iteration (s): 0.43 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 2.342405E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.461 | TFLOPs: 31.45 |
7: iteration 42140/ 115203 | consumed samples: 10787840 | consumed tokens: 22093496320 | elapsed time per iteration (s): 0.43 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 2.317288E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.578 | TFLOPs: 31.41 |
7: iteration 42150/ 115203 | consumed samples: 10790400 | consumed tokens: 22098739200 | elapsed time per iteration (s): 0.42 | learning rate: 1.485E-04 | global batch size: 256 | lm loss: 2.328418E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.101 | TFLOPs: 31.91 |
7: iteration 42160/ 115203 | consumed samples: 10792960 | consumed tokens: 22103982080 | elapsed time per iteration (s): 0.43 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 2.340488E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.337 | TFLOPs: 31.50 |
7: iteration 42170/ 115203 | consumed samples: 10795520 | consumed tokens: 22109224960 | elapsed time per iteration (s): 0.42 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 2.327460E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.950 | TFLOPs: 31.74 |
7: iteration 42180/ 115203 | consumed samples: 10798080 | consumed tokens: 22114467840 | elapsed time per iteration (s): 0.42 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 2.307861E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.287 | TFLOPs: 31.86 |
7: iteration 42190/ 115203 | consumed samples: 10800640 | consumed tokens: 22119710720 | elapsed time per iteration (s): 0.42 | learning rate: 1.484E-04 | global batch size: 256 | lm loss: 2.327724E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.908 | TFLOPs: 31.69 |
7: iteration 42200/ 115203 | consumed samples: 10803200 | consumed tokens: 22124953600 | elapsed time per iteration (s): 0.42 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 2.352387E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.157 | TFLOPs: 31.91 |
7: iteration 42210/ 115203 | consumed samples: 10805760 | consumed tokens: 22130196480 | elapsed time per iteration (s): 0.43 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 2.295166E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.906 | TFLOPs: 31.42 |
7: iteration 42220/ 115203 | consumed samples: 10808320 | consumed tokens: 22135439360 | elapsed time per iteration (s): 0.43 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 2.311567E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.265 | TFLOPs: 30.97 |
7: iteration 42230/ 115203 | consumed samples: 10810880 | consumed tokens: 22140682240 | elapsed time per iteration (s): 0.56 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 2.291517E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 453.888 | TFLOPs: 23.81 |
7: iteration 42240/ 115203 | consumed samples: 10813440 | consumed tokens: 22145925120 | elapsed time per iteration (s): 0.47 | learning rate: 1.483E-04 | global batch size: 256 | lm loss: 2.320827E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 549.297 | TFLOPs: 28.82 |
7: iteration 42250/ 115203 | consumed samples: 10816000 | consumed tokens: 22151168000 | elapsed time per iteration (s): 0.43 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.352544E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.464 | TFLOPs: 31.40 |
7: iteration 42260/ 115203 | consumed samples: 10818560 | consumed tokens: 22156410880 | elapsed time per iteration (s): 0.43 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.314856E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.003 | TFLOPs: 31.48 |
7: iteration 42270/ 115203 | consumed samples: 10821120 | consumed tokens: 22161653760 | elapsed time per iteration (s): 0.43 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.310562E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.709 | TFLOPs: 31.26 |
7: iteration 42280/ 115203 | consumed samples: 10823680 | consumed tokens: 22166896640 | elapsed time per iteration (s): 0.43 | learning rate: 1.482E-04 | global batch size: 256 | lm loss: 2.331804E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.446 | TFLOPs: 31.35 |
7: iteration 42290/ 115203 | consumed samples: 10826240 | consumed tokens: 22172139520 | elapsed time per iteration (s): 0.43 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 2.315138E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.756 | TFLOPs: 31.57 |
7: iteration 42300/ 115203 | consumed samples: 10828800 | consumed tokens: 22177382400 | elapsed time per iteration (s): 0.43 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 2.314027E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.626 | TFLOPs: 31.51 |
7: iteration 42310/ 115203 | consumed samples: 10831360 | consumed tokens: 22182625280 | elapsed time per iteration (s): 0.43 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 2.313216E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.069 | TFLOPs: 31.38 |
7: iteration 42320/ 115203 | consumed samples: 10833920 | consumed tokens: 22187868160 | elapsed time per iteration (s): 0.43 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 2.283491E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.901 | TFLOPs: 31.06 |
7: iteration 42330/ 115203 | consumed samples: 10836480 | consumed tokens: 22193111040 | elapsed time per iteration (s): 0.44 | learning rate: 1.481E-04 | global batch size: 256 | lm loss: 2.336324E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.672 | TFLOPs: 30.68 |
7: iteration 42340/ 115203 | consumed samples: 10839040 | consumed tokens: 22198353920 | elapsed time per iteration (s): 0.42 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 2.316764E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per
second: 602.713 | TFLOPs: 31.62 | 7: iteration 42350/ 115203 | consumed samples: 10841600 | consumed tokens: 22203596800 | elapsed time per iteration (s): 0.43 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 2.306655E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.391 | TFLOPs: 31.50 | 7: iteration 42360/ 115203 | consumed samples: 10844160 | consumed tokens: 22208839680 | elapsed time per iteration (s): 0.43 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 2.338497E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.915 | TFLOPs: 31.16 | 7: iteration 42370/ 115203 | consumed samples: 10846720 | consumed tokens: 22214082560 | elapsed time per iteration (s): 0.43 | learning rate: 1.480E-04 | global batch size: 256 | lm loss: 2.336182E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.919 | TFLOPs: 31.21 | 7: iteration 42380/ 115203 | consumed samples: 10849280 | consumed tokens: 22219325440 | elapsed time per iteration (s): 0.43 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.303923E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.307 | TFLOPs: 30.97 | 7: iteration 42390/ 115203 | consumed samples: 10851840 | consumed tokens: 22224568320 | elapsed time per iteration (s): 0.43 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.326895E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.275 | TFLOPs: 31.29 | 7: iteration 42400/ 115203 | consumed samples: 10854400 | consumed tokens: 22229811200 | elapsed time per iteration (s): 0.43 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.365486E+00 | grad norm: 
0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.535 | TFLOPs: 31.19 | 7: iteration 42410/ 115203 | consumed samples: 10856960 | consumed tokens: 22235054080 | elapsed time per iteration (s): 0.43 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.282246E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.344 | TFLOPs: 31.24 | 7: iteration 42420/ 115203 | consumed samples: 10859520 | consumed tokens: 22240296960 | elapsed time per iteration (s): 0.43 | learning rate: 1.479E-04 | global batch size: 256 | lm loss: 2.297091E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.025 | TFLOPs: 31.38 | 7: iteration 42430/ 115203 | consumed samples: 10862080 | consumed tokens: 22245539840 | elapsed time per iteration (s): 0.43 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 2.295525E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.122 | TFLOPs: 31.54 | 7: iteration 42440/ 115203 | consumed samples: 10864640 | consumed tokens: 22250782720 | elapsed time per iteration (s): 0.43 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 2.333971E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.665 | TFLOPs: 31.36 | 7: iteration 42450/ 115203 | consumed samples: 10867200 | consumed tokens: 22256025600 | elapsed time per iteration (s): 0.42 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 2.319683E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.687 | TFLOPs: 31.62 | 7: iteration 42460/ 115203 | consumed samples: 10869760 | consumed tokens: 22261268480 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.478E-04 | global batch size: 256 | lm loss: 2.338735E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.476 | TFLOPs: 31.56 | 7: iteration 42470/ 115203 | consumed samples: 10872320 | consumed tokens: 22266511360 | elapsed time per iteration (s): 0.43 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 2.303905E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.548 | TFLOPs: 31.40 | 7: iteration 42480/ 115203 | consumed samples: 10874880 | consumed tokens: 22271754240 | elapsed time per iteration (s): 0.42 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 2.309022E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.157 | TFLOPs: 31.65 | 7: iteration 42490/ 115203 | consumed samples: 10877440 | consumed tokens: 22276997120 | elapsed time per iteration (s): 0.43 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 2.360162E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.372 | TFLOPs: 31.34 | 7: iteration 42500/ 115203 | consumed samples: 10880000 | consumed tokens: 22282240000 | elapsed time per iteration (s): 0.42 | learning rate: 1.477E-04 | global batch size: 256 | lm loss: 2.327620E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.334 | TFLOPs: 31.76 | 7: iteration 42510/ 115203 | consumed samples: 10882560 | consumed tokens: 22287482880 | elapsed time per iteration (s): 0.43 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.350188E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.843 | TFLOPs: 30.95 | 7: 
iteration 42520/ 115203 | consumed samples: 10885120 | consumed tokens: 22292725760 | elapsed time per iteration (s): 0.42 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.287927E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.410 | TFLOPs: 31.82 | 7: iteration 42530/ 115203 | consumed samples: 10887680 | consumed tokens: 22297968640 | elapsed time per iteration (s): 0.43 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.318590E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.705 | TFLOPs: 31.52 | 7: iteration 42540/ 115203 | consumed samples: 10890240 | consumed tokens: 22303211520 | elapsed time per iteration (s): 0.42 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.299402E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.651 | TFLOPs: 32.04 | 7: iteration 42550/ 115203 | consumed samples: 10892800 | consumed tokens: 22308454400 | elapsed time per iteration (s): 0.44 | learning rate: 1.476E-04 | global batch size: 256 | lm loss: 2.330217E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.708 | TFLOPs: 30.21 | 7: iteration 42560/ 115203 | consumed samples: 10895360 | consumed tokens: 22313697280 | elapsed time per iteration (s): 0.42 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.314329E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.640 | TFLOPs: 31.83 | 7: iteration 42570/ 115203 | consumed samples: 10897920 | consumed tokens: 22318940160 | elapsed time per iteration (s): 0.43 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.315758E+00 | grad norm: 0.225 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.597 | TFLOPs: 31.35 | 7: iteration 42580/ 115203 | consumed samples: 10900480 | consumed tokens: 22324183040 | elapsed time per iteration (s): 0.43 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.323165E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.165 | TFLOPs: 31.44 | 7: iteration 42590/ 115203 | consumed samples: 10903040 | consumed tokens: 22329425920 | elapsed time per iteration (s): 0.43 | learning rate: 1.475E-04 | global batch size: 256 | lm loss: 2.337584E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.053 | TFLOPs: 31.59 | 7: iteration 42600/ 115203 | consumed samples: 10905600 | consumed tokens: 22334668800 | elapsed time per iteration (s): 0.43 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 2.328529E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.236 | TFLOPs: 31.23 | 7: iteration 42610/ 115203 | consumed samples: 10908160 | consumed tokens: 22339911680 | elapsed time per iteration (s): 0.42 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 2.305586E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.216 | TFLOPs: 32.17 | 7: iteration 42620/ 115203 | consumed samples: 10910720 | consumed tokens: 22345154560 | elapsed time per iteration (s): 0.42 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 2.296599E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.689 | TFLOPs: 32.15 | 7: iteration 42630/ 115203 | consumed samples: 10913280 | consumed tokens: 22350397440 | elapsed time per iteration (s): 0.43 | learning rate: 
1.474E-04 | global batch size: 256 | lm loss: 2.291210E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.699 | TFLOPs: 31.41 | 7: iteration 42640/ 115203 | consumed samples: 10915840 | consumed tokens: 22355640320 | elapsed time per iteration (s): 0.44 | learning rate: 1.474E-04 | global batch size: 256 | lm loss: 2.321950E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.141 | TFLOPs: 30.65 | 7: iteration 42650/ 115203 | consumed samples: 10918400 | consumed tokens: 22360883200 | elapsed time per iteration (s): 0.42 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 2.304578E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.478 | TFLOPs: 31.66 | 7: iteration 42660/ 115203 | consumed samples: 10920960 | consumed tokens: 22366126080 | elapsed time per iteration (s): 0.44 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 2.301323E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.132 | TFLOPs: 30.81 | 7: iteration 42670/ 115203 | consumed samples: 10923520 | consumed tokens: 22371368960 | elapsed time per iteration (s): 0.43 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 2.319889E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.788 | TFLOPs: 31.47 | 7: iteration 42680/ 115203 | consumed samples: 10926080 | consumed tokens: 22376611840 | elapsed time per iteration (s): 0.42 | learning rate: 1.473E-04 | global batch size: 256 | lm loss: 2.330825E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.236 | TFLOPs: 31.70 | 7: iteration 42690/ 115203 | consumed 
samples: 10928640 | consumed tokens: 22381854720 | elapsed time per iteration (s): 0.43 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.292154E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.082 | TFLOPs: 31.17 | 7: iteration 42700/ 115203 | consumed samples: 10931200 | consumed tokens: 22387097600 | elapsed time per iteration (s): 0.43 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.337760E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.442 | TFLOPs: 31.45 | 7: iteration 42710/ 115203 | consumed samples: 10933760 | consumed tokens: 22392340480 | elapsed time per iteration (s): 0.43 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.336395E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.344 | TFLOPs: 31.60 | 7: iteration 42720/ 115203 | consumed samples: 10936320 | consumed tokens: 22397583360 | elapsed time per iteration (s): 0.51 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.316511E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 497.735 | TFLOPs: 26.12 | 7: iteration 42730/ 115203 | consumed samples: 10938880 | consumed tokens: 22402826240 | elapsed time per iteration (s): 0.43 | learning rate: 1.472E-04 | global batch size: 256 | lm loss: 2.298914E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.394 | TFLOPs: 31.29 | 7: iteration 42740/ 115203 | consumed samples: 10941440 | consumed tokens: 22408069120 | elapsed time per iteration (s): 0.42 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.295983E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 607.807 | TFLOPs: 31.89 | 7: iteration 42750/ 115203 | consumed samples: 10944000 | consumed tokens: 22413312000 | elapsed time per iteration (s): 0.43 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.287664E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.593 | TFLOPs: 31.46 | 7: iteration 42760/ 115203 | consumed samples: 10946560 | consumed tokens: 22418554880 | elapsed time per iteration (s): 0.42 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.294933E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.079 | TFLOPs: 31.90 | 7: iteration 42770/ 115203 | consumed samples: 10949120 | consumed tokens: 22423797760 | elapsed time per iteration (s): 0.43 | learning rate: 1.471E-04 | global batch size: 256 | lm loss: 2.325241E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.225 | TFLOPs: 30.97 | 7: iteration 42780/ 115203 | consumed samples: 10951680 | consumed tokens: 22429040640 | elapsed time per iteration (s): 0.43 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 2.311547E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.289 | TFLOPs: 31.50 | 7: iteration 42790/ 115203 | consumed samples: 10954240 | consumed tokens: 22434283520 | elapsed time per iteration (s): 0.42 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 2.325416E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.107 | TFLOPs: 31.64 | 7: iteration 42800/ 115203 | consumed samples: 10956800 | consumed tokens: 22439526400 | elapsed time per iteration (s): 0.43 | learning rate: 1.470E-04 | global batch size: 256 | lm 
loss: 2.312574E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.435 | TFLOPs: 31.24 | 7: iteration 42810/ 115203 | consumed samples: 10959360 | consumed tokens: 22444769280 | elapsed time per iteration (s): 0.42 | learning rate: 1.470E-04 | global batch size: 256 | lm loss: 2.288824E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.833 | TFLOPs: 31.73 | 7: iteration 42820/ 115203 | consumed samples: 10961920 | consumed tokens: 22450012160 | elapsed time per iteration (s): 0.43 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.307170E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.733 | TFLOPs: 31.47 | 7: iteration 42830/ 115203 | consumed samples: 10964480 | consumed tokens: 22455255040 | elapsed time per iteration (s): 0.43 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.314120E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.657 | TFLOPs: 31.46 | 7: iteration 42840/ 115203 | consumed samples: 10967040 | consumed tokens: 22460497920 | elapsed time per iteration (s): 0.42 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.344782E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.921 | TFLOPs: 31.90 | 7: iteration 42850/ 115203 | consumed samples: 10969600 | consumed tokens: 22465740800 | elapsed time per iteration (s): 0.43 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.310030E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.350 | TFLOPs: 31.39 | 7: iteration 42860/ 115203 | consumed samples: 10972160 | consumed tokens: 
22470983680 | elapsed time per iteration (s): 0.43 | learning rate: 1.469E-04 | global batch size: 256 | lm loss: 2.310949E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.735 | TFLOPs: 31.31 | 7: iteration 42870/ 115203 | consumed samples: 10974720 | consumed tokens: 22476226560 | elapsed time per iteration (s): 0.42 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 2.302281E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.253 | TFLOPs: 31.97 | 7: iteration 42880/ 115203 | consumed samples: 10977280 | consumed tokens: 22481469440 | elapsed time per iteration (s): 0.43 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 2.288753E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.987 | TFLOPs: 31.59 | 7: iteration 42890/ 115203 | consumed samples: 10979840 | consumed tokens: 22486712320 | elapsed time per iteration (s): 0.44 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 2.310439E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.605 | TFLOPs: 30.67 | 7: iteration 42900/ 115203 | consumed samples: 10982400 | consumed tokens: 22491955200 | elapsed time per iteration (s): 0.44 | learning rate: 1.468E-04 | global batch size: 256 | lm loss: 2.347717E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.990 | TFLOPs: 30.27 | 7: iteration 42910/ 115203 | consumed samples: 10984960 | consumed tokens: 22497198080 | elapsed time per iteration (s): 0.44 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 2.328786E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
582.768 | TFLOPs: 30.58 | 7: iteration 42920/ 115203 | consumed samples: 10987520 | consumed tokens: 22502440960 | elapsed time per iteration (s): 0.42 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 2.315437E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.039 | TFLOPs: 31.64 | 7: iteration 42930/ 115203 | consumed samples: 10990080 | consumed tokens: 22507683840 | elapsed time per iteration (s): 0.43 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 2.339081E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.720 | TFLOPs: 31.47 | 7: iteration 42940/ 115203 | consumed samples: 10992640 | consumed tokens: 22512926720 | elapsed time per iteration (s): 0.42 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 2.336032E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.711 | TFLOPs: 31.89 | 7: iteration 42950/ 115203 | consumed samples: 10995200 | consumed tokens: 22518169600 | elapsed time per iteration (s): 0.43 | learning rate: 1.467E-04 | global batch size: 256 | lm loss: 2.329936E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.311 | TFLOPs: 31.18 | 7: iteration 42960/ 115203 | consumed samples: 10997760 | consumed tokens: 22523412480 | elapsed time per iteration (s): 0.42 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.302390E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.099 | TFLOPs: 32.06 | 7: iteration 42970/ 115203 | consumed samples: 11000320 | consumed tokens: 22528655360 | elapsed time per iteration (s): 0.44 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.288588E+00 | grad norm: 0.209 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.609 | TFLOPs: 30.67 | 7: iteration 42980/ 115203 | consumed samples: 11002880 | consumed tokens: 22533898240 | elapsed time per iteration (s): 0.42 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.326763E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.157 | TFLOPs: 31.91 | 7: iteration 42990/ 115203 | consumed samples: 11005440 | consumed tokens: 22539141120 | elapsed time per iteration (s): 0.43 | learning rate: 1.466E-04 | global batch size: 256 | lm loss: 2.318608E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.267 | TFLOPs: 31.13 | 7: iteration 43000/ 115203 | consumed samples: 11008000 | consumed tokens: 22544384000 | elapsed time per iteration (s): 0.42 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.324339E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.740 | TFLOPs: 31.99 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 43000 | lm loss value: 2.290812E+00 | lm loss PPL: 9.882964E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 43000 to checkpoints_221m 0: [2022-11-28 18:07:05,447] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step43000 is begin to save! 0: [2022-11-28 18:07:05,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_01-model_00-model_states.pt... 0: [2022-11-28 18:07:05,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 18:07:05,558] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_03-model_00-model_states.pt... 0: [2022-11-28 18:07:05,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_03-model_00-model_states.pt. 0: [2022-11-28 18:07:05,579] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_04-model_00-model_states.pt... 0: [2022-11-28 18:07:05,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_04-model_00-model_states.pt. 0: [2022-11-28 18:07:05,602] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_05-model_00-model_states.pt... 0: [2022-11-28 18:07:05,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_05-model_00-model_states.pt. 0: [2022-11-28 18:07:05,624] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_06-model_00-model_states.pt... 0: [2022-11-28 18:07:05,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_06-model_00-model_states.pt. 0: [2022-11-28 18:07:05,647] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_07-model_00-model_states.pt... 0: [2022-11-28 18:07:05,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_07-model_00-model_states.pt. 0: [2022-11-28 18:07:05,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_08-model_00-model_states.pt... 0: [2022-11-28 18:07:05,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 18:07:05,700] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_09-model_00-model_states.pt... 0: [2022-11-28 18:07:05,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_09-model_00-model_states.pt. 0: [2022-11-28 18:07:05,722] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_10-model_00-model_states.pt... 0: [2022-11-28 18:07:05,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_10-model_00-model_states.pt. 0: [2022-11-28 18:07:05,745] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_11-model_00-model_states.pt... 0: [2022-11-28 18:07:05,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_11-model_00-model_states.pt. 0: [2022-11-28 18:07:05,769] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_12-model_00-model_states.pt... 0: [2022-11-28 18:07:05,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_12-model_00-model_states.pt. 0: [2022-11-28 18:07:05,792] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_13-model_00-model_states.pt... 0: [2022-11-28 18:07:05,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_13-model_00-model_states.pt. 0: [2022-11-28 18:07:05,815] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_14-model_00-model_states.pt... 0: [2022-11-28 18:07:05,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 18:07:05,839] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_15-model_00-model_states.pt... 0: [2022-11-28 18:07:05,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_15-model_00-model_states.pt. 0: [2022-11-28 18:07:05,862] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_16-model_00-model_states.pt... 0: [2022-11-28 18:07:05,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_16-model_00-model_states.pt. 0: [2022-11-28 18:07:05,885] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_17-model_00-model_states.pt... 0: [2022-11-28 18:07:05,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_17-model_00-model_states.pt. 0: [2022-11-28 18:07:05,908] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_18-model_00-model_states.pt... 0: [2022-11-28 18:07:05,930] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_18-model_00-model_states.pt. 0: [2022-11-28 18:07:05,931] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_19-model_00-model_states.pt... 0: [2022-11-28 18:07:05,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_19-model_00-model_states.pt. 0: [2022-11-28 18:07:05,954] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_20-model_00-model_states.pt... 0: [2022-11-28 18:07:05,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 18:07:05,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/layer_22-model_00-model_states.pt... 0: [2022-11-28 18:07:05,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/layer_22-model_00-model_states.pt. 0: [2022-11-28 18:07:05,981] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step43000/mp_rank_00_model_states.pt 0: [2022-11-28 18:07:05,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/mp_rank_00_model_states.pt... 0: [2022-11-28 18:07:05,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/mp_rank_00_model_states.pt. 0: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
7: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2022-11-28 18:07:06,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
6: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:07:06,004] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step43000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:07:06,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:07:06,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:07:06,064] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 18:07:06,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:07:06,064] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 18:07:06,064] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2022-11-28 18:07:06,064] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 18:07:06,064] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2022-11-28 18:07:06,064] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2022-11-28 18:07:06,064] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 18:07:06,064] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 18:07:06,064] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2022-11-28 18:07:06,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:07:06,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:07:06,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:07:06,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:07:06,065] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 18:07:06,065] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 18:07:06,065] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 18:07:06,065] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 18:07:06,065] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2022-11-28 18:07:06,065] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 
4: [2022-11-28 18:07:06,065] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2022-11-28 18:07:06,065] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2022-11-28 18:07:06,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:07:06,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:07:06,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 18:07:06,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 18:07:06,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2022-11-28 18:07:06,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2022-11-28 18:07:06,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:07:06,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2022-11-28 18:07:06,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 18:07:06,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 4: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:07:06,060] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:07:06,060] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:07:06,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:07:06,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 18:07:06,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 18:07:06,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 5: [2022-11-28 18:07:06,060] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 18:07:06,060] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 2: [2022-11-28 18:07:06,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 5: [2022-11-28 18:07:06,060] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 
2: [2022-11-28 18:07:06,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 18:07:06,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 5: [2022-11-28 18:07:06,061] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2022-11-28 18:07:06,061] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2022-11-28 18:07:06,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2022-11-28 18:07:06,061] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2022-11-28 18:07:06,061] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 
2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2022-11-28 18:07:06,061] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:07:06,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2022-11-28 18:07:06,061] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 18:07:06,061] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2022-11-28 18:07:06,061] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:07:06,062] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 18:07:06,062] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 0: [2022-11-28 18:07:06,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:07:06,071] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 18:07:06,071] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 0: [2022-11-28 18:07:06,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-28 18:07:06,071] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 18:07:06,071] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 0: [2022-11-28 18:07:06,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:07:06,071] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 18:07:06,071] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2022-11-28 18:07:06,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:07:06,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:07:06,071] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 18:07:06,071] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 18:07:06,071] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2022-11-28 18:07:06,071] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 5: [2022-11-28 18:07:06,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2022-11-28 18:07:06,071] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 18:07:06,071] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 0: [2022-11-28 18:07:06,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:07:06,074] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 18:07:06,074] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2022-11-28 18:07:06,077] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:07:06,077] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 18:07:06,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2022-11-28 18:07:06,077] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:07:06,078] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 18:07:06,078] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2022-11-28 18:07:06,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-28 18:07:06,078] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 18:07:06,078] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 6: [2022-11-28 18:07:06,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:07:06,078] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 18:07:06,078] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 0: [2022-11-28 18:07:06,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:07:06,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:07:06,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:07:06,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2022-11-28 18:07:06,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 18:07:06,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 18:07:06,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 18:07:06,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 0: [2022-11-28 18:07:06,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 0: [2022-11-28 18:07:06,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 0: [2022-11-28 18:07:06,113] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 18:07:06,113] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2022-11-28 18:07:06,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:07:06,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:07:06,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:07:06,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2022-11-28 18:07:06,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 18:07:06,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 18:07:06,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 18:07:06,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 18:07:06,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2022-11-28 18:07:06,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2022-11-28 18:07:06,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2022-11-28 18:07:06,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2022-11-28 18:07:06,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:07:06,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 18:07:06,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2022-11-28 18:07:06,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2022-11-28 18:07:06,125] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 18:07:06,125] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2022-11-28 18:07:06,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:07:06,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 18:07:06,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 3: [2022-11-28 18:07:06,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:07:06,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 18:07:06,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2022-11-28 18:07:06,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:07:06,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 18:07:06,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:07:06,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2022-11-28 18:07:06,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 18:07:06,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:07:06,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 18:07:06,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 18:07:06,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 18:07:06,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 18:07:06,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:07:06,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:07:06,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 18:07:06,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:07:06,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 7: [2022-11-28 18:07:06,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 18:07:06,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2022-11-28 18:07:06,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:07:06,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 18:07:06,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2022-11-28 18:07:06,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:07:06,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:07:06,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:07:06,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:07:06,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 18:07:06,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 18:07:06,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 18:07:06,159] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step43000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 18:07:06,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2022-11-28 18:07:06,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2022-11-28 18:07:06,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 1: [2022-11-28 18:07:06,159] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step43000 is ready now! 
0: successfully saved checkpoint at iteration 43000 to checkpoints_221m 7: time (ms) | save-checkpoint: 728.14 7: iteration 43010/ 115203 | consumed samples: 11010560 | consumed tokens: 22549626880 | elapsed time per iteration (s): 0.52 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.338687E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 494.452 | TFLOPs: 25.94 | 7: iteration 43020/ 115203 | consumed samples: 11013120 | consumed tokens: 22554869760 | elapsed time per iteration (s): 0.43 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.341232E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.754 | TFLOPs: 31.52 | 7: iteration 43030/ 115203 | consumed samples: 11015680 | consumed tokens: 22560112640 | elapsed time per iteration (s): 0.42 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.304920E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.452 | TFLOPs: 31.82 | 7: iteration 43040/ 115203 | consumed samples: 11018240 | consumed tokens: 22565355520 | elapsed time per iteration (s): 0.42 | learning rate: 1.465E-04 | global batch size: 256 | lm loss: 2.319305E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.344 | TFLOPs: 31.81 | 7: iteration 43050/ 115203 | consumed samples: 11020800 | consumed tokens: 22570598400 | elapsed time per iteration (s): 0.42 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 2.320659E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.893 | TFLOPs: 31.63 | 7: iteration 43060/ 115203 | consumed samples: 11023360 | consumed tokens: 22575841280 | elapsed time per iteration (s): 0.42 | learning 
rate: 1.464E-04 | global batch size: 256 | lm loss: 2.322858E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.193 | TFLOPs: 31.65 | 7: iteration 43070/ 115203 | consumed samples: 11025920 | consumed tokens: 22581084160 | elapsed time per iteration (s): 0.42 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 2.323920E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.818 | TFLOPs: 31.79 | 7: iteration 43080/ 115203 | consumed samples: 11028480 | consumed tokens: 22586327040 | elapsed time per iteration (s): 0.43 | learning rate: 1.464E-04 | global batch size: 256 | lm loss: 2.324996E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.108 | TFLOPs: 31.12 | 7: iteration 43090/ 115203 | consumed samples: 11031040 | consumed tokens: 22591569920 | elapsed time per iteration (s): 0.42 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.331172E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.713 | TFLOPs: 31.62 | 7: iteration 43100/ 115203 | consumed samples: 11033600 | consumed tokens: 22596812800 | elapsed time per iteration (s): 0.43 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.330459E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.628 | TFLOPs: 31.51 | 7: iteration 43110/ 115203 | consumed samples: 11036160 | consumed tokens: 22602055680 | elapsed time per iteration (s): 0.42 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.337108E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.010 | TFLOPs: 31.95 | 7: iteration 43120/ 115203 | 
consumed samples: 11038720 | consumed tokens: 22607298560 | elapsed time per iteration (s): 0.44 | learning rate: 1.463E-04 | global batch size: 256 | lm loss: 2.305438E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.034 | TFLOPs: 30.80 | 7: iteration 43130/ 115203 | consumed samples: 11041280 | consumed tokens: 22612541440 | elapsed time per iteration (s): 0.42 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 2.299036E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.667 | TFLOPs: 31.94 | 7: iteration 43140/ 115203 | consumed samples: 11043840 | consumed tokens: 22617784320 | elapsed time per iteration (s): 0.42 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 2.305081E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.168 | TFLOPs: 31.80 | 7: iteration 43150/ 115203 | consumed samples: 11046400 | consumed tokens: 22623027200 | elapsed time per iteration (s): 0.43 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 2.309411E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.366 | TFLOPs: 31.45 | 7: iteration 43160/ 115203 | consumed samples: 11048960 | consumed tokens: 22628270080 | elapsed time per iteration (s): 0.42 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 2.338978E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.244 | TFLOPs: 31.81 | 7: iteration 43170/ 115203 | consumed samples: 11051520 | consumed tokens: 22633512960 | elapsed time per iteration (s): 0.43 | learning rate: 1.462E-04 | global batch size: 256 | lm loss: 2.290026E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 601.093 | TFLOPs: 31.54 | 7: iteration 43180/ 115203 | consumed samples: 11054080 | consumed tokens: 22638755840 | elapsed time per iteration (s): 0.42 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 2.334272E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.373 | TFLOPs: 31.76 | 7: iteration 43190/ 115203 | consumed samples: 11056640 | consumed tokens: 22643998720 | elapsed time per iteration (s): 0.43 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 2.328595E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.554 | TFLOPs: 31.35 | 7: iteration 43200/ 115203 | consumed samples: 11059200 | consumed tokens: 22649241600 | elapsed time per iteration (s): 0.42 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 2.296457E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.545 | TFLOPs: 31.67 | 7: iteration 43210/ 115203 | consumed samples: 11061760 | consumed tokens: 22654484480 | elapsed time per iteration (s): 0.43 | learning rate: 1.461E-04 | global batch size: 256 | lm loss: 2.350779E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.884 | TFLOPs: 31.53 | 7: iteration 43220/ 115203 | consumed samples: 11064320 | consumed tokens: 22659727360 | elapsed time per iteration (s): 0.42 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.323070E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.968 | TFLOPs: 31.69 | 7: iteration 43230/ 115203 | consumed samples: 11066880 | consumed tokens: 22664970240 | elapsed time per iteration (s): 0.43 | learning rate: 1.460E-04 | global batch size: 
256 | lm loss: 2.330867E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.306 | TFLOPs: 30.97 | 7: iteration 43240/ 115203 | consumed samples: 11069440 | consumed tokens: 22670213120 | elapsed time per iteration (s): 0.44 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.336637E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.010 | TFLOPs: 30.22 | 7: iteration 43250/ 115203 | consumed samples: 11072000 | consumed tokens: 22675456000 | elapsed time per iteration (s): 0.43 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.324500E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.954 | TFLOPs: 31.11 | 7: iteration 43260/ 115203 | consumed samples: 11074560 | consumed tokens: 22680698880 | elapsed time per iteration (s): 0.43 | learning rate: 1.460E-04 | global batch size: 256 | lm loss: 2.314158E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.032 | TFLOPs: 31.38 | 7: iteration 43270/ 115203 | consumed samples: 11077120 | consumed tokens: 22685941760 | elapsed time per iteration (s): 0.44 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 2.305763E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.561 | TFLOPs: 30.78 | 7: iteration 43280/ 115203 | consumed samples: 11079680 | consumed tokens: 22691184640 | elapsed time per iteration (s): 0.42 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 2.325919E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.036 | TFLOPs: 31.69 | 7: iteration 43290/ 115203 | consumed samples: 11082240 | consumed 
tokens: 22696427520 | elapsed time per iteration (s): 0.42 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 2.315569E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.737 | TFLOPs: 31.78 | 7: iteration 43300/ 115203 | consumed samples: 11084800 | consumed tokens: 22701670400 | elapsed time per iteration (s): 0.42 | learning rate: 1.459E-04 | global batch size: 256 | lm loss: 2.292001E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.606 | TFLOPs: 31.88 | 7: iteration 43310/ 115203 | consumed samples: 11087360 | consumed tokens: 22706913280 | elapsed time per iteration (s): 0.42 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.318567E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.995 | TFLOPs: 31.90 | 7: iteration 43320/ 115203 | consumed samples: 11089920 | consumed tokens: 22712156160 | elapsed time per iteration (s): 0.42 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.308055E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.077 | TFLOPs: 31.85 | 7: iteration 43330/ 115203 | consumed samples: 11092480 | consumed tokens: 22717399040 | elapsed time per iteration (s): 0.43 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.290472E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.901 | TFLOPs: 31.37 | 7: iteration 43340/ 115203 | consumed samples: 11095040 | consumed tokens: 22722641920 | elapsed time per iteration (s): 0.42 | learning rate: 1.458E-04 | global batch size: 256 | lm loss: 2.298747E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 604.533 | TFLOPs: 31.72 | 7: iteration 43350/ 115203 | consumed samples: 11097600 | consumed tokens: 22727884800 | elapsed time per iteration (s): 0.43 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.304078E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.167 | TFLOPs: 31.49 | 7: iteration 43360/ 115203 | consumed samples: 11100160 | consumed tokens: 22733127680 | elapsed time per iteration (s): 0.42 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.330008E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.616 | TFLOPs: 32.25 | 7: iteration 43370/ 115203 | consumed samples: 11102720 | consumed tokens: 22738370560 | elapsed time per iteration (s): 0.42 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.313518E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.283 | TFLOPs: 32.23 | 7: iteration 43380/ 115203 | consumed samples: 11105280 | consumed tokens: 22743613440 | elapsed time per iteration (s): 0.42 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.328173E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.523 | TFLOPs: 31.72 | 7: iteration 43390/ 115203 | consumed samples: 11107840 | consumed tokens: 22748856320 | elapsed time per iteration (s): 0.43 | learning rate: 1.457E-04 | global batch size: 256 | lm loss: 2.320292E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.649 | TFLOPs: 31.25 | 7: iteration 43400/ 115203 | consumed samples: 11110400 | consumed tokens: 22754099200 | elapsed time per iteration (s): 0.42 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 2.332990E+00 | grad norm: 
0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.436 | TFLOPs: 31.66 | 7: iteration 43410/ 115203 | consumed samples: 11112960 | consumed tokens: 22759342080 | elapsed time per iteration (s): 0.43 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 2.327494E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.778 | TFLOPs: 30.94 | 7: iteration 43420/ 115203 | consumed samples: 11115520 | consumed tokens: 22764584960 | elapsed time per iteration (s): 0.42 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 2.310257E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.215 | TFLOPs: 31.96 | 7: iteration 43430/ 115203 | consumed samples: 11118080 | consumed tokens: 22769827840 | elapsed time per iteration (s): 0.43 | learning rate: 1.456E-04 | global batch size: 256 | lm loss: 2.312040E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.633 | TFLOPs: 30.88 | 7: iteration 43440/ 115203 | consumed samples: 11120640 | consumed tokens: 22775070720 | elapsed time per iteration (s): 0.42 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.276187E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.673 | TFLOPs: 31.67 | 7: iteration 43450/ 115203 | consumed samples: 11123200 | consumed tokens: 22780313600 | elapsed time per iteration (s): 0.42 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.295691E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.817 | TFLOPs: 31.68 | 7: iteration 43460/ 115203 | consumed samples: 11125760 | consumed tokens: 22785556480 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.344959E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.167 | TFLOPs: 30.91 | 7: iteration 43470/ 115203 | consumed samples: 11128320 | consumed tokens: 22790799360 | elapsed time per iteration (s): 0.42 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.341021E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.162 | TFLOPs: 31.86 | 7: iteration 43480/ 115203 | consumed samples: 11130880 | consumed tokens: 22796042240 | elapsed time per iteration (s): 0.44 | learning rate: 1.455E-04 | global batch size: 256 | lm loss: 2.324306E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.416 | TFLOPs: 30.19 | 7: iteration 43490/ 115203 | consumed samples: 11133440 | consumed tokens: 22801285120 | elapsed time per iteration (s): 0.43 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 2.377344E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.063 | TFLOPs: 31.54 | 7: iteration 43500/ 115203 | consumed samples: 11136000 | consumed tokens: 22806528000 | elapsed time per iteration (s): 0.42 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 2.328246E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.466 | TFLOPs: 31.61 | 7: iteration 43510/ 115203 | consumed samples: 11138560 | consumed tokens: 22811770880 | elapsed time per iteration (s): 0.42 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 2.284136E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.024 | TFLOPs: 31.95 | 7: 
iteration 43520/ 115203 | consumed samples: 11141120 | consumed tokens: 22817013760 | elapsed time per iteration (s): 0.43 | learning rate: 1.454E-04 | global batch size: 256 | lm loss: 2.311947E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.723 | TFLOPs: 31.41 | 7: iteration 43530/ 115203 | consumed samples: 11143680 | consumed tokens: 22822256640 | elapsed time per iteration (s): 0.42 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 2.327085E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.673 | TFLOPs: 31.99 | 7: iteration 43540/ 115203 | consumed samples: 11146240 | consumed tokens: 22827499520 | elapsed time per iteration (s): 0.42 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 2.346459E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.638 | TFLOPs: 32.25 | 7: iteration 43550/ 115203 | consumed samples: 11148800 | consumed tokens: 22832742400 | elapsed time per iteration (s): 0.42 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 2.288055E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.599 | TFLOPs: 31.98 | 7: iteration 43560/ 115203 | consumed samples: 11151360 | consumed tokens: 22837985280 | elapsed time per iteration (s): 0.43 | learning rate: 1.453E-04 | global batch size: 256 | lm loss: 2.308797E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.557 | TFLOPs: 31.41 | 7: iteration 43570/ 115203 | consumed samples: 11153920 | consumed tokens: 22843228160 | elapsed time per iteration (s): 0.42 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.296701E+00 | grad norm: 0.198 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.434 | TFLOPs: 31.92 | 7: iteration 43580/ 115203 | consumed samples: 11156480 | consumed tokens: 22848471040 | elapsed time per iteration (s): 0.42 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.339277E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.213 | TFLOPs: 32.12 | 7: iteration 43590/ 115203 | consumed samples: 11159040 | consumed tokens: 22853713920 | elapsed time per iteration (s): 0.43 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.324819E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.833 | TFLOPs: 31.47 | 7: iteration 43600/ 115203 | consumed samples: 11161600 | consumed tokens: 22858956800 | elapsed time per iteration (s): 0.42 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.303329E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.234 | TFLOPs: 31.97 | 7: iteration 43610/ 115203 | consumed samples: 11164160 | consumed tokens: 22864199680 | elapsed time per iteration (s): 0.45 | learning rate: 1.452E-04 | global batch size: 256 | lm loss: 2.283326E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.699 | TFLOPs: 29.79 | 7: iteration 43620/ 115203 | consumed samples: 11166720 | consumed tokens: 22869442560 | elapsed time per iteration (s): 0.43 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 2.315892E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.493 | TFLOPs: 31.19 | 7: iteration 43630/ 115203 | consumed samples: 11169280 | consumed tokens: 22874685440 | elapsed time per iteration (s): 0.42 | learning rate: 
1.451E-04 | global batch size: 256 | lm loss: 2.314929E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.477 | TFLOPs: 32.03 | 7: iteration 43640/ 115203 | consumed samples: 11171840 | consumed tokens: 22879928320 | elapsed time per iteration (s): 0.43 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 2.329579E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.929 | TFLOPs: 31.42 | 7: iteration 43650/ 115203 | consumed samples: 11174400 | consumed tokens: 22885171200 | elapsed time per iteration (s): 0.43 | learning rate: 1.451E-04 | global batch size: 256 | lm loss: 2.329363E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.621 | TFLOPs: 31.20 | 7: iteration 43660/ 115203 | consumed samples: 11176960 | consumed tokens: 22890414080 | elapsed time per iteration (s): 0.43 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.301121E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.384 | TFLOPs: 31.19 | 7: iteration 43670/ 115203 | consumed samples: 11179520 | consumed tokens: 22895656960 | elapsed time per iteration (s): 0.42 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.316641E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.710 | TFLOPs: 31.83 | 7: iteration 43680/ 115203 | consumed samples: 11182080 | consumed tokens: 22900899840 | elapsed time per iteration (s): 0.43 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.313044E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.921 | TFLOPs: 31.58 | 7: iteration 43690/ 115203 | consumed 
samples: 11184640 | consumed tokens: 22906142720 | elapsed time per iteration (s): 0.43 | learning rate: 1.450E-04 | global batch size: 256 | lm loss: 2.353031E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.693 | TFLOPs: 31.57 |
7: iteration 43700/ 115203 | consumed samples: 11187200 | consumed tokens: 22911385600 | elapsed time per iteration (s): 0.43 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.345465E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.280 | TFLOPs: 31.55 |
7: iteration 43710/ 115203 | consumed samples: 11189760 | consumed tokens: 22916628480 | elapsed time per iteration (s): 0.43 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.355828E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.577 | TFLOPs: 31.20 |
7: iteration 43720/ 115203 | consumed samples: 11192320 | consumed tokens: 22921871360 | elapsed time per iteration (s): 0.43 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.317768E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.045 | TFLOPs: 31.06 |
7: iteration 43730/ 115203 | consumed samples: 11194880 | consumed tokens: 22927114240 | elapsed time per iteration (s): 0.42 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.317133E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.734 | TFLOPs: 31.68 |
7: iteration 43740/ 115203 | consumed samples: 11197440 | consumed tokens: 22932357120 | elapsed time per iteration (s): 0.44 | learning rate: 1.449E-04 | global batch size: 256 | lm loss: 2.309739E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.342 | TFLOPs: 30.40 |
7: iteration 43750/ 115203 | consumed samples: 11200000 | consumed tokens: 22937600000 | elapsed time per iteration (s): 0.42 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 2.314083E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.554 | TFLOPs: 31.72 |
7: iteration 43760/ 115203 | consumed samples: 11202560 | consumed tokens: 22942842880 | elapsed time per iteration (s): 0.42 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 2.332235E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.942 | TFLOPs: 31.74 |
7: iteration 43770/ 115203 | consumed samples: 11205120 | consumed tokens: 22948085760 | elapsed time per iteration (s): 0.42 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 2.314020E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.195 | TFLOPs: 31.70 |
7: iteration 43780/ 115203 | consumed samples: 11207680 | consumed tokens: 22953328640 | elapsed time per iteration (s): 0.42 | learning rate: 1.448E-04 | global batch size: 256 | lm loss: 2.321318E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.698 | TFLOPs: 32.04 |
7: iteration 43790/ 115203 | consumed samples: 11210240 | consumed tokens: 22958571520 | elapsed time per iteration (s): 0.43 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.309614E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.217 | TFLOPs: 31.23 |
7: iteration 43800/ 115203 | consumed samples: 11212800 | consumed tokens: 22963814400 | elapsed time per iteration (s): 0.43 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.331273E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.580 | TFLOPs: 31.25 |
7: iteration 43810/ 115203 | consumed samples: 11215360 | consumed tokens: 22969057280 | elapsed time per iteration (s): 0.42 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.351057E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.853 | TFLOPs: 32.00 |
7: iteration 43820/ 115203 | consumed samples: 11217920 | consumed tokens: 22974300160 | elapsed time per iteration (s): 0.42 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.326274E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.642 | TFLOPs: 31.93 |
7: iteration 43830/ 115203 | consumed samples: 11220480 | consumed tokens: 22979543040 | elapsed time per iteration (s): 0.42 | learning rate: 1.447E-04 | global batch size: 256 | lm loss: 2.335533E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.472 | TFLOPs: 31.66 |
7: iteration 43840/ 115203 | consumed samples: 11223040 | consumed tokens: 22984785920 | elapsed time per iteration (s): 0.42 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 2.310983E+00 | grad norm: 0.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.002 | TFLOPs: 31.80 |
7: iteration 43850/ 115203 | consumed samples: 11225600 | consumed tokens: 22990028800 | elapsed time per iteration (s): 0.42 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 2.325634E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.469 | TFLOPs: 31.93 |
7: iteration 43860/ 115203 | consumed samples: 11228160 | consumed tokens: 22995271680 | elapsed time per iteration (s): 0.43 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 2.308014E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.078 | TFLOPs: 30.96 |
7: iteration 43870/ 115203 | consumed samples: 11230720 | consumed tokens: 23000514560 | elapsed time per iteration (s): 0.43 | learning rate: 1.446E-04 | global batch size: 256 | lm loss: 2.298008E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.297 | TFLOPs: 30.92 |
7: iteration 43880/ 115203 | consumed samples: 11233280 | consumed tokens: 23005757440 | elapsed time per iteration (s): 0.42 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 2.305112E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.485 | TFLOPs: 31.72 |
7: iteration 43890/ 115203 | consumed samples: 11235840 | consumed tokens: 23011000320 | elapsed time per iteration (s): 0.42 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 2.334105E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.629 | TFLOPs: 31.93 |
7: iteration 43900/ 115203 | consumed samples: 11238400 | consumed tokens: 23016243200 | elapsed time per iteration (s): 0.42 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 2.298629E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.644 | TFLOPs: 31.72 |
7: iteration 43910/ 115203 | consumed samples: 11240960 | consumed tokens: 23021486080 | elapsed time per iteration (s): 0.43 | learning rate: 1.445E-04 | global batch size: 256 | lm loss: 2.290367E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.259 | TFLOPs: 31.55 |
7: iteration 43920/ 115203 | consumed samples: 11243520 | consumed tokens: 23026728960 | elapsed time per iteration (s): 0.43 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.321478E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.830 | TFLOPs: 31.00 |
7: iteration 43930/ 115203 | consumed samples: 11246080 | consumed tokens: 23031971840 | elapsed time per iteration (s): 0.43 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.345811E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.469 | TFLOPs: 31.19 |
7: iteration 43940/ 115203 | consumed samples: 11248640 | consumed tokens: 23037214720 | elapsed time per iteration (s): 0.43 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.338734E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.809 | TFLOPs: 31.26 |
7: iteration 43950/ 115203 | consumed samples: 11251200 | consumed tokens: 23042457600 | elapsed time per iteration (s): 0.43 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.323985E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.127 | TFLOPs: 31.28 |
7: iteration 43960/ 115203 | consumed samples: 11253760 | consumed tokens: 23047700480 | elapsed time per iteration (s): 0.42 | learning rate: 1.444E-04 | global batch size: 256 | lm loss: 2.346143E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.239 | TFLOPs: 31.70 |
7: iteration 43970/ 115203 | consumed samples: 11256320 | consumed tokens: 23052943360 | elapsed time per iteration (s): 0.43 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.310719E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.635 | TFLOPs: 31.57 |
7: iteration 43980/ 115203 | consumed samples: 11258880 | consumed tokens: 23058186240 | elapsed time per iteration (s): 0.42 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.309557E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.048 | TFLOPs: 32.06 |
7: iteration 43990/ 115203 | consumed samples: 11261440 | consumed tokens: 23063429120 | elapsed time per iteration (s): 0.42 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.303919E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.826 | TFLOPs: 31.73 |
0: [2022-11-28 18:14:12,262] [INFO] [logging.py:68:log_dist] [Rank 0] step=44000, skipped=0, lr=[0.00014426156962702883, 0.00014426156962702883, 0.00014426156962702883], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 44000/ 115203 | consumed samples: 11264000 | consumed tokens: 23068672000 | elapsed time per iteration (s): 0.42 | learning rate: 1.443E-04 | global batch size: 256 | lm loss: 2.292318E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.881 | TFLOPs: 31.63 |
0: steps: 44000 loss: 2.2104 iter time (s): 0.427 samples/sec: 599.423
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 44000 | lm loss value: 2.213286E+00 | lm loss PPL: 9.145724E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 44000 to checkpoints_221m
0: [2022-11-28 18:14:12,430] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step44000 is begin to save!
0: [2022-11-28 18:14:12,435] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_01-model_00-model_states.pt... 0: [2022-11-28 18:14:12,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_01-model_00-model_states.pt. 0: [2022-11-28 18:14:12,575] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_03-model_00-model_states.pt... 0: [2022-11-28 18:14:12,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_03-model_00-model_states.pt. 0: [2022-11-28 18:14:12,597] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_04-model_00-model_states.pt... 0: [2022-11-28 18:14:12,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_04-model_00-model_states.pt. 0: [2022-11-28 18:14:12,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_05-model_00-model_states.pt... 0: [2022-11-28 18:14:12,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_05-model_00-model_states.pt. 0: [2022-11-28 18:14:12,646] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_06-model_00-model_states.pt... 0: [2022-11-28 18:14:12,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_06-model_00-model_states.pt. 0: [2022-11-28 18:14:12,671] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_07-model_00-model_states.pt... 0: [2022-11-28 18:14:12,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 18:14:12,694] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_08-model_00-model_states.pt... 0: [2022-11-28 18:14:12,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_08-model_00-model_states.pt. 0: [2022-11-28 18:14:12,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_09-model_00-model_states.pt... 0: [2022-11-28 18:14:12,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_09-model_00-model_states.pt. 0: [2022-11-28 18:14:12,743] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_10-model_00-model_states.pt... 0: [2022-11-28 18:14:12,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_10-model_00-model_states.pt. 0: [2022-11-28 18:14:12,768] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_11-model_00-model_states.pt... 0: [2022-11-28 18:14:12,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_11-model_00-model_states.pt. 0: [2022-11-28 18:14:12,792] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_12-model_00-model_states.pt... 0: [2022-11-28 18:14:12,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_12-model_00-model_states.pt. 0: [2022-11-28 18:14:12,815] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_13-model_00-model_states.pt... 0: [2022-11-28 18:14:12,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 18:14:12,839] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_14-model_00-model_states.pt... 0: [2022-11-28 18:14:12,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_14-model_00-model_states.pt. 0: [2022-11-28 18:14:12,864] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_15-model_00-model_states.pt... 0: [2022-11-28 18:14:12,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_15-model_00-model_states.pt. 0: [2022-11-28 18:14:12,889] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_16-model_00-model_states.pt... 0: [2022-11-28 18:14:12,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_16-model_00-model_states.pt. 0: [2022-11-28 18:14:12,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_17-model_00-model_states.pt... 0: [2022-11-28 18:14:12,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_17-model_00-model_states.pt. 0: [2022-11-28 18:14:12,938] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_18-model_00-model_states.pt... 0: [2022-11-28 18:14:12,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_18-model_00-model_states.pt. 0: [2022-11-28 18:14:12,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_19-model_00-model_states.pt... 0: [2022-11-28 18:14:12,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 18:14:12,989] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_20-model_00-model_states.pt... 0: [2022-11-28 18:14:13,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_20-model_00-model_states.pt. 0: [2022-11-28 18:14:13,013] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/layer_22-model_00-model_states.pt... 0: [2022-11-28 18:14:13,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/layer_22-model_00-model_states.pt. 0: [2022-11-28 18:14:13,018] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step44000/mp_rank_00_model_states.pt 0: [2022-11-28 18:14:13,018] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/mp_rank_00_model_states.pt... 0: [2022-11-28 18:14:13,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/mp_rank_00_model_states.pt. 0: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
5: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
0: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
7: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:14:13,040] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step44000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:14:13,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:14:13,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 18:14:13,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2022-11-28 18:14:13,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:14:13,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 18:14:13,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2022-11-28 18:14:13,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:14:13,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 18:14:13,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2022-11-28 18:14:13,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:14:13,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 18:14:13,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2022-11-28 18:14:13,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:14:13,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 18:14:13,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2022-11-28 18:14:13,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:14:13,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:14:13,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2022-11-28 18:14:13,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:14:13,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:14:13,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2022-11-28 18:14:13,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-28 18:14:13,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 18:14:13,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2022-11-28 18:14:13,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:14:13,100] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 18:14:13,100] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2022-11-28 18:14:13,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:14:13,100] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 18:14:13,100] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2022-11-28 18:14:13,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:14:13,100] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 18:14:13,100] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2022-11-28 18:14:13,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:14:13,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 18:14:13,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2022-11-28 18:14:13,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:14:13,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:14:13,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:14:13,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:14:13,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 18:14:13,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 18:14:13,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 18:14:13,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 18:14:13,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2022-11-28 18:14:13,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
4: [2022-11-28 18:14:13,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 4: [2022-11-28 18:14:13,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2022-11-28 18:14:13,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:14:13,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 18:14:13,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2022-11-28 18:14:13,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:14:13,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 18:14:13,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2022-11-28 18:14:13,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:14:13,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 18:14:13,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 18:14:13,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 18:14:13,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2022-11-28 18:14:13,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2022-11-28 18:14:13,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:14:13,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:14:13,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:14:13,105] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 18:14:13,105] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 18:14:13,105] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 18:14:13,105] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 1: [2022-11-28 18:14:13,105] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
1: [2022-11-28 18:14:13,105] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2022-11-28 18:14:13,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:14:13,106] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 18:14:13,106] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2022-11-28 18:14:13,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:14:13,110] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 18:14:13,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:14:13,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 2: [2022-11-28 18:14:13,110] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 18:14:13,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:14:13,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
2: [2022-11-28 18:14:13,110] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 18:14:13,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: [2022-11-28 18:14:13,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:14:13,111] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 18:14:13,111] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: [2022-11-28 18:14:13,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:14:13,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:14:13,111] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 18:14:13,111] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2022-11-28 18:14:13,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:14:13,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:14:13,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 18:14:13,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 18:14:13,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 7: [2022-11-28 18:14:13,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:14:13,091] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:14:13,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 18:14:13,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:14:13,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
3: [2022-11-28 18:14:13,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2022-11-28 18:14:13,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:14:13,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 18:14:13,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 18:14:13,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2022-11-28 18:14:13,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2022-11-28 18:14:13,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:14:13,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:14:13,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 18:14:13,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 18:14:13,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2022-11-28 18:14:13,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
5: [2022-11-28 18:14:13,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:14:13,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:14:13,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:14:13,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:14:13,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:14:13,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 18:14:13,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 18:14:13,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 18:14:13,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:14:13,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
5: [2022-11-28 18:14:13,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 18:14:13,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 18:14:13,110] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 18:14:13,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2022-11-28 18:14:13,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2022-11-28 18:14:13,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 5: [2022-11-28 18:14:13,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2022-11-28 18:14:13,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2022-11-28 18:14:13,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:14:13,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:14:13,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2022-11-28 18:14:13,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 18:14:13,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 18:14:13,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 18:14:13,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2022-11-28 18:14:13,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 3: [2022-11-28 18:14:13,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: [2022-11-28 18:14:13,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:14:13,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 18:14:13,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: [2022-11-28 18:14:13,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:14:13,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 18:14:13,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 18:14:13,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 18:14:13,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: [2022-11-28 18:14:13,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: [2022-11-28 18:14:13,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:14:13,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:14:13,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 18:14:13,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 18:14:13,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: [2022-11-28 18:14:13,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: [2022-11-28 18:14:13,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 18:14:13,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
6: [2022-11-28 18:14:13,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:14:13,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:14:13,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:14:13,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:14:13,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:14:13,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:14:13,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 18:14:13,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 18:14:13,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 18:14:13,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 18:14:13,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 18:14:13,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 18:14:13,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 18:14:13,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 18:14:13,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:14:13,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2022-11-28 18:14:13,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2022-11-28 18:14:13,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2022-11-28 18:14:13,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 
6: [2022-11-28 18:14:13,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2022-11-28 18:14:13,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2022-11-28 18:14:13,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 6: [2022-11-28 18:14:13,220] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step44000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 18:14:13,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step44000 is ready now! 0: successfully saved checkpoint at iteration 44000 to checkpoints_221m 7: time (ms) | save-checkpoint: 794.88 7: iteration 44010/ 115203 | consumed samples: 11266560 | consumed tokens: 23073914880 | elapsed time per iteration (s): 0.53 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 2.300206E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 486.640 | TFLOPs: 25.53 | 7: iteration 44020/ 115203 | consumed samples: 11269120 | consumed tokens: 23079157760 | elapsed time per iteration (s): 0.44 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 2.321168E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.896 | TFLOPs: 30.27 | 7: iteration 44030/ 115203 | consumed samples: 11271680 | consumed tokens: 23084400640 | elapsed time per iteration (s): 0.43 | learning rate: 1.442E-04 | global batch size: 256 | lm loss: 2.294961E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.524 | TFLOPs: 31.19 | 7: iteration 44040/ 115203 | consumed samples: 11274240 | consumed tokens: 23089643520 | elapsed time per iteration (s): 0.42 | learning rate: 1.442E-04 | 
global batch size: 256 | lm loss: 2.359670E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.133 | TFLOPs: 32.01 | 7: iteration 44050/ 115203 | consumed samples: 11276800 | consumed tokens: 23094886400 | elapsed time per iteration (s): 0.42 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.314545E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.161 | TFLOPs: 31.75 | 7: iteration 44060/ 115203 | consumed samples: 11279360 | consumed tokens: 23100129280 | elapsed time per iteration (s): 0.42 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.309335E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.821 | TFLOPs: 32.05 | 7: iteration 44070/ 115203 | consumed samples: 11281920 | consumed tokens: 23105372160 | elapsed time per iteration (s): 0.42 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.334442E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.466 | TFLOPs: 31.82 | 7: iteration 44080/ 115203 | consumed samples: 11284480 | consumed tokens: 23110615040 | elapsed time per iteration (s): 0.42 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.305673E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.014 | TFLOPs: 31.64 | 7: iteration 44090/ 115203 | consumed samples: 11287040 | consumed tokens: 23115857920 | elapsed time per iteration (s): 0.42 | learning rate: 1.441E-04 | global batch size: 256 | lm loss: 2.311979E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.071 | TFLOPs: 31.69 | 7: iteration 44100/ 115203 | consumed samples: 
11289600 | consumed tokens: 23121100800 | elapsed time per iteration (s): 0.42 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 2.335173E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.428 | TFLOPs: 31.82 | 7: iteration 44110/ 115203 | consumed samples: 11292160 | consumed tokens: 23126343680 | elapsed time per iteration (s): 0.43 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 2.346759E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.114 | TFLOPs: 31.43 | 7: iteration 44120/ 115203 | consumed samples: 11294720 | consumed tokens: 23131586560 | elapsed time per iteration (s): 0.42 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 2.309916E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.229 | TFLOPs: 31.86 | 7: iteration 44130/ 115203 | consumed samples: 11297280 | consumed tokens: 23136829440 | elapsed time per iteration (s): 0.43 | learning rate: 1.440E-04 | global batch size: 256 | lm loss: 2.309275E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.416 | TFLOPs: 31.24 | 7: iteration 44140/ 115203 | consumed samples: 11299840 | consumed tokens: 23142072320 | elapsed time per iteration (s): 0.44 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.297488E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.754 | TFLOPs: 30.73 | 7: iteration 44150/ 115203 | consumed samples: 11302400 | consumed tokens: 23147315200 | elapsed time per iteration (s): 0.42 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.350223E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 614.910 | TFLOPs: 32.26 | 7: iteration 44160/ 115203 | consumed samples: 11304960 | consumed tokens: 23152558080 | elapsed time per iteration (s): 0.43 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.315135E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.382 | TFLOPs: 31.40 | 7: iteration 44170/ 115203 | consumed samples: 11307520 | consumed tokens: 23157800960 | elapsed time per iteration (s): 0.43 | learning rate: 1.439E-04 | global batch size: 256 | lm loss: 2.293747E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.861 | TFLOPs: 31.16 | 7: iteration 44180/ 115203 | consumed samples: 11310080 | consumed tokens: 23163043840 | elapsed time per iteration (s): 0.43 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.290537E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.549 | TFLOPs: 31.46 | 7: iteration 44190/ 115203 | consumed samples: 11312640 | consumed tokens: 23168286720 | elapsed time per iteration (s): 0.43 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.308305E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.460 | TFLOPs: 31.30 | 7: iteration 44200/ 115203 | consumed samples: 11315200 | consumed tokens: 23173529600 | elapsed time per iteration (s): 0.43 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.288375E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.599 | TFLOPs: 31.46 | 7: iteration 44210/ 115203 | consumed samples: 11317760 | consumed tokens: 23178772480 | elapsed time per iteration (s): 0.43 | learning rate: 1.438E-04 | global batch size: 256 | lm 
loss: 2.313515E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.395 | TFLOPs: 31.24 | 7: iteration 44220/ 115203 | consumed samples: 11320320 | consumed tokens: 23184015360 | elapsed time per iteration (s): 0.42 | learning rate: 1.438E-04 | global batch size: 256 | lm loss: 2.327894E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.524 | TFLOPs: 31.77 | 7: iteration 44230/ 115203 | consumed samples: 11322880 | consumed tokens: 23189258240 | elapsed time per iteration (s): 0.42 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 2.301138E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.257 | TFLOPs: 31.97 | 7: iteration 44240/ 115203 | consumed samples: 11325440 | consumed tokens: 23194501120 | elapsed time per iteration (s): 0.43 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 2.326260E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.286 | TFLOPs: 31.29 | 7: iteration 44250/ 115203 | consumed samples: 11328000 | consumed tokens: 23199744000 | elapsed time per iteration (s): 0.42 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 2.317896E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.497 | TFLOPs: 31.98 | 7: iteration 44260/ 115203 | consumed samples: 11330560 | consumed tokens: 23204986880 | elapsed time per iteration (s): 0.42 | learning rate: 1.437E-04 | global batch size: 256 | lm loss: 2.318430E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.023 | TFLOPs: 31.74 | 7: iteration 44270/ 115203 | consumed samples: 11333120 | consumed tokens: 
23210229760 | elapsed time per iteration (s): 0.42 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.309917E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.348 | TFLOPs: 31.97 | 7: iteration 44280/ 115203 | consumed samples: 11335680 | consumed tokens: 23215472640 | elapsed time per iteration (s): 0.43 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.321340E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.315 | TFLOPs: 31.18 | 7: iteration 44290/ 115203 | consumed samples: 11338240 | consumed tokens: 23220715520 | elapsed time per iteration (s): 0.42 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.326820E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.998 | TFLOPs: 31.85 | 7: iteration 44300/ 115203 | consumed samples: 11340800 | consumed tokens: 23225958400 | elapsed time per iteration (s): 0.44 | learning rate: 1.436E-04 | global batch size: 256 | lm loss: 2.277483E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.413 | TFLOPs: 30.82 | 7: iteration 44310/ 115203 | consumed samples: 11343360 | consumed tokens: 23231201280 | elapsed time per iteration (s): 0.43 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 2.333782E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.975 | TFLOPs: 31.53 | 7: iteration 44320/ 115203 | consumed samples: 11345920 | consumed tokens: 23236444160 | elapsed time per iteration (s): 0.42 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 2.293978E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
607.194 | TFLOPs: 31.86 | 7: iteration 44330/ 115203 | consumed samples: 11348480 | consumed tokens: 23241687040 | elapsed time per iteration (s): 0.43 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 2.323467E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.100 | TFLOPs: 31.59 | 7: iteration 44340/ 115203 | consumed samples: 11351040 | consumed tokens: 23246929920 | elapsed time per iteration (s): 0.42 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 2.289214E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.847 | TFLOPs: 31.84 | 7: iteration 44350/ 115203 | consumed samples: 11353600 | consumed tokens: 23252172800 | elapsed time per iteration (s): 0.43 | learning rate: 1.435E-04 | global batch size: 256 | lm loss: 2.316394E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.511 | TFLOPs: 31.56 | 7: iteration 44360/ 115203 | consumed samples: 11356160 | consumed tokens: 23257415680 | elapsed time per iteration (s): 0.42 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 2.316104E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.268 | TFLOPs: 32.02 | 7: iteration 44370/ 115203 | consumed samples: 11358720 | consumed tokens: 23262658560 | elapsed time per iteration (s): 0.42 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 2.306194E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.465 | TFLOPs: 31.77 | 7: iteration 44380/ 115203 | consumed samples: 11361280 | consumed tokens: 23267901440 | elapsed time per iteration (s): 0.42 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 2.336536E+00 | grad norm: 0.217 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.452 | TFLOPs: 32.03 | 7: iteration 44390/ 115203 | consumed samples: 11363840 | consumed tokens: 23273144320 | elapsed time per iteration (s): 0.43 | learning rate: 1.434E-04 | global batch size: 256 | lm loss: 2.326647E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.800 | TFLOPs: 31.26 | 7: iteration 44400/ 115203 | consumed samples: 11366400 | consumed tokens: 23278387200 | elapsed time per iteration (s): 0.43 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.304431E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.808 | TFLOPs: 31.52 | 7: iteration 44410/ 115203 | consumed samples: 11368960 | consumed tokens: 23283630080 | elapsed time per iteration (s): 0.42 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.324563E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.359 | TFLOPs: 31.97 | 7: iteration 44420/ 115203 | consumed samples: 11371520 | consumed tokens: 23288872960 | elapsed time per iteration (s): 0.42 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.321107E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.955 | TFLOPs: 31.74 | 7: iteration 44430/ 115203 | consumed samples: 11374080 | consumed tokens: 23294115840 | elapsed time per iteration (s): 0.43 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.279690E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.110 | TFLOPs: 31.43 | 7: iteration 44440/ 115203 | consumed samples: 11376640 | consumed tokens: 23299358720 | elapsed time per iteration (s): 
0.43 | learning rate: 1.433E-04 | global batch size: 256 | lm loss: 2.306418E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.981 | TFLOPs: 31.22 | 7: iteration 44450/ 115203 | consumed samples: 11379200 | consumed tokens: 23304601600 | elapsed time per iteration (s): 0.44 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.344693E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.232 | TFLOPs: 30.34 | 7: iteration 44460/ 115203 | consumed samples: 11381760 | consumed tokens: 23309844480 | elapsed time per iteration (s): 0.43 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.287460E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.532 | TFLOPs: 31.30 | 7: iteration 44470/ 115203 | consumed samples: 11384320 | consumed tokens: 23315087360 | elapsed time per iteration (s): 0.44 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.306000E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.570 | TFLOPs: 30.62 | 7: iteration 44480/ 115203 | consumed samples: 11386880 | consumed tokens: 23320330240 | elapsed time per iteration (s): 0.43 | learning rate: 1.432E-04 | global batch size: 256 | lm loss: 2.331119E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.888 | TFLOPs: 31.11 | 7: iteration 44490/ 115203 | consumed samples: 11389440 | consumed tokens: 23325573120 | elapsed time per iteration (s): 0.44 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 2.285834E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.465 | TFLOPs: 30.72 | 7: iteration 44500/ 
115203 | consumed samples: 11392000 | consumed tokens: 23330816000 | elapsed time per iteration (s): 0.43 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 2.315170E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.175 | TFLOPs: 31.60 | 7: iteration 44510/ 115203 | consumed samples: 11394560 | consumed tokens: 23336058880 | elapsed time per iteration (s): 0.43 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 2.328709E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.706 | TFLOPs: 31.41 | 7: iteration 44520/ 115203 | consumed samples: 11397120 | consumed tokens: 23341301760 | elapsed time per iteration (s): 0.42 | learning rate: 1.431E-04 | global batch size: 256 | lm loss: 2.317614E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.024 | TFLOPs: 32.01 | 7: iteration 44530/ 115203 | consumed samples: 11399680 | consumed tokens: 23346544640 | elapsed time per iteration (s): 0.45 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.287009E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.758 | TFLOPs: 30.10 | 7: iteration 44540/ 115203 | consumed samples: 11402240 | consumed tokens: 23351787520 | elapsed time per iteration (s): 0.43 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.333567E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.973 | TFLOPs: 31.58 | 7: iteration 44550/ 115203 | consumed samples: 11404800 | consumed tokens: 23357030400 | elapsed time per iteration (s): 0.42 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.306558E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 606.454 | TFLOPs: 31.82 | 7: iteration 44560/ 115203 | consumed samples: 11407360 | consumed tokens: 23362273280 | elapsed time per iteration (s): 0.42 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.287589E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.686 | TFLOPs: 31.67 | 7: iteration 44570/ 115203 | consumed samples: 11409920 | consumed tokens: 23367516160 | elapsed time per iteration (s): 0.42 | learning rate: 1.430E-04 | global batch size: 256 | lm loss: 2.320768E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.999 | TFLOPs: 32.01 | 7: iteration 44580/ 115203 | consumed samples: 11412480 | consumed tokens: 23372759040 | elapsed time per iteration (s): 0.42 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 2.313731E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.320 | TFLOPs: 31.66 | 7: iteration 44590/ 115203 | consumed samples: 11415040 | consumed tokens: 23378001920 | elapsed time per iteration (s): 0.43 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 2.301457E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.521 | TFLOPs: 31.56 | 7: iteration 44600/ 115203 | consumed samples: 11417600 | consumed tokens: 23383244800 | elapsed time per iteration (s): 0.43 | learning rate: 1.429E-04 | global batch size: 256 | lm loss: 2.296194E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.202 | TFLOPs: 31.54 | 7: iteration 44610/ 115203 | consumed samples: 11420160 | consumed tokens: 23388487680 | elapsed time per iteration (s): 0.42 | learning rate: 1.429E-04 | global batch 
size: 256 | lm loss: 2.299790E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.042 | TFLOPs: 31.96 | 7: iteration 44620/ 115203 | consumed samples: 11422720 | consumed tokens: 23393730560 | elapsed time per iteration (s): 0.42 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 2.317827E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.641 | TFLOPs: 31.83 | 7: iteration 44630/ 115203 | consumed samples: 11425280 | consumed tokens: 23398973440 | elapsed time per iteration (s): 0.44 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 2.313714E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.729 | TFLOPs: 30.26 | 7: iteration 44640/ 115203 | consumed samples: 11427840 | consumed tokens: 23404216320 | elapsed time per iteration (s): 0.43 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 2.315904E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.308 | TFLOPs: 31.39 | 7: iteration 44650/ 115203 | consumed samples: 11430400 | consumed tokens: 23409459200 | elapsed time per iteration (s): 0.43 | learning rate: 1.428E-04 | global batch size: 256 | lm loss: 2.308336E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.962 | TFLOPs: 31.22 | 7: iteration 44660/ 115203 | consumed samples: 11432960 | consumed tokens: 23414702080 | elapsed time per iteration (s): 0.76 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.275328E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 336.997 | TFLOPs: 17.68 | 7: iteration 44670/ 115203 | consumed samples: 11435520 | consumed 
tokens: 23419944960 | elapsed time per iteration (s): 0.43 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.314122E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.927 | TFLOPs: 31.37 | 7: iteration 44680/ 115203 | consumed samples: 11438080 | consumed tokens: 23425187840 | elapsed time per iteration (s): 1.14 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.287003E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 224.439 | TFLOPs: 11.78 | 7: iteration 44690/ 115203 | consumed samples: 11440640 | consumed tokens: 23430430720 | elapsed time per iteration (s): 0.44 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.298862E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.240 | TFLOPs: 30.29 | 7: iteration 44700/ 115203 | consumed samples: 11443200 | consumed tokens: 23435673600 | elapsed time per iteration (s): 0.44 | learning rate: 1.427E-04 | global batch size: 256 | lm loss: 2.344028E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.423 | TFLOPs: 30.35 | 7: iteration 44710/ 115203 | consumed samples: 11445760 | consumed tokens: 23440916480 | elapsed time per iteration (s): 0.44 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 2.334480E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.502 | TFLOPs: 30.88 | 7: iteration 44720/ 115203 | consumed samples: 11448320 | consumed tokens: 23446159360 | elapsed time per iteration (s): 0.43 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 2.309254E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 599.494 | TFLOPs: 31.45 | 7: iteration 44730/ 115203 | consumed samples: 11450880 | consumed tokens: 23451402240 | elapsed time per iteration (s): 0.44 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 2.307615E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.140 | TFLOPs: 30.65 | 7: iteration 44740/ 115203 | consumed samples: 11453440 | consumed tokens: 23456645120 | elapsed time per iteration (s): 0.43 | learning rate: 1.426E-04 | global batch size: 256 | lm loss: 2.316197E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.623 | TFLOPs: 30.94 | 7: iteration 44750/ 115203 | consumed samples: 11456000 | consumed tokens: 23461888000 | elapsed time per iteration (s): 0.43 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.323600E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.991 | TFLOPs: 31.17 | 7: iteration 44760/ 115203 | consumed samples: 11458560 | consumed tokens: 23467130880 | elapsed time per iteration (s): 0.43 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.305069E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.672 | TFLOPs: 30.89 | 7: iteration 44770/ 115203 | consumed samples: 11461120 | consumed tokens: 23472373760 | elapsed time per iteration (s): 0.48 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.334893E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 538.358 | TFLOPs: 28.25 | 7: iteration 44780/ 115203 | consumed samples: 11463680 | consumed tokens: 23477616640 | elapsed time per iteration (s): 0.44 | learning rate: 1.425E-04 | global batch size: 256 | lm loss: 2.306973E+00 | grad norm: 
0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.353 | TFLOPs: 30.45 | 7: iteration 44790/ 115203 | consumed samples: 11466240 | consumed tokens: 23482859520 | elapsed time per iteration (s): 0.43 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.322264E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.494 | TFLOPs: 31.35 | 7: iteration 44800/ 115203 | consumed samples: 11468800 | consumed tokens: 23488102400 | elapsed time per iteration (s): 0.44 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.316375E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.376 | TFLOPs: 30.82 | 7: iteration 44810/ 115203 | consumed samples: 11471360 | consumed tokens: 23493345280 | elapsed time per iteration (s): 0.43 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.304833E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.823 | TFLOPs: 31.37 | 7: iteration 44820/ 115203 | consumed samples: 11473920 | consumed tokens: 23498588160 | elapsed time per iteration (s): 0.43 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.315465E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.038 | TFLOPs: 30.96 | 7: iteration 44830/ 115203 | consumed samples: 11476480 | consumed tokens: 23503831040 | elapsed time per iteration (s): 0.44 | learning rate: 1.424E-04 | global batch size: 256 | lm loss: 2.305659E+00 | grad norm: 0.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.210 | TFLOPs: 30.65 | 7: iteration 44840/ 115203 | consumed samples: 11479040 | consumed tokens: 23509073920 | elapsed time per 
iteration (s): 0.45 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.306792E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.983 | TFLOPs: 29.59 | 7: iteration 44850/ 115203 | consumed samples: 11481600 | consumed tokens: 23514316800 | elapsed time per iteration (s): 0.43 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.295210E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.251 | TFLOPs: 31.34 | 7: iteration 44860/ 115203 | consumed samples: 11484160 | consumed tokens: 23519559680 | elapsed time per iteration (s): 0.42 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.311886E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.109 | TFLOPs: 31.64 | 7: iteration 44870/ 115203 | consumed samples: 11486720 | consumed tokens: 23524802560 | elapsed time per iteration (s): 0.43 | learning rate: 1.423E-04 | global batch size: 256 | lm loss: 2.335764E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.787 | TFLOPs: 31.21 | 7: iteration 44880/ 115203 | consumed samples: 11489280 | consumed tokens: 23530045440 | elapsed time per iteration (s): 0.43 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 2.334580E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.565 | TFLOPs: 31.41 | 7: iteration 44890/ 115203 | consumed samples: 11491840 | consumed tokens: 23535288320 | elapsed time per iteration (s): 0.43 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 2.325162E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.672 | TFLOPs: 30.94 | 7: 
iteration 44900/ 115203 | consumed samples: 11494400 | consumed tokens: 23540531200 | elapsed time per iteration (s): 0.45 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 2.320306E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.958 | TFLOPs: 30.01 | 7: iteration 44910/ 115203 | consumed samples: 11496960 | consumed tokens: 23545774080 | elapsed time per iteration (s): 0.43 | learning rate: 1.422E-04 | global batch size: 256 | lm loss: 2.294720E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.013 | TFLOPs: 31.01 | 7: iteration 44920/ 115203 | consumed samples: 11499520 | consumed tokens: 23551016960 | elapsed time per iteration (s): 0.44 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 2.349179E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.562 | TFLOPs: 30.57 | 7: iteration 44930/ 115203 | consumed samples: 11502080 | consumed tokens: 23556259840 | elapsed time per iteration (s): 0.42 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 2.306071E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.991 | TFLOPs: 32.01 | 7: iteration 44940/ 115203 | consumed samples: 11504640 | consumed tokens: 23561502720 | elapsed time per iteration (s): 0.43 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 2.335218E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.942 | TFLOPs: 31.06 | 7: iteration 44950/ 115203 | consumed samples: 11507200 | consumed tokens: 23566745600 | elapsed time per iteration (s): 0.45 | learning rate: 1.421E-04 | global batch size: 256 | lm loss: 2.307176E+00 | grad norm: 0.216 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.590 | TFLOPs: 29.73 | 7: iteration 44960/ 115203 | consumed samples: 11509760 | consumed tokens: 23571988480 | elapsed time per iteration (s): 0.44 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 2.321463E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.178 | TFLOPs: 30.23 | 7: iteration 44970/ 115203 | consumed samples: 11512320 | consumed tokens: 23577231360 | elapsed time per iteration (s): 0.43 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 2.294147E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.354 | TFLOPs: 30.92 | 7: iteration 44980/ 115203 | consumed samples: 11514880 | consumed tokens: 23582474240 | elapsed time per iteration (s): 0.43 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 2.310812E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.667 | TFLOPs: 31.25 | 7: iteration 44990/ 115203 | consumed samples: 11517440 | consumed tokens: 23587717120 | elapsed time per iteration (s): 0.43 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 2.313053E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.588 | TFLOPs: 31.41 | 7: iteration 45000/ 115203 | consumed samples: 11520000 | consumed tokens: 23592960000 | elapsed time per iteration (s): 0.43 | learning rate: 1.420E-04 | global batch size: 256 | lm loss: 2.288229E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.436 | TFLOPs: 31.19 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 45000 | lm loss value: 
2.256400E+00 | lm loss PPL: 9.548653E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 45000 to checkpoints_221m 0: [2022-11-28 18:21:33,589] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step45000 is begin to save! 0: [2022-11-28 18:21:33,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_01-model_00-model_states.pt... 0: [2022-11-28 18:21:33,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_01-model_00-model_states.pt. 0: [2022-11-28 18:21:33,705] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_03-model_00-model_states.pt... 0: [2022-11-28 18:21:33,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_03-model_00-model_states.pt. 0: [2022-11-28 18:21:33,727] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_04-model_00-model_states.pt... 0: [2022-11-28 18:21:33,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_04-model_00-model_states.pt. 0: [2022-11-28 18:21:33,752] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_05-model_00-model_states.pt... 0: [2022-11-28 18:21:33,776] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_05-model_00-model_states.pt. 0: [2022-11-28 18:21:33,776] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_06-model_00-model_states.pt... 0: [2022-11-28 18:21:33,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_06-model_00-model_states.pt. 
0: [2022-11-28 18:21:33,800] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_07-model_00-model_states.pt... 0: [2022-11-28 18:21:33,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_07-model_00-model_states.pt. 0: [2022-11-28 18:21:33,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_08-model_00-model_states.pt... 0: [2022-11-28 18:21:33,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_08-model_00-model_states.pt. 0: [2022-11-28 18:21:33,849] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_09-model_00-model_states.pt... 0: [2022-11-28 18:21:33,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_09-model_00-model_states.pt. 0: [2022-11-28 18:21:33,874] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_10-model_00-model_states.pt... 0: [2022-11-28 18:21:33,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_10-model_00-model_states.pt. 0: [2022-11-28 18:21:33,898] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_11-model_00-model_states.pt... 0: [2022-11-28 18:21:33,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_11-model_00-model_states.pt. 0: [2022-11-28 18:21:33,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_12-model_00-model_states.pt... 0: [2022-11-28 18:21:33,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_12-model_00-model_states.pt. 
0: [2022-11-28 18:21:33,948] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_13-model_00-model_states.pt... 0: [2022-11-28 18:21:33,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_13-model_00-model_states.pt. 0: [2022-11-28 18:21:33,972] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_14-model_00-model_states.pt... 0: [2022-11-28 18:21:33,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_14-model_00-model_states.pt. 0: [2022-11-28 18:21:33,996] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_15-model_00-model_states.pt... 0: [2022-11-28 18:21:34,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_15-model_00-model_states.pt. 0: [2022-11-28 18:21:34,022] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_16-model_00-model_states.pt... 0: [2022-11-28 18:21:34,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_16-model_00-model_states.pt. 0: [2022-11-28 18:21:34,046] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_17-model_00-model_states.pt... 0: [2022-11-28 18:21:34,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_17-model_00-model_states.pt. 0: [2022-11-28 18:21:34,070] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_18-model_00-model_states.pt... 0: [2022-11-28 18:21:34,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_18-model_00-model_states.pt. 
0: [2022-11-28 18:21:34,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_19-model_00-model_states.pt... 0: [2022-11-28 18:21:34,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_19-model_00-model_states.pt. 0: [2022-11-28 18:21:34,328] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_20-model_00-model_states.pt... 0: [2022-11-28 18:21:34,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_20-model_00-model_states.pt. 0: [2022-11-28 18:21:34,354] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/layer_22-model_00-model_states.pt... 0: [2022-11-28 18:21:34,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/layer_22-model_00-model_states.pt. 0: [2022-11-28 18:21:34,359] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step45000/mp_rank_00_model_states.pt 0: [2022-11-28 18:21:34,359] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/mp_rank_00_model_states.pt... 0: [2022-11-28 18:21:34,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/mp_rank_00_model_states.pt. 0: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
5: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
2: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:21:34,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step45000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:21:34,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:21:34,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 18:21:34,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2022-11-28 18:21:34,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:21:34,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 18:21:34,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 
2: [2022-11-28 18:21:34,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:21:34,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 18:21:34,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2022-11-28 18:21:34,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:21:34,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 18:21:34,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2022-11-28 18:21:34,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:21:34,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 18:21:34,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2022-11-28 18:21:34,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:21:34,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 18:21:34,438] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 
2: [2022-11-28 18:21:34,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:21:34,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 18:21:34,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2022-11-28 18:21:34,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:21:34,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 5: [2022-11-28 18:21:34,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:21:34,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2022-11-28 18:21:34,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:21:34,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:21:34,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
5: [2022-11-28 18:21:34,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2022-11-28 18:21:34,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 3: [2022-11-28 18:21:34,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2022-11-28 18:21:34,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 5: [2022-11-28 18:21:34,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2022-11-28 18:21:34,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2022-11-28 18:21:34,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:21:34,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:21:34,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2022-11-28 18:21:34,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:21:34,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 
5: [2022-11-28 18:21:34,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 18:21:34,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:21:34,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:21:34,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 18:21:34,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 0: [2022-11-28 18:21:34,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:21:34,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2022-11-28 18:21:34,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 18:21:34,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2022-11-28 18:21:34,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2022-11-28 18:21:34,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 
0: [2022-11-28 18:21:34,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 5: [2022-11-28 18:21:34,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:21:34,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2022-11-28 18:21:34,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 0: [2022-11-28 18:21:34,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2022-11-28 18:21:34,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2022-11-28 18:21:34,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:21:34,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
3: [2022-11-28 18:21:34,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 6: [2022-11-28 18:21:34,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2022-11-28 18:21:34,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:21:34,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-28 18:21:34,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2022-11-28 18:21:34,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 6: [2022-11-28 18:21:34,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2022-11-28 18:21:34,441] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2022-11-28 18:21:34,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2022-11-28 18:21:34,441] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2022-11-28 18:21:34,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 18:21:34,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:21:34,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 18:21:34,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 18:21:34,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2022-11-28 18:21:34,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2022-11-28 18:21:34,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:21:34,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 18:21:34,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2022-11-28 18:21:34,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:21:34,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 18:21:34,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2022-11-28 18:21:34,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 18:21:34,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 18:21:34,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 2: [2022-11-28 18:21:34,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:21:34,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 18:21:34,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2022-11-28 18:21:34,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:21:34,448] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 18:21:34,448] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2022-11-28 18:21:34,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:21:34,448] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 18:21:34,448] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 4: [2022-11-28 18:21:34,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 18:21:34,448] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 18:21:34,448] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2022-11-28 18:21:34,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:21:34,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 18:21:34,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2022-11-28 18:21:34,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:21:34,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 18:21:34,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2022-11-28 18:21:34,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:21:34,452] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 18:21:34,452] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2022-11-28 18:21:34,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:21:34,453] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 18:21:34,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2022-11-28 18:21:34,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:21:34,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:21:34,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:21:34,453] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 18:21:34,453] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 18:21:34,453] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 18:21:34,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2022-11-28 18:21:34,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2022-11-28 18:21:34,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 1: [2022-11-28 18:21:34,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:21:34,454] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 18:21:34,454] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2022-11-28 18:21:34,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:21:34,454] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 18:21:34,454] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2022-11-28 18:21:34,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:21:34,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:21:34,454] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 18:21:34,454] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 18:21:34,454] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 6: [2022-11-28 18:21:34,454] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 5: [2022-11-28 18:21:34,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
3: [2022-11-28 18:21:34,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:21:34,442] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 18:21:34,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 18:21:34,442] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2022-11-28 18:21:34,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2022-11-28 18:21:34,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:21:34,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 18:21:34,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 3: [2022-11-28 18:21:34,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:21:34,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 18:21:34,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2022-11-28 18:21:34,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2022-11-28 18:21:34,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 18:21:34,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2022-11-28 18:21:34,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:21:34,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:21:34,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:21:34,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:21:34,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:21:34,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:21:34,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 18:21:34,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:21:34,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 18:21:34,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 18:21:34,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 18:21:34,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 18:21:34,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 18:21:34,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2022-11-28 18:21:34,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 18:21:34,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2022-11-28 18:21:34,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2022-11-28 18:21:34,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2022-11-28 18:21:34,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2022-11-28 18:21:34,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 7: [2022-11-28 18:21:34,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 
7: [2022-11-28 18:21:34,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:21:34,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 18:21:34,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2022-11-28 18:21:34,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:21:34,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 18:21:34,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2022-11-28 18:21:34,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:21:34,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:21:34,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 18:21:34,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 0: [2022-11-28 18:21:34,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step45000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 18:21:34,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step45000 is ready now! 
0: successfully saved checkpoint at iteration 45000 to checkpoints_221m 7: time (ms) | save-checkpoint: 933.48 7: iteration 45010/ 115203 | consumed samples: 11522560 | consumed tokens: 23598202880 | elapsed time per iteration (s): 0.54 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.306580E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 473.078 | TFLOPs: 24.82 | 7: iteration 45020/ 115203 | consumed samples: 11525120 | consumed tokens: 23603445760 | elapsed time per iteration (s): 0.44 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.306082E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.070 | TFLOPs: 30.75 | 7: iteration 45030/ 115203 | consumed samples: 11527680 | consumed tokens: 23608688640 | elapsed time per iteration (s): 0.43 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.270732E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.790 | TFLOPs: 31.37 | 7: iteration 45040/ 115203 | consumed samples: 11530240 | consumed tokens: 23613931520 | elapsed time per iteration (s): 0.43 | learning rate: 1.419E-04 | global batch size: 256 | lm loss: 2.327599E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.777 | TFLOPs: 30.89 | 7: iteration 45050/ 115203 | consumed samples: 11532800 | consumed tokens: 23619174400 | elapsed time per iteration (s): 0.45 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 2.328326E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.805 | TFLOPs: 30.11 | 7: iteration 45060/ 115203 | consumed samples: 11535360 | consumed tokens: 23624417280 | elapsed time per iteration (s): 0.44 | learning 
rate: 1.418E-04 | global batch size: 256 | lm loss: 2.312760E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.180 | TFLOPs: 30.55 | 7: iteration 45070/ 115203 | consumed samples: 11537920 | consumed tokens: 23629660160 | elapsed time per iteration (s): 0.44 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 2.290013E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.864 | TFLOPs: 30.84 | 7: iteration 45080/ 115203 | consumed samples: 11540480 | consumed tokens: 23634903040 | elapsed time per iteration (s): 0.42 | learning rate: 1.418E-04 | global batch size: 256 | lm loss: 2.315006E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.225 | TFLOPs: 31.70 | 7: iteration 45090/ 115203 | consumed samples: 11543040 | consumed tokens: 23640145920 | elapsed time per iteration (s): 0.45 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.313602E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.855 | TFLOPs: 29.79 | 7: iteration 45100/ 115203 | consumed samples: 11545600 | consumed tokens: 23645388800 | elapsed time per iteration (s): 0.43 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.310539E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.748 | TFLOPs: 31.05 | 7: iteration 45110/ 115203 | consumed samples: 11548160 | consumed tokens: 23650631680 | elapsed time per iteration (s): 0.44 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.336159E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.695 | TFLOPs: 30.47 | 7: iteration 45120/ 115203 | 
consumed samples: 11550720 | consumed tokens: 23655874560 | elapsed time per iteration (s): 0.60 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.313472E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 428.486 | TFLOPs: 22.48 | 7: iteration 45130/ 115203 | consumed samples: 11553280 | consumed tokens: 23661117440 | elapsed time per iteration (s): 0.43 | learning rate: 1.417E-04 | global batch size: 256 | lm loss: 2.340137E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.148 | TFLOPs: 31.07 | 7: iteration 45140/ 115203 | consumed samples: 11555840 | consumed tokens: 23666360320 | elapsed time per iteration (s): 0.43 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 2.305176E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.845 | TFLOPs: 31.32 | 7: iteration 45150/ 115203 | consumed samples: 11558400 | consumed tokens: 23671603200 | elapsed time per iteration (s): 0.43 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 2.329177E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.553 | TFLOPs: 31.30 | 7: iteration 45160/ 115203 | consumed samples: 11560960 | consumed tokens: 23676846080 | elapsed time per iteration (s): 0.44 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 2.309690E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.611 | TFLOPs: 30.73 | 7: iteration 45170/ 115203 | consumed samples: 11563520 | consumed tokens: 23682088960 | elapsed time per iteration (s): 0.43 | learning rate: 1.416E-04 | global batch size: 256 | lm loss: 2.307255E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 600.749 | TFLOPs: 31.52 | 7: iteration 45180/ 115203 | consumed samples: 11566080 | consumed tokens: 23687331840 | elapsed time per iteration (s): 0.44 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.321519E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.133 | TFLOPs: 30.39 | 7: iteration 45190/ 115203 | consumed samples: 11568640 | consumed tokens: 23692574720 | elapsed time per iteration (s): 0.44 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.295484E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.788 | TFLOPs: 30.74 | 7: iteration 45200/ 115203 | consumed samples: 11571200 | consumed tokens: 23697817600 | elapsed time per iteration (s): 0.44 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.307674E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.939 | TFLOPs: 30.27 | 7: iteration 45210/ 115203 | consumed samples: 11573760 | consumed tokens: 23703060480 | elapsed time per iteration (s): 0.42 | learning rate: 1.415E-04 | global batch size: 256 | lm loss: 2.287016E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.662 | TFLOPs: 31.78 | 7: iteration 45220/ 115203 | consumed samples: 11576320 | consumed tokens: 23708303360 | elapsed time per iteration (s): 0.44 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 2.330555E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.532 | TFLOPs: 30.77 | 7: iteration 45230/ 115203 | consumed samples: 11578880 | consumed tokens: 23713546240 | elapsed time per iteration (s): 0.44 | learning rate: 1.414E-04 | global batch size: 
256 | lm loss: 2.316318E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.171 | TFLOPs: 30.60 | 7: iteration 45240/ 115203 | consumed samples: 11581440 | consumed tokens: 23718789120 | elapsed time per iteration (s): 0.43 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 2.301283E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.899 | TFLOPs: 31.16 | 7: iteration 45250/ 115203 | consumed samples: 11584000 | consumed tokens: 23724032000 | elapsed time per iteration (s): 0.43 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 2.293180E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.035 | TFLOPs: 31.01 | 7: iteration 45260/ 115203 | consumed samples: 11586560 | consumed tokens: 23729274880 | elapsed time per iteration (s): 0.45 | learning rate: 1.414E-04 | global batch size: 256 | lm loss: 2.311202E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.895 | TFLOPs: 29.53 | 7: iteration 45270/ 115203 | consumed samples: 11589120 | consumed tokens: 23734517760 | elapsed time per iteration (s): 0.43 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 2.336848E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.030 | TFLOPs: 31.59 | 7: iteration 45280/ 115203 | consumed samples: 11591680 | consumed tokens: 23739760640 | elapsed time per iteration (s): 0.44 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 2.296093E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.635 | TFLOPs: 30.46 | 7: iteration 45290/ 115203 | consumed samples: 11594240 | consumed 
tokens: 23745003520 | elapsed time per iteration (s): 0.44 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 2.332496E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.428 | TFLOPs: 30.72 | 7: iteration 45300/ 115203 | consumed samples: 11596800 | consumed tokens: 23750246400 | elapsed time per iteration (s): 0.45 | learning rate: 1.413E-04 | global batch size: 256 | lm loss: 2.324410E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.716 | TFLOPs: 29.94 | 7: iteration 45310/ 115203 | consumed samples: 11599360 | consumed tokens: 23755489280 | elapsed time per iteration (s): 0.47 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 2.270710E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 548.280 | TFLOPs: 28.77 | 7: iteration 45320/ 115203 | consumed samples: 11601920 | consumed tokens: 23760732160 | elapsed time per iteration (s): 0.44 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 2.297422E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.680 | TFLOPs: 30.31 | 7: iteration 45330/ 115203 | consumed samples: 11604480 | consumed tokens: 23765975040 | elapsed time per iteration (s): 0.44 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 2.306341E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.814 | TFLOPs: 30.42 | 7: iteration 45340/ 115203 | consumed samples: 11607040 | consumed tokens: 23771217920 | elapsed time per iteration (s): 0.43 | learning rate: 1.412E-04 | global batch size: 256 | lm loss: 2.312812E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 589.752 | TFLOPs: 30.94 | 7: iteration 45350/ 115203 | consumed samples: 11609600 | consumed tokens: 23776460800 | elapsed time per iteration (s): 0.43 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 2.293480E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.161 | TFLOPs: 31.12 | 7: iteration 45360/ 115203 | consumed samples: 11612160 | consumed tokens: 23781703680 | elapsed time per iteration (s): 0.43 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 2.313509E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.404 | TFLOPs: 31.50 | 7: iteration 45370/ 115203 | consumed samples: 11614720 | consumed tokens: 23786946560 | elapsed time per iteration (s): 0.43 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 2.294505E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.347 | TFLOPs: 31.45 | 7: iteration 45380/ 115203 | consumed samples: 11617280 | consumed tokens: 23792189440 | elapsed time per iteration (s): 0.43 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 2.333378E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.684 | TFLOPs: 31.15 | 7: iteration 45390/ 115203 | consumed samples: 11619840 | consumed tokens: 23797432320 | elapsed time per iteration (s): 0.43 | learning rate: 1.411E-04 | global batch size: 256 | lm loss: 2.362922E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.749 | TFLOPs: 31.36 | 7: iteration 45400/ 115203 | consumed samples: 11622400 | consumed tokens: 23802675200 | elapsed time per iteration (s): 0.44 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.304667E+00 | grad norm: 
0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.033 | TFLOPs: 30.64 | 7: iteration 45410/ 115203 | consumed samples: 11624960 | consumed tokens: 23807918080 | elapsed time per iteration (s): 0.44 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.295744E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.110 | TFLOPs: 30.65 | 7: iteration 45420/ 115203 | consumed samples: 11627520 | consumed tokens: 23813160960 | elapsed time per iteration (s): 0.43 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.315713E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.033 | TFLOPs: 30.96 | 7: iteration 45430/ 115203 | consumed samples: 11630080 | consumed tokens: 23818403840 | elapsed time per iteration (s): 0.43 | learning rate: 1.410E-04 | global batch size: 256 | lm loss: 2.298504E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.499 | TFLOPs: 31.51 | 7: iteration 45440/ 115203 | consumed samples: 11632640 | consumed tokens: 23823646720 | elapsed time per iteration (s): 0.43 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 2.303566E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.217 | TFLOPs: 30.92 | 7: iteration 45450/ 115203 | consumed samples: 11635200 | consumed tokens: 23828889600 | elapsed time per iteration (s): 0.44 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 2.355626E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.427 | TFLOPs: 30.45 | 7: iteration 45460/ 115203 | consumed samples: 11637760 | consumed tokens: 23834132480 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 2.297073E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.097 | TFLOPs: 31.33 | 7: iteration 45470/ 115203 | consumed samples: 11640320 | consumed tokens: 23839375360 | elapsed time per iteration (s): 0.43 | learning rate: 1.409E-04 | global batch size: 256 | lm loss: 2.293712E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.896 | TFLOPs: 31.53 | 7: iteration 45480/ 115203 | consumed samples: 11642880 | consumed tokens: 23844618240 | elapsed time per iteration (s): 0.43 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 2.347601E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.208 | TFLOPs: 31.23 | 7: iteration 45490/ 115203 | consumed samples: 11645440 | consumed tokens: 23849861120 | elapsed time per iteration (s): 0.43 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 2.308312E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.618 | TFLOPs: 31.04 | 7: iteration 45500/ 115203 | consumed samples: 11648000 | consumed tokens: 23855104000 | elapsed time per iteration (s): 0.44 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 2.313686E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.960 | TFLOPs: 30.69 | 7: iteration 45510/ 115203 | consumed samples: 11650560 | consumed tokens: 23860346880 | elapsed time per iteration (s): 0.44 | learning rate: 1.408E-04 | global batch size: 256 | lm loss: 2.319174E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.262 | TFLOPs: 30.45 | 7: 
iteration 45520/ 115203 | consumed samples: 11653120 | consumed tokens: 23865589760 | elapsed time per iteration (s): 0.45 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 2.317414E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.688 | TFLOPs: 30.00 | 7: iteration 45530/ 115203 | consumed samples: 11655680 | consumed tokens: 23870832640 | elapsed time per iteration (s): 0.43 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 2.301258E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.488 | TFLOPs: 30.98 | 7: iteration 45540/ 115203 | consumed samples: 11658240 | consumed tokens: 23876075520 | elapsed time per iteration (s): 0.44 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 2.287395E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.743 | TFLOPs: 30.68 | 7: iteration 45550/ 115203 | consumed samples: 11660800 | consumed tokens: 23881318400 | elapsed time per iteration (s): 0.43 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 2.307375E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.961 | TFLOPs: 31.22 | 7: iteration 45560/ 115203 | consumed samples: 11663360 | consumed tokens: 23886561280 | elapsed time per iteration (s): 0.44 | learning rate: 1.407E-04 | global batch size: 256 | lm loss: 2.341400E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.699 | TFLOPs: 30.78 | 7: iteration 45570/ 115203 | consumed samples: 11665920 | consumed tokens: 23891804160 | elapsed time per iteration (s): 0.43 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.307896E+00 | grad norm: 0.196 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.136 | TFLOPs: 30.96 | 7: iteration 45580/ 115203 | consumed samples: 11668480 | consumed tokens: 23897047040 | elapsed time per iteration (s): 0.44 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.332561E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.255 | TFLOPs: 30.29 | 7: iteration 45590/ 115203 | consumed samples: 11671040 | consumed tokens: 23902289920 | elapsed time per iteration (s): 0.42 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.314787E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.040 | TFLOPs: 31.64 | 7: iteration 45600/ 115203 | consumed samples: 11673600 | consumed tokens: 23907532800 | elapsed time per iteration (s): 0.43 | learning rate: 1.406E-04 | global batch size: 256 | lm loss: 2.301783E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.800 | TFLOPs: 31.47 | 7: iteration 45610/ 115203 | consumed samples: 11676160 | consumed tokens: 23912775680 | elapsed time per iteration (s): 0.44 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 2.296747E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.679 | TFLOPs: 30.41 | 7: iteration 45620/ 115203 | consumed samples: 11678720 | consumed tokens: 23918018560 | elapsed time per iteration (s): 0.44 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 2.290181E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.755 | TFLOPs: 30.58 | 7: iteration 45630/ 115203 | consumed samples: 11681280 | consumed tokens: 23923261440 | elapsed time per iteration (s): 0.44 | learning rate: 
1.405E-04 | global batch size: 256 | lm loss: 2.297958E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.263 | TFLOPs: 30.60 | 7: iteration 45640/ 115203 | consumed samples: 11683840 | consumed tokens: 23928504320 | elapsed time per iteration (s): 0.43 | learning rate: 1.405E-04 | global batch size: 256 | lm loss: 2.341343E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.229 | TFLOPs: 31.44 | 7: iteration 45650/ 115203 | consumed samples: 11686400 | consumed tokens: 23933747200 | elapsed time per iteration (s): 0.43 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.300758E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.657 | TFLOPs: 31.04 | 7: iteration 45660/ 115203 | consumed samples: 11688960 | consumed tokens: 23938990080 | elapsed time per iteration (s): 0.45 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.324402E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.014 | TFLOPs: 29.96 | 7: iteration 45670/ 115203 | consumed samples: 11691520 | consumed tokens: 23944232960 | elapsed time per iteration (s): 0.46 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.319793E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.330 | TFLOPs: 29.24 | 7: iteration 45680/ 115203 | consumed samples: 11694080 | consumed tokens: 23949475840 | elapsed time per iteration (s): 0.44 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.273468E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.854 | TFLOPs: 30.53 | 7: iteration 45690/ 115203 | consumed 
samples: 11696640 | consumed tokens: 23954718720 | elapsed time per iteration (s): 0.42 | learning rate: 1.404E-04 | global batch size: 256 | lm loss: 2.332413E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.381 | TFLOPs: 31.82 | 7: iteration 45700/ 115203 | consumed samples: 11699200 | consumed tokens: 23959961600 | elapsed time per iteration (s): 0.45 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 2.319471E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.537 | TFLOPs: 30.04 | 7: iteration 45710/ 115203 | consumed samples: 11701760 | consumed tokens: 23965204480 | elapsed time per iteration (s): 0.45 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 2.272332E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.159 | TFLOPs: 29.86 | 7: iteration 45720/ 115203 | consumed samples: 11704320 | consumed tokens: 23970447360 | elapsed time per iteration (s): 0.44 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 2.291918E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.045 | TFLOPs: 30.70 | 7: iteration 45730/ 115203 | consumed samples: 11706880 | consumed tokens: 23975690240 | elapsed time per iteration (s): 0.45 | learning rate: 1.403E-04 | global batch size: 256 | lm loss: 2.284605E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.502 | TFLOPs: 30.04 | 7: iteration 45740/ 115203 | consumed samples: 11709440 | consumed tokens: 23980933120 | elapsed time per iteration (s): 0.43 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.294606E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 594.846 | TFLOPs: 31.21 | 7: iteration 45750/ 115203 | consumed samples: 11712000 | consumed tokens: 23986176000 | elapsed time per iteration (s): 0.43 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.320343E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.741 | TFLOPs: 31.41 | 7: iteration 45760/ 115203 | consumed samples: 11714560 | consumed tokens: 23991418880 | elapsed time per iteration (s): 0.44 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.308328E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.254 | TFLOPs: 30.60 | 7: iteration 45770/ 115203 | consumed samples: 11717120 | consumed tokens: 23996661760 | elapsed time per iteration (s): 0.44 | learning rate: 1.402E-04 | global batch size: 256 | lm loss: 2.294321E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.056 | TFLOPs: 30.38 | 7: iteration 45780/ 115203 | consumed samples: 11719680 | consumed tokens: 24001904640 | elapsed time per iteration (s): 0.43 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 2.343159E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.705 | TFLOPs: 31.26 | 7: iteration 45790/ 115203 | consumed samples: 11722240 | consumed tokens: 24007147520 | elapsed time per iteration (s): 0.43 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 2.337156E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.434 | TFLOPs: 31.35 | 7: iteration 45800/ 115203 | consumed samples: 11724800 | consumed tokens: 24012390400 | elapsed time per iteration (s): 0.44 | learning rate: 1.401E-04 | global batch size: 256 | lm 
loss: 2.329793E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.860 | TFLOPs: 30.74 | 7: iteration 45810/ 115203 | consumed samples: 11727360 | consumed tokens: 24017633280 | elapsed time per iteration (s): 0.42 | learning rate: 1.401E-04 | global batch size: 256 | lm loss: 2.301064E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.980 | TFLOPs: 31.74 | 7: iteration 45820/ 115203 | consumed samples: 11729920 | consumed tokens: 24022876160 | elapsed time per iteration (s): 0.43 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 2.311717E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.241 | TFLOPs: 31.34 | 7: iteration 45830/ 115203 | consumed samples: 11732480 | consumed tokens: 24028119040 | elapsed time per iteration (s): 0.44 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 2.326398E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.745 | TFLOPs: 30.31 | 7: iteration 45840/ 115203 | consumed samples: 11735040 | consumed tokens: 24033361920 | elapsed time per iteration (s): 0.43 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 2.297405E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.271 | TFLOPs: 30.97 | 7: iteration 45850/ 115203 | consumed samples: 11737600 | consumed tokens: 24038604800 | elapsed time per iteration (s): 0.44 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 2.322197E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.797 | TFLOPs: 30.53 | 7: iteration 45860/ 115203 | consumed samples: 11740160 | consumed tokens: 
24043847680 | elapsed time per iteration (s): 0.43 | learning rate: 1.400E-04 | global batch size: 256 | lm loss: 2.292145E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.729 | TFLOPs: 31.05 | 7: iteration 45870/ 115203 | consumed samples: 11742720 | consumed tokens: 24049090560 | elapsed time per iteration (s): 0.44 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 2.313685E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.981 | TFLOPs: 30.69 | 7: iteration 45880/ 115203 | consumed samples: 11745280 | consumed tokens: 24054333440 | elapsed time per iteration (s): 0.45 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 2.320488E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.529 | TFLOPs: 30.14 | 7: iteration 45890/ 115203 | consumed samples: 11747840 | consumed tokens: 24059576320 | elapsed time per iteration (s): 0.44 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 2.349702E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.508 | TFLOPs: 30.62 | 7: iteration 45900/ 115203 | consumed samples: 11750400 | consumed tokens: 24064819200 | elapsed time per iteration (s): 0.43 | learning rate: 1.399E-04 | global batch size: 256 | lm loss: 2.324117E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.033 | TFLOPs: 31.12 | 7: iteration 45910/ 115203 | consumed samples: 11752960 | consumed tokens: 24070062080 | elapsed time per iteration (s): 0.43 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 2.327284E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
590.476 | TFLOPs: 30.98 | 7: iteration 45920/ 115203 | consumed samples: 11755520 | consumed tokens: 24075304960 | elapsed time per iteration (s): 0.44 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 2.315894E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.404 | TFLOPs: 30.82 | 7: iteration 45930/ 115203 | consumed samples: 11758080 | consumed tokens: 24080547840 | elapsed time per iteration (s): 0.46 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 2.305785E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.026 | TFLOPs: 29.49 | 7: iteration 45940/ 115203 | consumed samples: 11760640 | consumed tokens: 24085790720 | elapsed time per iteration (s): 0.42 | learning rate: 1.398E-04 | global batch size: 256 | lm loss: 2.316741E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.221 | TFLOPs: 31.70 | 7: iteration 45950/ 115203 | consumed samples: 11763200 | consumed tokens: 24091033600 | elapsed time per iteration (s): 0.43 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 2.322405E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.544 | TFLOPs: 31.30 | 7: iteration 45960/ 115203 | consumed samples: 11765760 | consumed tokens: 24096276480 | elapsed time per iteration (s): 0.44 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 2.320952E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.465 | TFLOPs: 30.25 | 7: iteration 45970/ 115203 | consumed samples: 11768320 | consumed tokens: 24101519360 | elapsed time per iteration (s): 0.43 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 2.295760E+00 | grad norm: 0.196 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.951 | TFLOPs: 31.43 | 7: iteration 45980/ 115203 | consumed samples: 11770880 | consumed tokens: 24106762240 | elapsed time per iteration (s): 0.43 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 2.312913E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.670 | TFLOPs: 31.57 | 7: iteration 45990/ 115203 | consumed samples: 11773440 | consumed tokens: 24112005120 | elapsed time per iteration (s): 0.43 | learning rate: 1.397E-04 | global batch size: 256 | lm loss: 2.305861E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.886 | TFLOPs: 31.58 | 0: [2022-11-28 18:28:51,946] [INFO] [logging.py:68:log_dist] [Rank 0] step=46000, skipped=0, lr=[0.0001396270779841331, 0.0001396270779841331, 0.0001396270779841331], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 46000/ 115203 | consumed samples: 11776000 | consumed tokens: 24117248000 | elapsed time per iteration (s): 0.42 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 2.340018E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.000 | TFLOPs: 31.64 | 0: steps: 46000 loss: 2.4551 iter time (s): 0.437 samples/sec: 585.744 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 46000 | lm loss value: 2.290326E+00 | lm loss PPL: 9.878154E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 46000 to checkpoints_221m 0: [2022-11-28 18:28:52,142] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step46000 is begin to save! 
0: [2022-11-28 18:28:52,210] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_01-model_00-model_states.pt... 0: [2022-11-28 18:28:52,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_01-model_00-model_states.pt. 0: [2022-11-28 18:28:52,511] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_03-model_00-model_states.pt... 0: [2022-11-28 18:28:52,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_03-model_00-model_states.pt. 0: [2022-11-28 18:28:52,532] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_04-model_00-model_states.pt... 0: [2022-11-28 18:28:52,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_04-model_00-model_states.pt. 0: [2022-11-28 18:28:52,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_05-model_00-model_states.pt... 0: [2022-11-28 18:28:52,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_05-model_00-model_states.pt. 0: [2022-11-28 18:28:52,581] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_06-model_00-model_states.pt... 0: [2022-11-28 18:28:52,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_06-model_00-model_states.pt. 0: [2022-11-28 18:28:52,604] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_07-model_00-model_states.pt... 0: [2022-11-28 18:28:52,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 18:28:52,628] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_08-model_00-model_states.pt... 0: [2022-11-28 18:28:52,653] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_08-model_00-model_states.pt. 0: [2022-11-28 18:28:52,653] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_09-model_00-model_states.pt... 0: [2022-11-28 18:28:52,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_09-model_00-model_states.pt. 0: [2022-11-28 18:28:52,677] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_10-model_00-model_states.pt... 0: [2022-11-28 18:28:52,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_10-model_00-model_states.pt. 0: [2022-11-28 18:28:52,702] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_11-model_00-model_states.pt... 0: [2022-11-28 18:28:52,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_11-model_00-model_states.pt. 0: [2022-11-28 18:28:52,728] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_12-model_00-model_states.pt... 0: [2022-11-28 18:28:52,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_12-model_00-model_states.pt. 0: [2022-11-28 18:28:52,753] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_13-model_00-model_states.pt... 0: [2022-11-28 18:28:52,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 18:28:52,778] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_14-model_00-model_states.pt... 0: [2022-11-28 18:28:52,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_14-model_00-model_states.pt. 0: [2022-11-28 18:28:52,803] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_15-model_00-model_states.pt... 0: [2022-11-28 18:28:52,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_15-model_00-model_states.pt. 0: [2022-11-28 18:28:52,828] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_16-model_00-model_states.pt... 0: [2022-11-28 18:28:52,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_16-model_00-model_states.pt. 0: [2022-11-28 18:28:52,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_17-model_00-model_states.pt... 0: [2022-11-28 18:28:52,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_17-model_00-model_states.pt. 0: [2022-11-28 18:28:52,880] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_18-model_00-model_states.pt... 0: [2022-11-28 18:28:52,905] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_18-model_00-model_states.pt. 0: [2022-11-28 18:28:52,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_19-model_00-model_states.pt... 0: [2022-11-28 18:28:52,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 18:28:52,933] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_20-model_00-model_states.pt... 0: [2022-11-28 18:28:52,957] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_20-model_00-model_states.pt. 0: [2022-11-28 18:28:52,957] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/layer_22-model_00-model_states.pt... 0: [2022-11-28 18:28:52,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/layer_22-model_00-model_states.pt. 0: [2022-11-28 18:28:52,960] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step46000/mp_rank_00_model_states.pt 0: [2022-11-28 18:28:52,960] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/mp_rank_00_model_states.pt... 0: [2022-11-28 18:28:52,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/mp_rank_00_model_states.pt. 0: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:28:52,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:28:52,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:28:52,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:28:52,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:28:52,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:28:52,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:28:52,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:28:52,981] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
2: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:28:52,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step46000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:28:53,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:28:53,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 18:28:53,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2022-11-28 18:28:53,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:28:53,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 18:28:53,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 1: [2022-11-28 18:28:53,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:28:53,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 18:28:53,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2022-11-28 18:28:53,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:28:53,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 18:28:53,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2022-11-28 18:28:53,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:28:53,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 18:28:53,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2022-11-28 18:28:53,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:28:53,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 18:28:53,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2022-11-28 18:28:53,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2022-11-28 18:28:53,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 18:28:53,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 6: [2022-11-28 18:28:53,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:28:53,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:28:53,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 18:28:53,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:28:53,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 6: [2022-11-28 18:28:53,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2022-11-28 18:28:53,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 6: [2022-11-28 18:28:53,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 18:28:53,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2022-11-28 18:28:53,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:28:53,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:28:53,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 4: [2022-11-28 18:28:53,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:28:53,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 4: [2022-11-28 18:28:53,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 7: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:28:53,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:28:53,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:28:53,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 6: [2022-11-28 18:28:53,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:28:53,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 18:28:53,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2022-11-28 18:28:53,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:28:53,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:28:53,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:28:53,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:28:53,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:28:53,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 18:28:53,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 1: [2022-11-28 18:28:53,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:28:53,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 18:28:53,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 18:28:53,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 1: [2022-11-28 18:28:53,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2022-11-28 18:28:53,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2022-11-28 18:28:53,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 1: [2022-11-28 18:28:53,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2022-11-28 18:28:53,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2022-11-28 18:28:53,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 7: [2022-11-28 18:28:53,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 1: [2022-11-28 18:28:53,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:28:53,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 18:28:53,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
5: [2022-11-28 18:28:53,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:28:53,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 18:28:53,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 18:28:53,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 3: [2022-11-28 18:28:53,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:28:53,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
5: [2022-11-28 18:28:53,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 18:28:53,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2022-11-28 18:28:53,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 3: [2022-11-28 18:28:53,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 3: [2022-11-28 18:28:53,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:28:53,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 18:28:53,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 1: [2022-11-28 18:28:53,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:28:53,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 18:28:53,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 1: [2022-11-28 18:28:53,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:28:53,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 18:28:53,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2022-11-28 18:28:53,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:28:53,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 18:28:53,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:28:53,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2022-11-28 18:28:53,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 18:28:53,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2022-11-28 18:28:53,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:28:53,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 18:28:53,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 3: [2022-11-28 18:28:53,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2022-11-28 18:28:53,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 18:28:53,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:28:53,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 3: [2022-11-28 18:28:53,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 18:28:53,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:28:53,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 3: [2022-11-28 18:28:53,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 18:28:53,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2022-11-28 18:28:53,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:28:53,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 18:28:53,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 6: [2022-11-28 18:28:53,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 18:28:53,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:28:53,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:28:53,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:28:53,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 18:28:53,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 18:28:53,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 18:28:53,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 18:28:53,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 6: [2022-11-28 18:28:53,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 6: [2022-11-28 18:28:53,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 6: [2022-11-28 18:28:53,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2022-11-28 18:28:53,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 18:28:53,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 18:28:53,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 3: [2022-11-28 18:28:53,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:28:53,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 18:28:53,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2022-11-28 18:28:53,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:28:53,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 18:28:53,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2022-11-28 18:28:53,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:28:53,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:28:53,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 18:28:53,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
1: [2022-11-28 18:28:53,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:28:53,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 18:28:53,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2022-11-28 18:28:53,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:28:53,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:28:53,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 18:28:53,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 18:28:53,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 2: [2022-11-28 18:28:53,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2022-11-28 18:28:53,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:28:53,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:28:53,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-28 18:28:53,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 18:28:53,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 18:28:53,056] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 18:28:53,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2022-11-28 18:28:53,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 4: [2022-11-28 18:28:53,056] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2022-11-28 18:28:53,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 18:28:53,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2022-11-28 18:28:53,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:28:53,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 18:28:53,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2022-11-28 18:28:53,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2022-11-28 18:28:53,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:28:53,053] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:28:53,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 18:28:53,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 18:28:53,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 18:28:53,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2022-11-28 18:28:53,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 5: [2022-11-28 18:28:53,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2022-11-28 18:28:53,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:28:53,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:28:53,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 18:28:53,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
0: [2022-11-28 18:28:53,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:28:53,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 18:28:53,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:28:53,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2022-11-28 18:28:53,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:28:53,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 18:28:53,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2022-11-28 18:28:53,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 18:28:53,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 0: [2022-11-28 18:28:53,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step46000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 18:28:53,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step46000 is ready now! 
0: successfully saved checkpoint at iteration 46000 to checkpoints_221m
7: time (ms) | save-checkpoint: 989.49
7: iteration 46010/ 115203 | consumed samples: 11778560 | consumed tokens: 24122490880 | elapsed time per iteration (s): 0.55 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 2.285138E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 466.786 | TFLOPs: 24.49 |
7: iteration 46020/ 115203 | consumed samples: 11781120 | consumed tokens: 24127733760 | elapsed time per iteration (s): 0.45 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 2.312149E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.275 | TFLOPs: 30.18 |
7: iteration 46030/ 115203 | consumed samples: 11783680 | consumed tokens: 24132976640 | elapsed time per iteration (s): 0.43 | learning rate: 1.396E-04 | global batch size: 256 | lm loss: 2.296689E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.511 | TFLOPs: 31.40 |
7: iteration 46040/ 115203 | consumed samples: 11786240 | consumed tokens: 24138219520 | elapsed time per iteration (s): 0.44 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.327376E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.004 | TFLOPs: 30.69 |
7: iteration 46050/ 115203 | consumed samples: 11788800 | consumed tokens: 24143462400 | elapsed time per iteration (s): 0.44 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.309719E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.514 | TFLOPs: 30.72 |
7: iteration 46060/ 115203 | consumed samples: 11791360 | consumed tokens: 24148705280 | elapsed time per iteration (s): 0.44 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.326897E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.387 | TFLOPs: 30.66 |
7: iteration 46070/ 115203 | consumed samples: 11793920 | consumed tokens: 24153948160 | elapsed time per iteration (s): 0.44 | learning rate: 1.395E-04 | global batch size: 256 | lm loss: 2.278745E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.465 | TFLOPs: 30.77 |
7: iteration 46080/ 115203 | consumed samples: 11796480 | consumed tokens: 24159191040 | elapsed time per iteration (s): 0.45 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 2.307280E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.617 | TFLOPs: 29.89 |
7: iteration 46090/ 115203 | consumed samples: 11799040 | consumed tokens: 24164433920 | elapsed time per iteration (s): 0.43 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 2.307506E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.230 | TFLOPs: 31.39 |
7: iteration 46100/ 115203 | consumed samples: 11801600 | consumed tokens: 24169676800 | elapsed time per iteration (s): 0.52 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 2.311717E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 491.760 | TFLOPs: 25.80 |
7: iteration 46110/ 115203 | consumed samples: 11804160 | consumed tokens: 24174919680 | elapsed time per iteration (s): 0.43 | learning rate: 1.394E-04 | global batch size: 256 | lm loss: 2.306001E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.533 | TFLOPs: 30.98 |
7: iteration 46120/ 115203 | consumed samples: 11806720 | consumed tokens: 24180162560 | elapsed time per iteration (s): 0.43 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.326511E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.615 | TFLOPs: 31.36 |
7: iteration 46130/ 115203 | consumed samples: 11809280 | consumed tokens: 24185405440 | elapsed time per iteration (s): 0.44 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.325844E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.754 | TFLOPs: 30.52 |
7: iteration 46140/ 115203 | consumed samples: 11811840 | consumed tokens: 24190648320 | elapsed time per iteration (s): 0.45 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.311495E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.479 | TFLOPs: 29.77 |
7: iteration 46150/ 115203 | consumed samples: 11814400 | consumed tokens: 24195891200 | elapsed time per iteration (s): 0.43 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.305919E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.389 | TFLOPs: 30.98 |
7: iteration 46160/ 115203 | consumed samples: 11816960 | consumed tokens: 24201134080 | elapsed time per iteration (s): 0.43 | learning rate: 1.393E-04 | global batch size: 256 | lm loss: 2.308504E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.151 | TFLOPs: 30.91 |
7: iteration 46170/ 115203 | consumed samples: 11819520 | consumed tokens: 24206376960 | elapsed time per iteration (s): 0.44 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.307184E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.173 | TFLOPs: 30.70 |
7: iteration 46180/ 115203 | consumed samples: 11822080 | consumed tokens: 24211619840 | elapsed time per iteration (s): 0.45 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.293396E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.791 | TFLOPs: 29.63 |
7: iteration 46190/ 115203 | consumed samples: 11824640 | consumed tokens: 24216862720 | elapsed time per iteration (s): 0.44 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.308674E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.518 | TFLOPs: 30.35 |
7: iteration 46200/ 115203 | consumed samples: 11827200 | consumed tokens: 24222105600 | elapsed time per iteration (s): 0.43 | learning rate: 1.392E-04 | global batch size: 256 | lm loss: 2.302209E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.374 | TFLOPs: 31.03 |
7: iteration 46210/ 115203 | consumed samples: 11829760 | consumed tokens: 24227348480 | elapsed time per iteration (s): 0.43 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 2.362765E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.972 | TFLOPs: 31.22 |
7: iteration 46220/ 115203 | consumed samples: 11832320 | consumed tokens: 24232591360 | elapsed time per iteration (s): 0.44 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 2.299360E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.383 | TFLOPs: 30.71 |
7: iteration 46230/ 115203 | consumed samples: 11834880 | consumed tokens: 24237834240 | elapsed time per iteration (s): 0.44 | learning rate: 1.391E-04 | global batch size: 
256 | lm loss: 2.289520E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.717 | TFLOPs: 30.78 | 7: iteration 46240/ 115203 | consumed samples: 11837440 | consumed tokens: 24243077120 | elapsed time per iteration (s): 0.43 | learning rate: 1.391E-04 | global batch size: 256 | lm loss: 2.332189E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.002 | TFLOPs: 31.06 | 7: iteration 46250/ 115203 | consumed samples: 11840000 | consumed tokens: 24248320000 | elapsed time per iteration (s): 0.44 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 2.292397E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.608 | TFLOPs: 30.83 | 7: iteration 46260/ 115203 | consumed samples: 11842560 | consumed tokens: 24253562880 | elapsed time per iteration (s): 0.44 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 2.306315E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.350 | TFLOPs: 30.87 | 7: iteration 46270/ 115203 | consumed samples: 11845120 | consumed tokens: 24258805760 | elapsed time per iteration (s): 0.44 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 2.283846E+00 | grad norm: 0.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.438 | TFLOPs: 30.24 | 7: iteration 46280/ 115203 | consumed samples: 11847680 | consumed tokens: 24264048640 | elapsed time per iteration (s): 0.44 | learning rate: 1.390E-04 | global batch size: 256 | lm loss: 2.299031E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.416 | TFLOPs: 30.77 | 7: iteration 46290/ 115203 | consumed samples: 11850240 | consumed 
tokens: 24269291520 | elapsed time per iteration (s): 0.44 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.324590E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.333 | TFLOPs: 30.87 | 7: iteration 46300/ 115203 | consumed samples: 11852800 | consumed tokens: 24274534400 | elapsed time per iteration (s): 0.43 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.252038E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.278 | TFLOPs: 31.29 | 7: iteration 46310/ 115203 | consumed samples: 11855360 | consumed tokens: 24279777280 | elapsed time per iteration (s): 0.43 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.278344E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.306 | TFLOPs: 31.08 | 7: iteration 46320/ 115203 | consumed samples: 11857920 | consumed tokens: 24285020160 | elapsed time per iteration (s): 0.43 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.302426E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.939 | TFLOPs: 31.27 | 7: iteration 46330/ 115203 | consumed samples: 11860480 | consumed tokens: 24290263040 | elapsed time per iteration (s): 0.43 | learning rate: 1.389E-04 | global batch size: 256 | lm loss: 2.307248E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.694 | TFLOPs: 30.99 | 7: iteration 46340/ 115203 | consumed samples: 11863040 | consumed tokens: 24295505920 | elapsed time per iteration (s): 0.44 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 2.314292E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 578.122 | TFLOPs: 30.33 | 7: iteration 46350/ 115203 | consumed samples: 11865600 | consumed tokens: 24300748800 | elapsed time per iteration (s): 0.45 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 2.315860E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.964 | TFLOPs: 29.70 | 7: iteration 46360/ 115203 | consumed samples: 11868160 | consumed tokens: 24305991680 | elapsed time per iteration (s): 0.43 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 2.327503E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.574 | TFLOPs: 31.09 | 7: iteration 46370/ 115203 | consumed samples: 11870720 | consumed tokens: 24311234560 | elapsed time per iteration (s): 0.45 | learning rate: 1.388E-04 | global batch size: 256 | lm loss: 2.308047E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.198 | TFLOPs: 29.92 | 7: iteration 46380/ 115203 | consumed samples: 11873280 | consumed tokens: 24316477440 | elapsed time per iteration (s): 0.44 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 2.294350E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.131 | TFLOPs: 30.81 | 7: iteration 46390/ 115203 | consumed samples: 11875840 | consumed tokens: 24321720320 | elapsed time per iteration (s): 0.43 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 2.303943E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.052 | TFLOPs: 30.96 | 7: iteration 46400/ 115203 | consumed samples: 11878400 | consumed tokens: 24326963200 | elapsed time per iteration (s): 0.44 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 2.335819E+00 | grad norm: 
0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.732 | TFLOPs: 30.57 | 7: iteration 46410/ 115203 | consumed samples: 11880960 | consumed tokens: 24332206080 | elapsed time per iteration (s): 0.44 | learning rate: 1.387E-04 | global batch size: 256 | lm loss: 2.331179E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.703 | TFLOPs: 30.31 | 7: iteration 46420/ 115203 | consumed samples: 11883520 | consumed tokens: 24337448960 | elapsed time per iteration (s): 0.44 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 2.299709E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.812 | TFLOPs: 30.47 | 7: iteration 46430/ 115203 | consumed samples: 11886080 | consumed tokens: 24342691840 | elapsed time per iteration (s): 0.44 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 2.339204E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.669 | TFLOPs: 30.83 | 7: iteration 46440/ 115203 | consumed samples: 11888640 | consumed tokens: 24347934720 | elapsed time per iteration (s): 0.43 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 2.318030E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.436 | TFLOPs: 30.93 | 7: iteration 46450/ 115203 | consumed samples: 11891200 | consumed tokens: 24353177600 | elapsed time per iteration (s): 0.43 | learning rate: 1.386E-04 | global batch size: 256 | lm loss: 2.311829E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.249 | TFLOPs: 30.92 | 7: iteration 46460/ 115203 | consumed samples: 11893760 | consumed tokens: 24358420480 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.330178E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.596 | TFLOPs: 30.94 | 7: iteration 46470/ 115203 | consumed samples: 11896320 | consumed tokens: 24363663360 | elapsed time per iteration (s): 0.43 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.350190E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.026 | TFLOPs: 31.01 | 7: iteration 46480/ 115203 | consumed samples: 11898880 | consumed tokens: 24368906240 | elapsed time per iteration (s): 0.45 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.325782E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.115 | TFLOPs: 29.70 | 7: iteration 46490/ 115203 | consumed samples: 11901440 | consumed tokens: 24374149120 | elapsed time per iteration (s): 0.44 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.329733E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.342 | TFLOPs: 30.71 | 7: iteration 46500/ 115203 | consumed samples: 11904000 | consumed tokens: 24379392000 | elapsed time per iteration (s): 0.44 | learning rate: 1.385E-04 | global batch size: 256 | lm loss: 2.314034E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.410 | TFLOPs: 30.87 | 7: iteration 46510/ 115203 | consumed samples: 11906560 | consumed tokens: 24384634880 | elapsed time per iteration (s): 0.44 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 2.306589E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.920 | TFLOPs: 30.85 | 7: 
iteration 46520/ 115203 | consumed samples: 11909120 | consumed tokens: 24389877760 | elapsed time per iteration (s): 0.43 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 2.340587E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.113 | TFLOPs: 31.22 | 7: iteration 46530/ 115203 | consumed samples: 11911680 | consumed tokens: 24395120640 | elapsed time per iteration (s): 0.45 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 2.323909E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.102 | TFLOPs: 29.96 | 7: iteration 46540/ 115203 | consumed samples: 11914240 | consumed tokens: 24400363520 | elapsed time per iteration (s): 0.43 | learning rate: 1.384E-04 | global batch size: 256 | lm loss: 2.337260E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.905 | TFLOPs: 31.00 | 7: iteration 46550/ 115203 | consumed samples: 11916800 | consumed tokens: 24405606400 | elapsed time per iteration (s): 0.44 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.312770E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.767 | TFLOPs: 30.52 | 7: iteration 46560/ 115203 | consumed samples: 11919360 | consumed tokens: 24410849280 | elapsed time per iteration (s): 0.44 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.300908E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.456 | TFLOPs: 30.56 | 7: iteration 46570/ 115203 | consumed samples: 11921920 | consumed tokens: 24416092160 | elapsed time per iteration (s): 0.43 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.311148E+00 | grad norm: 0.220 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.452 | TFLOPs: 30.98 | 7: iteration 46580/ 115203 | consumed samples: 11924480 | consumed tokens: 24421335040 | elapsed time per iteration (s): 0.43 | learning rate: 1.383E-04 | global batch size: 256 | lm loss: 2.328126E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.742 | TFLOPs: 31.00 | 7: iteration 46590/ 115203 | consumed samples: 11927040 | consumed tokens: 24426577920 | elapsed time per iteration (s): 0.43 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 2.310335E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.277 | TFLOPs: 31.02 | 7: iteration 46600/ 115203 | consumed samples: 11929600 | consumed tokens: 24431820800 | elapsed time per iteration (s): 0.43 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 2.308993E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.153 | TFLOPs: 31.38 | 7: iteration 46610/ 115203 | consumed samples: 11932160 | consumed tokens: 24437063680 | elapsed time per iteration (s): 0.43 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 2.295420E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.024 | TFLOPs: 31.53 | 7: iteration 46620/ 115203 | consumed samples: 11934720 | consumed tokens: 24442306560 | elapsed time per iteration (s): 0.43 | learning rate: 1.382E-04 | global batch size: 256 | lm loss: 2.315463E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.955 | TFLOPs: 31.32 | 7: iteration 46630/ 115203 | consumed samples: 11937280 | consumed tokens: 24447549440 | elapsed time per iteration (s): 0.43 | learning rate: 
1.381E-04 | global batch size: 256 | lm loss: 2.318068E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.013 | TFLOPs: 31.59 | 7: iteration 46640/ 115203 | consumed samples: 11939840 | consumed tokens: 24452792320 | elapsed time per iteration (s): 0.43 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 2.325227E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.083 | TFLOPs: 30.91 | 7: iteration 46650/ 115203 | consumed samples: 11942400 | consumed tokens: 24458035200 | elapsed time per iteration (s): 0.44 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 2.285307E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.211 | TFLOPs: 30.76 | 7: iteration 46660/ 115203 | consumed samples: 11944960 | consumed tokens: 24463278080 | elapsed time per iteration (s): 0.43 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 2.301512E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.098 | TFLOPs: 31.07 | 7: iteration 46670/ 115203 | consumed samples: 11947520 | consumed tokens: 24468520960 | elapsed time per iteration (s): 0.43 | learning rate: 1.381E-04 | global batch size: 256 | lm loss: 2.275004E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.384 | TFLOPs: 31.34 | 7: iteration 46680/ 115203 | consumed samples: 11950080 | consumed tokens: 24473763840 | elapsed time per iteration (s): 0.44 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.305935E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.115 | TFLOPs: 30.80 | 7: iteration 46690/ 115203 | consumed 
samples: 11952640 | consumed tokens: 24479006720 | elapsed time per iteration (s): 0.44 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.331436E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.659 | TFLOPs: 30.73 | 7: iteration 46700/ 115203 | consumed samples: 11955200 | consumed tokens: 24484249600 | elapsed time per iteration (s): 0.43 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.294864E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.340 | TFLOPs: 31.18 | 7: iteration 46710/ 115203 | consumed samples: 11957760 | consumed tokens: 24489492480 | elapsed time per iteration (s): 0.45 | learning rate: 1.380E-04 | global batch size: 256 | lm loss: 2.296220E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.445 | TFLOPs: 29.62 | 7: iteration 46720/ 115203 | consumed samples: 11960320 | consumed tokens: 24494735360 | elapsed time per iteration (s): 0.43 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 2.275902E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.881 | TFLOPs: 31.00 | 7: iteration 46730/ 115203 | consumed samples: 11962880 | consumed tokens: 24499978240 | elapsed time per iteration (s): 0.43 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 2.323755E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.993 | TFLOPs: 31.17 | 7: iteration 46740/ 115203 | consumed samples: 11965440 | consumed tokens: 24505221120 | elapsed time per iteration (s): 0.46 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 2.300414E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 556.837 | TFLOPs: 29.22 | 7: iteration 46750/ 115203 | consumed samples: 11968000 | consumed tokens: 24510464000 | elapsed time per iteration (s): 0.42 | learning rate: 1.379E-04 | global batch size: 256 | lm loss: 2.303555E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.483 | TFLOPs: 31.66 | 7: iteration 46760/ 115203 | consumed samples: 11970560 | consumed tokens: 24515706880 | elapsed time per iteration (s): 0.43 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.282677E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.132 | TFLOPs: 31.44 | 7: iteration 46770/ 115203 | consumed samples: 11973120 | consumed tokens: 24520949760 | elapsed time per iteration (s): 0.43 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.326993E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.196 | TFLOPs: 31.23 | 7: iteration 46780/ 115203 | consumed samples: 11975680 | consumed tokens: 24526192640 | elapsed time per iteration (s): 0.43 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.310138E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.583 | TFLOPs: 31.35 | 7: iteration 46790/ 115203 | consumed samples: 11978240 | consumed tokens: 24531435520 | elapsed time per iteration (s): 0.43 | learning rate: 1.378E-04 | global batch size: 256 | lm loss: 2.308918E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.995 | TFLOPs: 31.22 | 7: iteration 46800/ 115203 | consumed samples: 11980800 | consumed tokens: 24536678400 | elapsed time per iteration (s): 0.44 | learning rate: 1.377E-04 | global batch size: 256 | lm 
loss: 2.304241E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.170 | TFLOPs: 30.55 | 7: iteration 46810/ 115203 | consumed samples: 11983360 | consumed tokens: 24541921280 | elapsed time per iteration (s): 0.44 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 2.317210E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.112 | TFLOPs: 30.39 | 7: iteration 46820/ 115203 | consumed samples: 11985920 | consumed tokens: 24547164160 | elapsed time per iteration (s): 0.44 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 2.325331E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.289 | TFLOPs: 30.81 | 7: iteration 46830/ 115203 | consumed samples: 11988480 | consumed tokens: 24552407040 | elapsed time per iteration (s): 0.43 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 2.329109E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.926 | TFLOPs: 31.21 | 7: iteration 46840/ 115203 | consumed samples: 11991040 | consumed tokens: 24557649920 | elapsed time per iteration (s): 0.42 | learning rate: 1.377E-04 | global batch size: 256 | lm loss: 2.295303E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.756 | TFLOPs: 31.63 | 7: iteration 46850/ 115203 | consumed samples: 11993600 | consumed tokens: 24562892800 | elapsed time per iteration (s): 0.42 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 2.334082E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.107 | TFLOPs: 31.80 | 7: iteration 46860/ 115203 | consumed samples: 11996160 | consumed tokens: 
24568135680 | elapsed time per iteration (s): 0.43 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 2.291874E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.786 | TFLOPs: 31.52 | 7: iteration 46870/ 115203 | consumed samples: 11998720 | consumed tokens: 24573378560 | elapsed time per iteration (s): 0.43 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 2.309524E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.556 | TFLOPs: 31.04 | 7: iteration 46880/ 115203 | consumed samples: 12001280 | consumed tokens: 24578621440 | elapsed time per iteration (s): 0.46 | learning rate: 1.376E-04 | global batch size: 256 | lm loss: 2.331983E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.709 | TFLOPs: 29.26 | 7: iteration 46890/ 115203 | consumed samples: 12003840 | consumed tokens: 24583864320 | elapsed time per iteration (s): 0.43 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.296527E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.169 | TFLOPs: 31.44 | 7: iteration 46900/ 115203 | consumed samples: 12006400 | consumed tokens: 24589107200 | elapsed time per iteration (s): 0.44 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.315559E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.064 | TFLOPs: 30.54 | 7: iteration 46910/ 115203 | consumed samples: 12008960 | consumed tokens: 24594350080 | elapsed time per iteration (s): 0.44 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.327705E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
583.991 | TFLOPs: 30.64 | 7: iteration 46920/ 115203 | consumed samples: 12011520 | consumed tokens: 24599592960 | elapsed time per iteration (s): 0.44 | learning rate: 1.375E-04 | global batch size: 256 | lm loss: 2.350561E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.673 | TFLOPs: 30.73 | 7: iteration 46930/ 115203 | consumed samples: 12014080 | consumed tokens: 24604835840 | elapsed time per iteration (s): 0.44 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 2.302408E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.079 | TFLOPs: 30.86 | 7: iteration 46940/ 115203 | consumed samples: 12016640 | consumed tokens: 24610078720 | elapsed time per iteration (s): 0.43 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 2.330251E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.098 | TFLOPs: 31.43 | 7: iteration 46950/ 115203 | consumed samples: 12019200 | consumed tokens: 24615321600 | elapsed time per iteration (s): 0.44 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 2.305885E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.265 | TFLOPs: 30.81 | 7: iteration 46960/ 115203 | consumed samples: 12021760 | consumed tokens: 24620564480 | elapsed time per iteration (s): 0.43 | learning rate: 1.374E-04 | global batch size: 256 | lm loss: 2.300159E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.532 | TFLOPs: 31.30 | 7: iteration 46970/ 115203 | consumed samples: 12024320 | consumed tokens: 24625807360 | elapsed time per iteration (s): 0.43 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 2.327324E+00 | grad norm: 0.213 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.388 | TFLOPs: 31.50 | 7: iteration 46980/ 115203 | consumed samples: 12026880 | consumed tokens: 24631050240 | elapsed time per iteration (s): 0.44 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 2.341141E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.161 | TFLOPs: 30.81 | 7: iteration 46990/ 115203 | consumed samples: 12029440 | consumed tokens: 24636293120 | elapsed time per iteration (s): 0.43 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 2.316968E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.453 | TFLOPs: 30.98 | 7: iteration 47000/ 115203 | consumed samples: 12032000 | consumed tokens: 24641536000 | elapsed time per iteration (s): 0.44 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 2.322353E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.900 | TFLOPs: 30.74 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 47000 | lm loss value: 2.137343E+00 | lm loss PPL: 8.476884E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 47000 to checkpoints_221m 0: [2022-11-28 18:36:09,885] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step47000 is begin to save! 0: [2022-11-28 18:36:09,893] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_01-model_00-model_states.pt... 0: [2022-11-28 18:36:10,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 18:36:10,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_03-model_00-model_states.pt... 0: [2022-11-28 18:36:10,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_03-model_00-model_states.pt. 0: [2022-11-28 18:36:10,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_04-model_00-model_states.pt... 0: [2022-11-28 18:36:10,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_04-model_00-model_states.pt. 0: [2022-11-28 18:36:10,052] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_05-model_00-model_states.pt... 0: [2022-11-28 18:36:10,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_05-model_00-model_states.pt. 0: [2022-11-28 18:36:10,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_06-model_00-model_states.pt... 0: [2022-11-28 18:36:10,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_06-model_00-model_states.pt. 0: [2022-11-28 18:36:10,098] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_07-model_00-model_states.pt... 0: [2022-11-28 18:36:10,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_07-model_00-model_states.pt. 0: [2022-11-28 18:36:10,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_08-model_00-model_states.pt... 0: [2022-11-28 18:36:10,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 18:36:10,145] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_09-model_00-model_states.pt... 0: [2022-11-28 18:36:10,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_09-model_00-model_states.pt. 0: [2022-11-28 18:36:10,170] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_10-model_00-model_states.pt... 0: [2022-11-28 18:36:10,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_10-model_00-model_states.pt. 0: [2022-11-28 18:36:10,193] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_11-model_00-model_states.pt... 0: [2022-11-28 18:36:10,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_11-model_00-model_states.pt. 0: [2022-11-28 18:36:10,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_12-model_00-model_states.pt... 0: [2022-11-28 18:36:10,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_12-model_00-model_states.pt. 0: [2022-11-28 18:36:10,241] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_13-model_00-model_states.pt... 0: [2022-11-28 18:36:10,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_13-model_00-model_states.pt. 0: [2022-11-28 18:36:10,264] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_14-model_00-model_states.pt... 0: [2022-11-28 18:36:10,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 18:36:10,288] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_15-model_00-model_states.pt... 0: [2022-11-28 18:36:10,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_15-model_00-model_states.pt. 0: [2022-11-28 18:36:10,312] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_16-model_00-model_states.pt... 0: [2022-11-28 18:36:10,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_16-model_00-model_states.pt. 0: [2022-11-28 18:36:10,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_17-model_00-model_states.pt... 0: [2022-11-28 18:36:10,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_17-model_00-model_states.pt. 0: [2022-11-28 18:36:10,360] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_18-model_00-model_states.pt... 0: [2022-11-28 18:36:10,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_18-model_00-model_states.pt. 0: [2022-11-28 18:36:10,384] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_19-model_00-model_states.pt... 0: [2022-11-28 18:36:10,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_19-model_00-model_states.pt. 0: [2022-11-28 18:36:10,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_20-model_00-model_states.pt... 0: [2022-11-28 18:36:10,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 18:36:10,432] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/layer_22-model_00-model_states.pt... 0: [2022-11-28 18:36:10,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/layer_22-model_00-model_states.pt. 0: [2022-11-28 18:36:10,437] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step47000/mp_rank_00_model_states.pt 0: [2022-11-28 18:36:10,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/mp_rank_00_model_states.pt... 0: [2022-11-28 18:36:10,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/mp_rank_00_model_states.pt. 0: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
7: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
6: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
5: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
3: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:36:10,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step47000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:36:10,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:36:10,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:36:10,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:36:10,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:36:10,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 18:36:10,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 18:36:10,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 18:36:10,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 18:36:10,521] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 18:36:10,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 0: [2022-11-28 18:36:10,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 0: [2022-11-28 18:36:10,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 0: [2022-11-28 18:36:10,521] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 0: [2022-11-28 18:36:10,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:36:10,528] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 18:36:10,528] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 0: [2022-11-28 18:36:10,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2022-11-28 18:36:10,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 18:36:10,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 0: [2022-11-28 18:36:10,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:36:10,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 18:36:10,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2022-11-28 18:36:10,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:36:10,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 18:36:10,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:36:10,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2022-11-28 18:36:10,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 18:36:10,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:36:10,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
4: [2022-11-28 18:36:10,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 18:36:10,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2022-11-28 18:36:10,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:36:10,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 18:36:10,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:36:10,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 18:36:10,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:36:10,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 18:36:10,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
2: [2022-11-28 18:36:10,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 18:36:10,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 18:36:10,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2022-11-28 18:36:10,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 2: [2022-11-28 18:36:10,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:36:10,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 18:36:10,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2022-11-28 18:36:10,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:36:10,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:36:10,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:36:10,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 18:36:10,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 18:36:10,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 18:36:10,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2022-11-28 18:36:10,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2022-11-28 18:36:10,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2022-11-28 18:36:10,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:36:10,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 18:36:10,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2022-11-28 18:36:10,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:36:10,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 18:36:10,550] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
7: [2022-11-28 18:36:10,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:36:10,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 18:36:10,550] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2022-11-28 18:36:10,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:36:10,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 7: [2022-11-28 18:36:10,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:36:10,550] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2022-11-28 18:36:10,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 18:36:10,550] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2022-11-28 18:36:10,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:36:10,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
5: [2022-11-28 18:36:10,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 18:36:10,546] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2022-11-28 18:36:10,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 18:36:10,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:36:10,550] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2022-11-28 18:36:10,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 18:36:10,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2022-11-28 18:36:10,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:36:10,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 18:36:10,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2022-11-28 18:36:10,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:36:10,551] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 18:36:10,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2022-11-28 18:36:10,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:36:10,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 18:36:10,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2022-11-28 18:36:10,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:36:10,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 18:36:10,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 1: [2022-11-28 18:36:10,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:36:10,553] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 18:36:10,553] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2022-11-28 18:36:10,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:36:10,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:36:10,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:36:10,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:36:10,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 18:36:10,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 18:36:10,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 18:36:10,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 18:36:10,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2022-11-28 18:36:10,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2022-11-28 18:36:10,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 7: [2022-11-28 18:36:10,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2022-11-28 18:36:10,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2022-11-28 18:36:10,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 18:36:10,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2022-11-28 18:36:10,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:36:10,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 18:36:10,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2022-11-28 18:36:10,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:36:10,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 18:36:10,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2022-11-28 18:36:10,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:36:10,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 18:36:10,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 5: [2022-11-28 18:36:10,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 18:36:10,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 18:36:10,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2022-11-28 18:36:10,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:36:10,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:36:10,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:36:10,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:36:10,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 18:36:10,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 18:36:10,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 18:36:10,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 18:36:10,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2022-11-28 18:36:10,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
4: [2022-11-28 18:36:10,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 4: [2022-11-28 18:36:10,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2022-11-28 18:36:10,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:36:10,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 18:36:10,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2022-11-28 18:36:10,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:36:10,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 18:36:10,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2022-11-28 18:36:10,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:36:10,571] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 18:36:10,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:36:10,571] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
3: [2022-11-28 18:36:10,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 18:36:10,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:36:10,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:36:10,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 18:36:10,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2022-11-28 18:36:10,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 18:36:10,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2022-11-28 18:36:10,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 3: [2022-11-28 18:36:10,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:36:10,572] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 18:36:10,572] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
0: [2022-11-28 18:36:10,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 18:36:10,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2022-11-28 18:36:10,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:36:10,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:36:10,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 18:36:10,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 18:36:10,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2022-11-28 18:36:10,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2022-11-28 18:36:10,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:36:10,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 18:36:10,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2022-11-28 18:36:10,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-28 18:36:10,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 18:36:10,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2022-11-28 18:36:10,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:36:10,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:36:10,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:36:10,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:36:10,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 18:36:10,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 18:36:10,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 18:36:10,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step47000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 18:36:10,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2022-11-28 18:36:10,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 
6: [2022-11-28 18:36:10,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 6: [2022-11-28 18:36:10,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step47000 is ready now! 0: successfully saved checkpoint at iteration 47000 to checkpoints_221m 7: time (ms) | save-checkpoint: 852.94 7: iteration 47010/ 115203 | consumed samples: 12034560 | consumed tokens: 24646778880 | elapsed time per iteration (s): 0.53 | learning rate: 1.373E-04 | global batch size: 256 | lm loss: 2.306063E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 482.345 | TFLOPs: 25.31 | 7: iteration 47020/ 115203 | consumed samples: 12037120 | consumed tokens: 24652021760 | elapsed time per iteration (s): 0.45 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.318670E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.920 | TFLOPs: 29.96 | 7: iteration 47030/ 115203 | consumed samples: 12039680 | consumed tokens: 24657264640 | elapsed time per iteration (s): 0.45 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.295753E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.863 | TFLOPs: 30.06 | 7: iteration 47040/ 115203 | consumed samples: 12042240 | consumed tokens: 24662507520 | elapsed time per iteration (s): 0.44 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.328481E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.306 | TFLOPs: 30.76 | 7: iteration 47050/ 115203 | consumed samples: 12044800 | consumed tokens: 24667750400 | elapsed time per iteration (s): 0.44 | learning rate: 1.372E-04 | global batch size: 256 | lm loss: 2.316310E+00 | grad norm: 0.202 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.851 | TFLOPs: 30.21 | 7: iteration 47060/ 115203 | consumed samples: 12047360 | consumed tokens: 24672993280 | elapsed time per iteration (s): 0.43 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 2.317985E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.424 | TFLOPs: 31.08 | 7: iteration 47070/ 115203 | consumed samples: 12049920 | consumed tokens: 24678236160 | elapsed time per iteration (s): 0.44 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 2.306228E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.888 | TFLOPs: 30.74 | 7: iteration 47080/ 115203 | consumed samples: 12052480 | consumed tokens: 24683479040 | elapsed time per iteration (s): 0.43 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 2.317061E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.805 | TFLOPs: 31.37 | 7: iteration 47090/ 115203 | consumed samples: 12055040 | consumed tokens: 24688721920 | elapsed time per iteration (s): 0.45 | learning rate: 1.371E-04 | global batch size: 256 | lm loss: 2.280938E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.746 | TFLOPs: 30.10 | 7: iteration 47100/ 115203 | consumed samples: 12057600 | consumed tokens: 24693964800 | elapsed time per iteration (s): 0.43 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 2.296021E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.850 | TFLOPs: 31.26 | 7: iteration 47110/ 115203 | consumed samples: 12060160 | consumed tokens: 24699207680 | elapsed time per iteration (s): 0.45 | learning 
rate: 1.370E-04 | global batch size: 256 | lm loss: 2.310242E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.470 | TFLOPs: 30.09 | 7: iteration 47120/ 115203 | consumed samples: 12062720 | consumed tokens: 24704450560 | elapsed time per iteration (s): 0.43 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 2.314112E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.913 | TFLOPs: 31.11 | 7: iteration 47130/ 115203 | consumed samples: 12065280 | consumed tokens: 24709693440 | elapsed time per iteration (s): 0.44 | learning rate: 1.370E-04 | global batch size: 256 | lm loss: 2.306295E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.417 | TFLOPs: 30.56 | 7: iteration 47140/ 115203 | consumed samples: 12067840 | consumed tokens: 24714936320 | elapsed time per iteration (s): 0.43 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 2.294680E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.417 | TFLOPs: 31.14 | 7: iteration 47150/ 115203 | consumed samples: 12070400 | consumed tokens: 24720179200 | elapsed time per iteration (s): 0.44 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 2.295729E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.618 | TFLOPs: 30.25 | 7: iteration 47160/ 115203 | consumed samples: 12072960 | consumed tokens: 24725422080 | elapsed time per iteration (s): 0.43 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 2.337254E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.404 | TFLOPs: 31.24 | 7: iteration 47170/ 115203 | 
consumed samples: 12075520 | consumed tokens: 24730664960 | elapsed time per iteration (s): 0.44 | learning rate: 1.369E-04 | global batch size: 256 | lm loss: 2.310365E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.158 | TFLOPs: 30.86 | 7: iteration 47180/ 115203 | consumed samples: 12078080 | consumed tokens: 24735907840 | elapsed time per iteration (s): 0.43 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.331561E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.743 | TFLOPs: 31.10 | 7: iteration 47190/ 115203 | consumed samples: 12080640 | consumed tokens: 24741150720 | elapsed time per iteration (s): 0.43 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.312847E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.258 | TFLOPs: 30.97 | 7: iteration 47200/ 115203 | consumed samples: 12083200 | consumed tokens: 24746393600 | elapsed time per iteration (s): 0.43 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.320560E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.008 | TFLOPs: 31.22 | 7: iteration 47210/ 115203 | consumed samples: 12085760 | consumed tokens: 24751636480 | elapsed time per iteration (s): 0.42 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.344269E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.651 | TFLOPs: 31.78 | 7: iteration 47220/ 115203 | consumed samples: 12088320 | consumed tokens: 24756879360 | elapsed time per iteration (s): 0.43 | learning rate: 1.368E-04 | global batch size: 256 | lm loss: 2.293746E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 590.628 | TFLOPs: 30.99 | 7: iteration 47230/ 115203 | consumed samples: 12090880 | consumed tokens: 24762122240 | elapsed time per iteration (s): 0.42 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 2.290269E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.107 | TFLOPs: 31.70 | 7: iteration 47240/ 115203 | consumed samples: 12093440 | consumed tokens: 24767365120 | elapsed time per iteration (s): 0.43 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 2.273591E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.136 | TFLOPs: 30.96 | 7: iteration 47250/ 115203 | consumed samples: 12096000 | consumed tokens: 24772608000 | elapsed time per iteration (s): 0.43 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 2.309544E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.600 | TFLOPs: 31.25 | 7: iteration 47260/ 115203 | consumed samples: 12098560 | consumed tokens: 24777850880 | elapsed time per iteration (s): 0.42 | learning rate: 1.367E-04 | global batch size: 256 | lm loss: 2.304831E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.459 | TFLOPs: 31.71 | 7: iteration 47270/ 115203 | consumed samples: 12101120 | consumed tokens: 24783093760 | elapsed time per iteration (s): 0.43 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 2.269427E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.066 | TFLOPs: 31.17 | 7: iteration 47280/ 115203 | consumed samples: 12103680 | consumed tokens: 24788336640 | elapsed time per iteration (s): 0.43 | learning rate: 1.366E-04 | global batch size: 
256 | lm loss: 2.316797E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.072 | TFLOPs: 31.17 | 7: iteration 47290/ 115203 | consumed samples: 12106240 | consumed tokens: 24793579520 | elapsed time per iteration (s): 0.43 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 2.312135E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.788 | TFLOPs: 31.16 | 7: iteration 47300/ 115203 | consumed samples: 12108800 | consumed tokens: 24798822400 | elapsed time per iteration (s): 0.43 | learning rate: 1.366E-04 | global batch size: 256 | lm loss: 2.270558E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.620 | TFLOPs: 31.30 | 7: iteration 47310/ 115203 | consumed samples: 12111360 | consumed tokens: 24804065280 | elapsed time per iteration (s): 0.44 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 2.341672E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.853 | TFLOPs: 30.84 | 7: iteration 47320/ 115203 | consumed samples: 12113920 | consumed tokens: 24809308160 | elapsed time per iteration (s): 0.42 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 2.302613E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.449 | TFLOPs: 31.98 | 7: iteration 47330/ 115203 | consumed samples: 12116480 | consumed tokens: 24814551040 | elapsed time per iteration (s): 0.43 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 2.336885E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.772 | TFLOPs: 31.05 | 7: iteration 47340/ 115203 | consumed samples: 12119040 | consumed 
tokens: 24819793920 | elapsed time per iteration (s): 0.43 | learning rate: 1.365E-04 | global batch size: 256 | lm loss: 2.303257E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.873 | TFLOPs: 31.32 | 7: iteration 47350/ 115203 | consumed samples: 12121600 | consumed tokens: 24825036800 | elapsed time per iteration (s): 0.43 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.306240E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.253 | TFLOPs: 31.39 | 7: iteration 47360/ 115203 | consumed samples: 12124160 | consumed tokens: 24830279680 | elapsed time per iteration (s): 0.43 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.326095E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.360 | TFLOPs: 31.55 | 7: iteration 47370/ 115203 | consumed samples: 12126720 | consumed tokens: 24835522560 | elapsed time per iteration (s): 0.43 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.275312E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.782 | TFLOPs: 31.00 | 7: iteration 47380/ 115203 | consumed samples: 12129280 | consumed tokens: 24840765440 | elapsed time per iteration (s): 0.42 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.292835E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.185 | TFLOPs: 31.65 | 7: iteration 47390/ 115203 | consumed samples: 12131840 | consumed tokens: 24846008320 | elapsed time per iteration (s): 0.44 | learning rate: 1.364E-04 | global batch size: 256 | lm loss: 2.319638E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 579.535 | TFLOPs: 30.41 | 7: iteration 47400/ 115203 | consumed samples: 12134400 | consumed tokens: 24851251200 | elapsed time per iteration (s): 0.42 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 2.306528E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.503 | TFLOPs: 31.72 | 7: iteration 47410/ 115203 | consumed samples: 12136960 | consumed tokens: 24856494080 | elapsed time per iteration (s): 0.44 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 2.308305E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.274 | TFLOPs: 30.24 | 7: iteration 47420/ 115203 | consumed samples: 12139520 | consumed tokens: 24861736960 | elapsed time per iteration (s): 0.44 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 2.312078E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.978 | TFLOPs: 30.48 | 7: iteration 47430/ 115203 | consumed samples: 12142080 | consumed tokens: 24866979840 | elapsed time per iteration (s): 0.43 | learning rate: 1.363E-04 | global batch size: 256 | lm loss: 2.316105E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.588 | TFLOPs: 31.20 | 7: iteration 47440/ 115203 | consumed samples: 12144640 | consumed tokens: 24872222720 | elapsed time per iteration (s): 0.44 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 2.287397E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.896 | TFLOPs: 30.74 | 7: iteration 47450/ 115203 | consumed samples: 12147200 | consumed tokens: 24877465600 | elapsed time per iteration (s): 0.43 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 2.287716E+00 | grad norm: 
0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.254 | TFLOPs: 31.18 | 7: iteration 47460/ 115203 | consumed samples: 12149760 | consumed tokens: 24882708480 | elapsed time per iteration (s): 0.43 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 2.316164E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.338 | TFLOPs: 31.18 | 7: iteration 47470/ 115203 | consumed samples: 12152320 | consumed tokens: 24887951360 | elapsed time per iteration (s): 0.43 | learning rate: 1.362E-04 | global batch size: 256 | lm loss: 2.336915E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.392 | TFLOPs: 31.19 | 7: iteration 47480/ 115203 | consumed samples: 12154880 | consumed tokens: 24893194240 | elapsed time per iteration (s): 0.44 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 2.308425E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.188 | TFLOPs: 30.55 | 7: iteration 47490/ 115203 | consumed samples: 12157440 | consumed tokens: 24898437120 | elapsed time per iteration (s): 0.42 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 2.299158E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.804 | TFLOPs: 31.89 | 7: iteration 47500/ 115203 | consumed samples: 12160000 | consumed tokens: 24903680000 | elapsed time per iteration (s): 0.45 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 2.309009E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.010 | TFLOPs: 30.06 | 7: iteration 47510/ 115203 | consumed samples: 12162560 | consumed tokens: 24908922880 | elapsed time per 
iteration (s): 0.44 | learning rate: 1.361E-04 | global batch size: 256 | lm loss: 2.298582E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.422 | TFLOPs: 30.77 | 7: iteration 47520/ 115203 | consumed samples: 12165120 | consumed tokens: 24914165760 | elapsed time per iteration (s): 0.43 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 2.299116E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.447 | TFLOPs: 31.29 | 7: iteration 47530/ 115203 | consumed samples: 12167680 | consumed tokens: 24919408640 | elapsed time per iteration (s): 0.43 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 2.304680E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.392 | TFLOPs: 31.08 | 7: iteration 47540/ 115203 | consumed samples: 12170240 | consumed tokens: 24924651520 | elapsed time per iteration (s): 0.43 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 2.325980E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.914 | TFLOPs: 31.00 | 7: iteration 47550/ 115203 | consumed samples: 12172800 | consumed tokens: 24929894400 | elapsed time per iteration (s): 0.44 | learning rate: 1.360E-04 | global batch size: 256 | lm loss: 2.310987E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.331 | TFLOPs: 30.76 | 7: iteration 47560/ 115203 | consumed samples: 12175360 | consumed tokens: 24935137280 | elapsed time per iteration (s): 0.43 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.328151E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.288 | TFLOPs: 31.34 | 7: 
iteration 47570/ 115203 | consumed samples: 12177920 | consumed tokens: 24940380160 | elapsed time per iteration (s): 0.44 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.303263E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.794 | TFLOPs: 30.74 | 7: iteration 47580/ 115203 | consumed samples: 12180480 | consumed tokens: 24945623040 | elapsed time per iteration (s): 0.43 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.298839E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.418 | TFLOPs: 31.40 | 7: iteration 47590/ 115203 | consumed samples: 12183040 | consumed tokens: 24950865920 | elapsed time per iteration (s): 0.43 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.330993E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.032 | TFLOPs: 31.59 | 7: iteration 47600/ 115203 | consumed samples: 12185600 | consumed tokens: 24956108800 | elapsed time per iteration (s): 0.44 | learning rate: 1.359E-04 | global batch size: 256 | lm loss: 2.293665E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.875 | TFLOPs: 30.48 | 7: iteration 47610/ 115203 | consumed samples: 12188160 | consumed tokens: 24961351680 | elapsed time per iteration (s): 0.44 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.302010E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.193 | TFLOPs: 30.86 | 7: iteration 47620/ 115203 | consumed samples: 12190720 | consumed tokens: 24966594560 | elapsed time per iteration (s): 0.43 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.295861E+00 | grad norm: 0.214 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.641 | TFLOPs: 31.20 | 7: iteration 47630/ 115203 | consumed samples: 12193280 | consumed tokens: 24971837440 | elapsed time per iteration (s): 0.43 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.303437E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.528 | TFLOPs: 31.25 | 7: iteration 47640/ 115203 | consumed samples: 12195840 | consumed tokens: 24977080320 | elapsed time per iteration (s): 0.44 | learning rate: 1.358E-04 | global batch size: 256 | lm loss: 2.295347E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.024 | TFLOPs: 30.38 | 7: iteration 47650/ 115203 | consumed samples: 12198400 | consumed tokens: 24982323200 | elapsed time per iteration (s): 0.44 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 2.281685E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.945 | TFLOPs: 30.85 | 7: iteration 47660/ 115203 | consumed samples: 12200960 | consumed tokens: 24987566080 | elapsed time per iteration (s): 0.43 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 2.330071E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.134 | TFLOPs: 31.12 | 7: iteration 47670/ 115203 | consumed samples: 12203520 | consumed tokens: 24992808960 | elapsed time per iteration (s): 0.44 | learning rate: 1.357E-04 | global batch size: 256 | lm loss: 2.350550E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.524 | TFLOPs: 30.67 | 7: iteration 47680/ 115203 | consumed samples: 12206080 | consumed tokens: 24998051840 | elapsed time per iteration (s): 0.43 | learning rate: 
1.357E-04 | global batch size: 256 | lm loss: 2.295704E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.882 | TFLOPs: 31.37 | 7: iteration 47690/ 115203 | consumed samples: 12208640 | consumed tokens: 25003294720 | elapsed time per iteration (s): 0.44 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 2.303972E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.707 | TFLOPs: 30.36 | 7: iteration 47700/ 115203 | consumed samples: 12211200 | consumed tokens: 25008537600 | elapsed time per iteration (s): 0.43 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 2.306244E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.402 | TFLOPs: 30.92 | 7: iteration 47710/ 115203 | consumed samples: 12213760 | consumed tokens: 25013780480 | elapsed time per iteration (s): 0.42 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 2.296851E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.536 | TFLOPs: 31.98 | 7: iteration 47720/ 115203 | consumed samples: 12216320 | consumed tokens: 25019023360 | elapsed time per iteration (s): 0.43 | learning rate: 1.356E-04 | global batch size: 256 | lm loss: 2.318909E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.859 | TFLOPs: 31.21 | 7: iteration 47730/ 115203 | consumed samples: 12218880 | consumed tokens: 25024266240 | elapsed time per iteration (s): 0.44 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.278719E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.199 | TFLOPs: 30.55 | 7: iteration 47740/ 115203 | consumed 
samples: 12221440 | consumed tokens: 25029509120 | elapsed time per iteration (s): 0.44 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.279998E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.371 | TFLOPs: 30.77 | 7: iteration 47750/ 115203 | consumed samples: 12224000 | consumed tokens: 25034752000 | elapsed time per iteration (s): 0.44 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.286634E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.040 | TFLOPs: 30.59 | 7: iteration 47760/ 115203 | consumed samples: 12226560 | consumed tokens: 25039994880 | elapsed time per iteration (s): 0.43 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.276587E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.928 | TFLOPs: 31.16 | 7: iteration 47770/ 115203 | consumed samples: 12229120 | consumed tokens: 25045237760 | elapsed time per iteration (s): 0.43 | learning rate: 1.355E-04 | global batch size: 256 | lm loss: 2.317167E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.668 | TFLOPs: 31.04 | 7: iteration 47780/ 115203 | consumed samples: 12231680 | consumed tokens: 25050480640 | elapsed time per iteration (s): 0.46 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 2.305195E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.656 | TFLOPs: 29.42 | 7: iteration 47790/ 115203 | consumed samples: 12234240 | consumed tokens: 25055723520 | elapsed time per iteration (s): 0.43 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 2.308879E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 594.416 | TFLOPs: 31.19 | 7: iteration 47800/ 115203 | consumed samples: 12236800 | consumed tokens: 25060966400 | elapsed time per iteration (s): 0.42 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 2.326229E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.598 | TFLOPs: 31.62 | 7: iteration 47810/ 115203 | consumed samples: 12239360 | consumed tokens: 25066209280 | elapsed time per iteration (s): 0.45 | learning rate: 1.354E-04 | global batch size: 256 | lm loss: 2.288612E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.763 | TFLOPs: 29.95 | 7: iteration 47820/ 115203 | consumed samples: 12241920 | consumed tokens: 25071452160 | elapsed time per iteration (s): 0.43 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 2.290705E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.027 | TFLOPs: 30.91 | 7: iteration 47830/ 115203 | consumed samples: 12244480 | consumed tokens: 25076695040 | elapsed time per iteration (s): 0.43 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 2.321613E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.114 | TFLOPs: 31.38 | 7: iteration 47840/ 115203 | consumed samples: 12247040 | consumed tokens: 25081937920 | elapsed time per iteration (s): 0.44 | learning rate: 1.353E-04 | global batch size: 256 | lm loss: 2.308127E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.441 | TFLOPs: 30.72 | 7: iteration 47850/ 115203 | consumed samples: 12249600 | consumed tokens: 25087180800 | elapsed time per iteration (s): 0.42 | learning rate: 1.353E-04 | global batch size: 256 | lm 
loss: 2.313544E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.916 | TFLOPs: 31.63 | 7: iteration 47860/ 115203 | consumed samples: 12252160 | consumed tokens: 25092423680 | elapsed time per iteration (s): 0.42 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.287402E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.119 | TFLOPs: 31.64 | 7: iteration 47870/ 115203 | consumed samples: 12254720 | consumed tokens: 25097666560 | elapsed time per iteration (s): 0.44 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.319492E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.818 | TFLOPs: 30.63 | 7: iteration 47880/ 115203 | consumed samples: 12257280 | consumed tokens: 25102909440 | elapsed time per iteration (s): 0.43 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.315651E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.392 | TFLOPs: 31.24 | 7: iteration 47890/ 115203 | consumed samples: 12259840 | consumed tokens: 25108152320 | elapsed time per iteration (s): 0.43 | learning rate: 1.352E-04 | global batch size: 256 | lm loss: 2.313100E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.817 | TFLOPs: 31.42 | 7: iteration 47900/ 115203 | consumed samples: 12262400 | consumed tokens: 25113395200 | elapsed time per iteration (s): 0.44 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 2.293600E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.150 | TFLOPs: 30.70 | 7: iteration 47910/ 115203 | consumed samples: 12264960 | consumed tokens: 
25118638080 | elapsed time per iteration (s): 0.43 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 2.327037E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.344 | TFLOPs: 30.97 | 7: iteration 47920/ 115203 | consumed samples: 12267520 | consumed tokens: 25123880960 | elapsed time per iteration (s): 0.44 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 2.296029E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.346 | TFLOPs: 30.87 | 7: iteration 47930/ 115203 | consumed samples: 12270080 | consumed tokens: 25129123840 | elapsed time per iteration (s): 0.44 | learning rate: 1.351E-04 | global batch size: 256 | lm loss: 2.305995E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.369 | TFLOPs: 30.40 | 7: iteration 47940/ 115203 | consumed samples: 12272640 | consumed tokens: 25134366720 | elapsed time per iteration (s): 0.44 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 2.325185E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.325 | TFLOPs: 30.34 | 7: iteration 47950/ 115203 | consumed samples: 12275200 | consumed tokens: 25139609600 | elapsed time per iteration (s): 0.43 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 2.307550E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.865 | TFLOPs: 31.32 | 7: iteration 47960/ 115203 | consumed samples: 12277760 | consumed tokens: 25144852480 | elapsed time per iteration (s): 0.43 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 2.297512E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
598.567 | TFLOPs: 31.41 | 7: iteration 47970/ 115203 | consumed samples: 12280320 | consumed tokens: 25150095360 | elapsed time per iteration (s): 0.44 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 2.318054E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.713 | TFLOPs: 30.78 | 7: iteration 47980/ 115203 | consumed samples: 12282880 | consumed tokens: 25155338240 | elapsed time per iteration (s): 0.43 | learning rate: 1.350E-04 | global batch size: 256 | lm loss: 2.325151E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.406 | TFLOPs: 31.24 | 7: iteration 47990/ 115203 | consumed samples: 12285440 | consumed tokens: 25160581120 | elapsed time per iteration (s): 0.43 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.316137E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.983 | TFLOPs: 31.22 | 0: [2022-11-28 18:43:24,353] [INFO] [logging.py:68:log_dist] [Rank 0] step=48000, skipped=0, lr=[0.00013490269160287214, 0.00013490269160287214, 0.00013490269160287214], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 48000/ 115203 | consumed samples: 12288000 | consumed tokens: 25165824000 | elapsed time per iteration (s): 0.44 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.296914E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.342 | TFLOPs: 30.19 | 0: steps: 48000 loss: 2.3195 iter time (s): 0.433 samples/sec: 590.799 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 48000 | lm loss value: 2.301651E+00 | lm loss PPL: 9.990659E+00 | 7: ------------------------------------------------------------------------------------------- 0: 
saving checkpoint at iteration 48000 to checkpoints_221m 0: [2022-11-28 18:43:24,535] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step48000 is begin to save! 0: [2022-11-28 18:43:24,540] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_01-model_00-model_states.pt... 0: [2022-11-28 18:43:24,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_01-model_00-model_states.pt. 0: [2022-11-28 18:43:24,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_03-model_00-model_states.pt... 0: [2022-11-28 18:43:24,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_03-model_00-model_states.pt. 0: [2022-11-28 18:43:24,690] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_04-model_00-model_states.pt... 0: [2022-11-28 18:43:24,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_04-model_00-model_states.pt. 0: [2022-11-28 18:43:24,714] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_05-model_00-model_states.pt... 0: [2022-11-28 18:43:24,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_05-model_00-model_states.pt. 0: [2022-11-28 18:43:24,737] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_06-model_00-model_states.pt... 0: [2022-11-28 18:43:24,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_06-model_00-model_states.pt. 0: [2022-11-28 18:43:24,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_07-model_00-model_states.pt... 
0: [2022-11-28 18:43:24,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_07-model_00-model_states.pt. 0: [2022-11-28 18:43:24,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_08-model_00-model_states.pt... 0: [2022-11-28 18:43:24,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_08-model_00-model_states.pt. 0: [2022-11-28 18:43:24,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_09-model_00-model_states.pt... 0: [2022-11-28 18:43:24,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_09-model_00-model_states.pt. 0: [2022-11-28 18:43:24,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_10-model_00-model_states.pt... 0: [2022-11-28 18:43:24,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_10-model_00-model_states.pt. 0: [2022-11-28 18:43:24,856] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_11-model_00-model_states.pt... 0: [2022-11-28 18:43:24,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_11-model_00-model_states.pt. 0: [2022-11-28 18:43:24,879] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_12-model_00-model_states.pt... 0: [2022-11-28 18:43:24,905] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_12-model_00-model_states.pt. 0: [2022-11-28 18:43:24,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_13-model_00-model_states.pt... 
0: [2022-11-28 18:43:24,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_13-model_00-model_states.pt. 0: [2022-11-28 18:43:24,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_14-model_00-model_states.pt... 0: [2022-11-28 18:43:24,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_14-model_00-model_states.pt. 0: [2022-11-28 18:43:24,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_15-model_00-model_states.pt... 0: [2022-11-28 18:43:24,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_15-model_00-model_states.pt. 0: [2022-11-28 18:43:24,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_16-model_00-model_states.pt... 0: [2022-11-28 18:43:25,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_16-model_00-model_states.pt. 0: [2022-11-28 18:43:25,000] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_17-model_00-model_states.pt... 0: [2022-11-28 18:43:25,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_17-model_00-model_states.pt. 0: [2022-11-28 18:43:25,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_18-model_00-model_states.pt... 0: [2022-11-28 18:43:25,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_18-model_00-model_states.pt. 0: [2022-11-28 18:43:25,049] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_19-model_00-model_states.pt... 
0: [2022-11-28 18:43:25,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_19-model_00-model_states.pt. 0: [2022-11-28 18:43:25,073] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_20-model_00-model_states.pt... 0: [2022-11-28 18:43:25,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_20-model_00-model_states.pt. 0: [2022-11-28 18:43:25,096] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/layer_22-model_00-model_states.pt... 0: [2022-11-28 18:43:25,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/layer_22-model_00-model_states.pt. 0: [2022-11-28 18:43:25,100] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step48000/mp_rank_00_model_states.pt 0: [2022-11-28 18:43:25,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/mp_rank_00_model_states.pt... 0: [2022-11-28 18:43:25,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/mp_rank_00_model_states.pt. 0: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
0: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
5: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 
2: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
6: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:43:25,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step48000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:43:25,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:43:25,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 18:43:25,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2022-11-28 18:43:25,171] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:43:25,171] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 18:43:25,171] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2022-11-28 18:43:25,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:43:25,172] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 18:43:25,172] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2022-11-28 18:43:25,172] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:43:25,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 18:43:25,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2022-11-28 18:43:25,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:43:25,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 18:43:25,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 5: [2022-11-28 18:43:25,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:43:25,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 18:43:25,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 5: [2022-11-28 18:43:25,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:43:25,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:43:25,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 7: [2022-11-28 18:43:25,176] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 5: [2022-11-28 18:43:25,176] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2022-11-28 18:43:25,176] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2022-11-28 18:43:25,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:43:25,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 18:43:25,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 0: [2022-11-28 18:43:25,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:43:25,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 18:43:25,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 0: [2022-11-28 18:43:25,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 18:43:25,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 18:43:25,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 0: [2022-11-28 18:43:25,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:43:25,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 18:43:25,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2022-11-28 18:43:25,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:43:25,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:43:25,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:43:25,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 18:43:25,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 18:43:25,178] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 
0: [2022-11-28 18:43:25,178] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 18:43:25,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2022-11-28 18:43:25,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2022-11-28 18:43:25,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:43:25,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 18:43:25,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2022-11-28 18:43:25,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:43:25,180] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 18:43:25,180] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2022-11-28 18:43:25,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:43:25,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 18:43:25,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 
6: [2022-11-28 18:43:25,181] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:43:25,181] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 18:43:25,181] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2022-11-28 18:43:25,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:43:25,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 18:43:25,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2022-11-28 18:43:25,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:43:25,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 18:43:25,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2022-11-28 18:43:25,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:43:25,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 18:43:25,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 
2: [2022-11-28 18:43:25,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:43:25,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 18:43:25,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2022-11-28 18:43:25,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:43:25,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 18:43:25,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2022-11-28 18:43:25,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:43:25,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 18:43:25,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2022-11-28 18:43:25,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:43:25,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 18:43:25,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 
4: [2022-11-28 18:43:25,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:43:25,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 18:43:25,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2022-11-28 18:43:25,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:43:25,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 18:43:25,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 4: [2022-11-28 18:43:25,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:43:25,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 18:43:25,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 5: [2022-11-28 18:43:25,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:43:25,176] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 18:43:25,176] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 
5: [2022-11-28 18:43:25,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:43:25,177] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 18:43:25,177] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:43:25,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2022-11-28 18:43:25,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 18:43:25,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 18:43:25,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 18:43:25,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 18:43:25,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 18:43:25,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 18:43:25,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 7: [2022-11-28 18:43:25,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 1: [2022-11-28 18:43:25,189] [INFO] 
[torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 1: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2022-11-28 18:43:25,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:43:25,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 18:43:25,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 7: [2022-11-28 18:43:25,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:43:25,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:43:25,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 2: [2022-11-28 18:43:25,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2022-11-28 18:43:25,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2022-11-28 18:43:25,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2022-11-28 18:43:25,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:43:25,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 18:43:25,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 2: [2022-11-28 18:43:25,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:43:25,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 18:43:25,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 5: [2022-11-28 18:43:25,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:43:25,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-28 18:43:25,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 18:43:25,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 18:43:25,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 5: [2022-11-28 18:43:25,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 5: [2022-11-28 18:43:25,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:43:25,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 18:43:25,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 0: [2022-11-28 18:43:25,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:43:25,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:43:25,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 18:43:25,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 18:43:25,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 18:43:25,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:43:25,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 0: [2022-11-28 18:43:25,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 0: [2022-11-28 18:43:25,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 18:43:25,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2022-11-28 18:43:25,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:43:25,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 18:43:25,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2022-11-28 18:43:25,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 18:43:25,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 18:43:25,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2022-11-28 18:43:25,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:43:25,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:43:25,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 18:43:25,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 18:43:25,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 6: [2022-11-28 18:43:25,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 0: [2022-11-28 18:43:25,229] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 18:43:25,229] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2022-11-28 18:43:25,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 18:43:25,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 18:43:25,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 18:43:25,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 18:43:25,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 18:43:25,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 18:43:25,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 18:43:25,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step48000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now! 
3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now!
3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now!
3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now!
3: [2022-11-28 18:43:25,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step48000 is ready now!
0: successfully saved checkpoint at iteration 48000 to checkpoints_221m
7: time (ms) | save-checkpoint: 720.89
7: iteration 48010/ 115203 | consumed samples: 12290560 | consumed tokens: 25171066880 | elapsed time per iteration (s): 0.52 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.306033E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 493.083 | TFLOPs: 25.87 |
7: iteration 48020/ 115203 | consumed samples: 12293120 | consumed tokens: 25176309760 | elapsed time per iteration (s): 0.43 | learning rate: 1.349E-04 | global batch size: 256 | lm loss: 2.354610E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.298 | TFLOPs: 31.02 |
7: iteration 48030/ 115203 | consumed samples: 12295680 | consumed tokens: 25181552640 | elapsed time per iteration (s): 0.43 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 2.308397E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.456 | TFLOPs: 31.14 |
7: iteration 48040/ 115203 | consumed samples: 12298240 | consumed tokens: 25186795520 | elapsed time per iteration (s): 0.43 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 2.311879E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.662 | TFLOPs: 31.52 |
7: iteration 48050/ 115203 | consumed samples: 12300800 | consumed tokens: 25192038400 | elapsed time per iteration (s): 0.44 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 2.294094E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.975 | TFLOPs: 30.69 |
7: iteration 48060/ 115203 | consumed samples: 12303360 | consumed tokens: 25197281280 | elapsed time per iteration (s): 0.43 | learning rate: 1.348E-04 | global batch size: 256 | lm loss: 2.336531E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.141 | TFLOPs: 31.38 |
7: iteration 48070/ 115203 | consumed samples: 12305920 | consumed tokens: 25202524160 | elapsed time per iteration (s): 0.45 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 2.289821E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.473 | TFLOPs: 29.88 |
7: iteration 48080/ 115203 | consumed samples: 12308480 | consumed tokens: 25207767040 | elapsed time per iteration (s): 0.44 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 2.322791E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.778 | TFLOPs: 30.52 |
7: iteration 48090/ 115203 | consumed samples: 12311040 | consumed tokens: 25213009920 | elapsed time per iteration (s): 0.44 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 2.294297E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.139 | TFLOPs: 30.44 |
7: iteration 48100/ 115203 | consumed samples: 12313600 | consumed tokens: 25218252800 | elapsed time per iteration (s): 0.43 | learning rate: 1.347E-04 | global batch size: 256 | lm loss: 2.298528E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.595 | TFLOPs: 31.04 |
7: iteration 48110/ 115203 | consumed samples: 12316160 | consumed tokens: 25223495680 | elapsed time per iteration (s): 0.43 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 2.297011E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.411 | TFLOPs: 31.35 |
7: iteration 48120/ 115203 | consumed samples: 12318720 | consumed tokens: 25228738560 | elapsed time per iteration (s): 0.43 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 2.282899E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.301 | TFLOPs: 31.08 |
7: iteration 48130/ 115203 | consumed samples: 12321280 | consumed tokens: 25233981440 | elapsed time per iteration (s): 0.43 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 2.298983E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.370 | TFLOPs: 30.92 |
7: iteration 48140/ 115203 | consumed samples: 12323840 | consumed tokens: 25239224320 | elapsed time per iteration (s): 0.44 | learning rate: 1.346E-04 | global batch size: 256 | lm loss: 2.309837E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.317 | TFLOPs: 30.71 |
7: iteration 48150/ 115203 | consumed samples: 12326400 | consumed tokens: 25244467200 | elapsed time per iteration (s): 0.44 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 2.298488E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.059 | TFLOPs: 30.64 |
7: iteration 48160/ 115203 | consumed samples: 12328960 | consumed tokens: 25249710080 | elapsed time per iteration (s): 0.44 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 2.311055E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.363 | TFLOPs: 30.45 |
7: iteration 48170/ 115203 | consumed samples: 12331520 | consumed tokens: 25254952960 | elapsed time per iteration (s): 0.44 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 2.338104E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.607 | TFLOPs: 30.83 |
7: iteration 48180/ 115203 | consumed samples: 12334080 | consumed tokens: 25260195840 | elapsed time per iteration (s): 0.44 | learning rate: 1.345E-04 | global batch size: 256 | lm loss: 2.285484E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.074 | TFLOPs: 30.38 |
7: iteration 48190/ 115203 | consumed samples: 12336640 | consumed tokens: 25265438720 | elapsed time per iteration (s): 0.44 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 2.313661E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.706 | TFLOPs: 30.68 |
7: iteration 48200/ 115203 | consumed samples: 12339200 | consumed tokens: 25270681600 | elapsed time per iteration (s): 0.43 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 2.289632E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.819 | TFLOPs: 31.21 |
7: iteration 48210/ 115203 | consumed samples: 12341760 | consumed tokens: 25275924480 | elapsed time per iteration (s): 0.44 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 2.292523E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.772 | TFLOPs: 30.31 |
7: iteration 48220/ 115203 
| consumed samples: 12344320 | consumed tokens: 25281167360 | elapsed time per iteration (s): 0.44 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 2.302205E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.581 | TFLOPs: 30.62 | 7: iteration 48230/ 115203 | consumed samples: 12346880 | consumed tokens: 25286410240 | elapsed time per iteration (s): 0.43 | learning rate: 1.344E-04 | global batch size: 256 | lm loss: 2.288584E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.680 | TFLOPs: 31.15 | 7: iteration 48240/ 115203 | consumed samples: 12349440 | consumed tokens: 25291653120 | elapsed time per iteration (s): 0.43 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.291136E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.255 | TFLOPs: 31.02 | 7: iteration 48250/ 115203 | consumed samples: 12352000 | consumed tokens: 25296896000 | elapsed time per iteration (s): 0.43 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.317618E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.258 | TFLOPs: 31.34 | 7: iteration 48260/ 115203 | consumed samples: 12354560 | consumed tokens: 25302138880 | elapsed time per iteration (s): 0.46 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.253422E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 552.785 | TFLOPs: 29.00 | 7: iteration 48270/ 115203 | consumed samples: 12357120 | consumed tokens: 25307381760 | elapsed time per iteration (s): 0.43 | learning rate: 1.343E-04 | global batch size: 256 | lm loss: 2.331198E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 593.016 | TFLOPs: 31.11 | 7: iteration 48280/ 115203 | consumed samples: 12359680 | consumed tokens: 25312624640 | elapsed time per iteration (s): 0.45 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 2.319169E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.439 | TFLOPs: 30.09 | 7: iteration 48290/ 115203 | consumed samples: 12362240 | consumed tokens: 25317867520 | elapsed time per iteration (s): 0.44 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 2.317801E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.494 | TFLOPs: 30.77 | 7: iteration 48300/ 115203 | consumed samples: 12364800 | consumed tokens: 25323110400 | elapsed time per iteration (s): 0.43 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 2.319867E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.929 | TFLOPs: 31.42 | 7: iteration 48310/ 115203 | consumed samples: 12367360 | consumed tokens: 25328353280 | elapsed time per iteration (s): 0.43 | learning rate: 1.342E-04 | global batch size: 256 | lm loss: 2.339824E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.695 | TFLOPs: 31.26 | 7: iteration 48320/ 115203 | consumed samples: 12369920 | consumed tokens: 25333596160 | elapsed time per iteration (s): 0.43 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 2.301506E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.873 | TFLOPs: 31.00 | 7: iteration 48330/ 115203 | consumed samples: 12372480 | consumed tokens: 25338839040 | elapsed time per iteration (s): 0.43 | learning rate: 1.341E-04 | global batch size: 
256 | lm loss: 2.296263E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.648 | TFLOPs: 31.31 | 7: iteration 48340/ 115203 | consumed samples: 12375040 | consumed tokens: 25344081920 | elapsed time per iteration (s): 0.44 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 2.291565E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.869 | TFLOPs: 30.74 | 7: iteration 48350/ 115203 | consumed samples: 12377600 | consumed tokens: 25349324800 | elapsed time per iteration (s): 0.42 | learning rate: 1.341E-04 | global batch size: 256 | lm loss: 2.294462E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.298 | TFLOPs: 31.65 | 7: iteration 48360/ 115203 | consumed samples: 12380160 | consumed tokens: 25354567680 | elapsed time per iteration (s): 0.43 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.306203E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.799 | TFLOPs: 31.10 | 7: iteration 48370/ 115203 | consumed samples: 12382720 | consumed tokens: 25359810560 | elapsed time per iteration (s): 0.43 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.326185E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.596 | TFLOPs: 30.94 | 7: iteration 48380/ 115203 | consumed samples: 12385280 | consumed tokens: 25365053440 | elapsed time per iteration (s): 0.43 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.308120E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.773 | TFLOPs: 31.21 | 7: iteration 48390/ 115203 | consumed samples: 12387840 | consumed 
tokens: 25370296320 | elapsed time per iteration (s): 0.43 | learning rate: 1.340E-04 | global batch size: 256 | lm loss: 2.288488E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.796 | TFLOPs: 31.05 | 7: iteration 48400/ 115203 | consumed samples: 12390400 | consumed tokens: 25375539200 | elapsed time per iteration (s): 0.43 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.304250E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.567 | TFLOPs: 31.25 | 7: iteration 48410/ 115203 | consumed samples: 12392960 | consumed tokens: 25380782080 | elapsed time per iteration (s): 0.43 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.304408E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.548 | TFLOPs: 31.51 | 7: iteration 48420/ 115203 | consumed samples: 12395520 | consumed tokens: 25386024960 | elapsed time per iteration (s): 0.43 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.297301E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.462 | TFLOPs: 31.40 | 7: iteration 48430/ 115203 | consumed samples: 12398080 | consumed tokens: 25391267840 | elapsed time per iteration (s): 0.44 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.326490E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.552 | TFLOPs: 30.83 | 7: iteration 48440/ 115203 | consumed samples: 12400640 | consumed tokens: 25396510720 | elapsed time per iteration (s): 0.44 | learning rate: 1.339E-04 | global batch size: 256 | lm loss: 2.292171E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 584.516 | TFLOPs: 30.67 | 7: iteration 48450/ 115203 | consumed samples: 12403200 | consumed tokens: 25401753600 | elapsed time per iteration (s): 0.43 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 2.279217E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.818 | TFLOPs: 30.89 | 7: iteration 48460/ 115203 | consumed samples: 12405760 | consumed tokens: 25406996480 | elapsed time per iteration (s): 0.43 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 2.328664E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.214 | TFLOPs: 31.18 | 7: iteration 48470/ 115203 | consumed samples: 12408320 | consumed tokens: 25412239360 | elapsed time per iteration (s): 0.43 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 2.289071E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.017 | TFLOPs: 31.32 | 7: iteration 48480/ 115203 | consumed samples: 12410880 | consumed tokens: 25417482240 | elapsed time per iteration (s): 0.44 | learning rate: 1.338E-04 | global batch size: 256 | lm loss: 2.289444E+00 | grad norm: 0.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.460 | TFLOPs: 30.77 | 7: iteration 48490/ 115203 | consumed samples: 12413440 | consumed tokens: 25422725120 | elapsed time per iteration (s): 0.44 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 2.334425E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.117 | TFLOPs: 30.75 | 7: iteration 48500/ 115203 | consumed samples: 12416000 | consumed tokens: 25427968000 | elapsed time per iteration (s): 0.43 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 2.309853E+00 | grad norm: 
0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.939 | TFLOPs: 31.06 | 7: iteration 48510/ 115203 | consumed samples: 12418560 | consumed tokens: 25433210880 | elapsed time per iteration (s): 0.45 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 2.319485E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.511 | TFLOPs: 29.99 | 7: iteration 48520/ 115203 | consumed samples: 12421120 | consumed tokens: 25438453760 | elapsed time per iteration (s): 0.44 | learning rate: 1.337E-04 | global batch size: 256 | lm loss: 2.290752E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.997 | TFLOPs: 30.85 | 7: iteration 48530/ 115203 | consumed samples: 12423680 | consumed tokens: 25443696640 | elapsed time per iteration (s): 0.44 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.305842E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.147 | TFLOPs: 30.60 | 7: iteration 48540/ 115203 | consumed samples: 12426240 | consumed tokens: 25448939520 | elapsed time per iteration (s): 0.43 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.297296E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.204 | TFLOPs: 31.49 | 7: iteration 48550/ 115203 | consumed samples: 12428800 | consumed tokens: 25454182400 | elapsed time per iteration (s): 0.43 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.325318E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.822 | TFLOPs: 31.05 | 7: iteration 48560/ 115203 | consumed samples: 12431360 | consumed tokens: 25459425280 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.336E-04 | global batch size: 256 | lm loss: 2.311432E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.380 | TFLOPs: 31.34 | 7: iteration 48570/ 115203 | consumed samples: 12433920 | consumed tokens: 25464668160 | elapsed time per iteration (s): 0.44 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 2.319592E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.322 | TFLOPs: 30.76 | 7: iteration 48580/ 115203 | consumed samples: 12436480 | consumed tokens: 25469911040 | elapsed time per iteration (s): 0.43 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 2.285772E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.949 | TFLOPs: 31.01 | 7: iteration 48590/ 115203 | consumed samples: 12439040 | consumed tokens: 25475153920 | elapsed time per iteration (s): 0.43 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 2.322040E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.462 | TFLOPs: 31.24 | 7: iteration 48600/ 115203 | consumed samples: 12441600 | consumed tokens: 25480396800 | elapsed time per iteration (s): 0.43 | learning rate: 1.335E-04 | global batch size: 256 | lm loss: 2.278585E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.309 | TFLOPs: 31.13 | 7: iteration 48610/ 115203 | consumed samples: 12444160 | consumed tokens: 25485639680 | elapsed time per iteration (s): 0.43 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.294514E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.198 | TFLOPs: 31.33 | 7: 
iteration 48620/ 115203 | consumed samples: 12446720 | consumed tokens: 25490882560 | elapsed time per iteration (s): 0.43 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.327315E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.399 | TFLOPs: 31.29 | 7: iteration 48630/ 115203 | consumed samples: 12449280 | consumed tokens: 25496125440 | elapsed time per iteration (s): 0.43 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.316278E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.180 | TFLOPs: 31.28 | 7: iteration 48640/ 115203 | consumed samples: 12451840 | consumed tokens: 25501368320 | elapsed time per iteration (s): 0.44 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.302502E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.132 | TFLOPs: 30.28 | 7: iteration 48650/ 115203 | consumed samples: 12454400 | consumed tokens: 25506611200 | elapsed time per iteration (s): 0.43 | learning rate: 1.334E-04 | global batch size: 256 | lm loss: 2.300063E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.353 | TFLOPs: 31.24 | 7: iteration 48660/ 115203 | consumed samples: 12456960 | consumed tokens: 25511854080 | elapsed time per iteration (s): 0.43 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.337360E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.001 | TFLOPs: 31.43 | 7: iteration 48670/ 115203 | consumed samples: 12459520 | consumed tokens: 25517096960 | elapsed time per iteration (s): 0.43 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.314483E+00 | grad norm: 0.212 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.516 | TFLOPs: 31.30 | 7: iteration 48680/ 115203 | consumed samples: 12462080 | consumed tokens: 25522339840 | elapsed time per iteration (s): 0.44 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.290909E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.745 | TFLOPs: 30.52 | 7: iteration 48690/ 115203 | consumed samples: 12464640 | consumed tokens: 25527582720 | elapsed time per iteration (s): 0.43 | learning rate: 1.333E-04 | global batch size: 256 | lm loss: 2.298730E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.246 | TFLOPs: 31.39 | 7: iteration 48700/ 115203 | consumed samples: 12467200 | consumed tokens: 25532825600 | elapsed time per iteration (s): 0.43 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 2.299887E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.122 | TFLOPs: 31.23 | 7: iteration 48710/ 115203 | consumed samples: 12469760 | consumed tokens: 25538068480 | elapsed time per iteration (s): 0.43 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 2.297838E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.800 | TFLOPs: 31.16 | 7: iteration 48720/ 115203 | consumed samples: 12472320 | consumed tokens: 25543311360 | elapsed time per iteration (s): 0.43 | learning rate: 1.332E-04 | global batch size: 256 | lm loss: 2.333290E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.088 | TFLOPs: 31.28 | 7: iteration 48730/ 115203 | consumed samples: 12474880 | consumed tokens: 25548554240 | elapsed time per iteration (s): 0.44 | learning rate: 
1.332E-04 | global batch size: 256 | lm loss: 2.323646E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.456 | TFLOPs: 30.25 | 7: iteration 48740/ 115203 | consumed samples: 12477440 | consumed tokens: 25553797120 | elapsed time per iteration (s): 0.43 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 2.267956E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.363 | TFLOPs: 31.24 | 7: iteration 48750/ 115203 | consumed samples: 12480000 | consumed tokens: 25559040000 | elapsed time per iteration (s): 0.43 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 2.304500E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.026 | TFLOPs: 30.96 | 7: iteration 48760/ 115203 | consumed samples: 12482560 | consumed tokens: 25564282880 | elapsed time per iteration (s): 0.43 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 2.300270E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.424 | TFLOPs: 31.40 | 7: iteration 48770/ 115203 | consumed samples: 12485120 | consumed tokens: 25569525760 | elapsed time per iteration (s): 0.43 | learning rate: 1.331E-04 | global batch size: 256 | lm loss: 2.296025E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.167 | TFLOPs: 31.59 | 7: iteration 48780/ 115203 | consumed samples: 12487680 | consumed tokens: 25574768640 | elapsed time per iteration (s): 0.43 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 2.318825E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.814 | TFLOPs: 31.05 | 7: iteration 48790/ 115203 | consumed 
samples: 12490240 | consumed tokens: 25580011520 | elapsed time per iteration (s): 0.43 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 2.304093E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.998 | TFLOPs: 31.53 | 7: iteration 48800/ 115203 | consumed samples: 12492800 | consumed tokens: 25585254400 | elapsed time per iteration (s): 0.43 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 2.320439E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.082 | TFLOPs: 31.38 | 7: iteration 48810/ 115203 | consumed samples: 12495360 | consumed tokens: 25590497280 | elapsed time per iteration (s): 0.42 | learning rate: 1.330E-04 | global batch size: 256 | lm loss: 2.308180E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.099 | TFLOPs: 31.75 | 7: iteration 48820/ 115203 | consumed samples: 12497920 | consumed tokens: 25595740160 | elapsed time per iteration (s): 0.43 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 2.313808E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.971 | TFLOPs: 31.01 | 7: iteration 48830/ 115203 | consumed samples: 12500480 | consumed tokens: 25600983040 | elapsed time per iteration (s): 0.42 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 2.333744E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.075 | TFLOPs: 31.75 | 7: iteration 48840/ 115203 | consumed samples: 12503040 | consumed tokens: 25606225920 | elapsed time per iteration (s): 0.43 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 2.287535E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 595.615 | TFLOPs: 31.25 | 7: iteration 48850/ 115203 | consumed samples: 12505600 | consumed tokens: 25611468800 | elapsed time per iteration (s): 0.43 | learning rate: 1.329E-04 | global batch size: 256 | lm loss: 2.304839E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.617 | TFLOPs: 31.41 | 7: iteration 48860/ 115203 | consumed samples: 12508160 | consumed tokens: 25616711680 | elapsed time per iteration (s): 0.44 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 2.297129E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.754 | TFLOPs: 30.79 | 7: iteration 48870/ 115203 | consumed samples: 12510720 | consumed tokens: 25621954560 | elapsed time per iteration (s): 0.42 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 2.283560E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.380 | TFLOPs: 31.71 | 7: iteration 48880/ 115203 | consumed samples: 12513280 | consumed tokens: 25627197440 | elapsed time per iteration (s): 0.43 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 2.301481E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.335 | TFLOPs: 31.34 | 7: iteration 48890/ 115203 | consumed samples: 12515840 | consumed tokens: 25632440320 | elapsed time per iteration (s): 0.44 | learning rate: 1.328E-04 | global batch size: 256 | lm loss: 2.306951E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.548 | TFLOPs: 30.78 | 7: iteration 48900/ 115203 | consumed samples: 12518400 | consumed tokens: 25637683200 | elapsed time per iteration (s): 0.44 | learning rate: 1.328E-04 | global batch size: 256 | lm 
loss: 2.294220E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.778 | TFLOPs: 30.63 | 7: iteration 48910/ 115203 | consumed samples: 12520960 | consumed tokens: 25642926080 | elapsed time per iteration (s): 0.45 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 2.321888E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.354 | TFLOPs: 29.82 | 7: iteration 48920/ 115203 | consumed samples: 12523520 | consumed tokens: 25648168960 | elapsed time per iteration (s): 0.43 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 2.277678E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.850 | TFLOPs: 31.47 | 7: iteration 48930/ 115203 | consumed samples: 12526080 | consumed tokens: 25653411840 | elapsed time per iteration (s): 0.43 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 2.318273E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.959 | TFLOPs: 31.22 | 7: iteration 48940/ 115203 | consumed samples: 12528640 | consumed tokens: 25658654720 | elapsed time per iteration (s): 0.43 | learning rate: 1.327E-04 | global batch size: 256 | lm loss: 2.281838E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.520 | TFLOPs: 31.40 | 7: iteration 48950/ 115203 | consumed samples: 12531200 | consumed tokens: 25663897600 | elapsed time per iteration (s): 0.43 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.298709E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.933 | TFLOPs: 31.48 | 7: iteration 48960/ 115203 | consumed samples: 12533760 | consumed tokens: 
25669140480 | elapsed time per iteration (s): 0.43 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.320016E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.065 | TFLOPs: 31.17 | 7: iteration 48970/ 115203 | consumed samples: 12536320 | consumed tokens: 25674383360 | elapsed time per iteration (s): 0.43 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.287361E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.252 | TFLOPs: 31.60 | 7: iteration 48980/ 115203 | consumed samples: 12538880 | consumed tokens: 25679626240 | elapsed time per iteration (s): 0.44 | learning rate: 1.326E-04 | global batch size: 256 | lm loss: 2.343910E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.004 | TFLOPs: 30.27 | 7: iteration 48990/ 115203 | consumed samples: 12541440 | consumed tokens: 25684869120 | elapsed time per iteration (s): 0.44 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 2.318246E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.133 | TFLOPs: 30.81 | 7: iteration 49000/ 115203 | consumed samples: 12544000 | consumed tokens: 25690112000 | elapsed time per iteration (s): 0.43 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 2.290298E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.189 | TFLOPs: 31.49 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 49000 | lm loss value: 2.265970E+00 | lm loss PPL: 9.640471E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 49000 
to checkpoints_221m 0: [2022-11-28 18:50:38,610] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step49000 is begin to save! 0: [2022-11-28 18:50:38,615] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_01-model_00-model_states.pt... 0: [2022-11-28 18:50:38,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_01-model_00-model_states.pt. 0: [2022-11-28 18:50:38,720] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_03-model_00-model_states.pt... 0: [2022-11-28 18:50:38,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_03-model_00-model_states.pt. 0: [2022-11-28 18:50:38,741] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_04-model_00-model_states.pt... 0: [2022-11-28 18:50:38,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_04-model_00-model_states.pt. 0: [2022-11-28 18:50:38,764] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_05-model_00-model_states.pt... 0: [2022-11-28 18:50:38,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_05-model_00-model_states.pt. 0: [2022-11-28 18:50:38,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_06-model_00-model_states.pt... 0: [2022-11-28 18:50:38,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_06-model_00-model_states.pt. 0: [2022-11-28 18:50:38,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_07-model_00-model_states.pt... 
0: [2022-11-28 18:50:38,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_07-model_00-model_states.pt. 0: [2022-11-28 18:50:38,832] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_08-model_00-model_states.pt... 0: [2022-11-28 18:50:38,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_08-model_00-model_states.pt. 0: [2022-11-28 18:50:38,855] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_09-model_00-model_states.pt... 0: [2022-11-28 18:50:38,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_09-model_00-model_states.pt. 0: [2022-11-28 18:50:38,879] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_10-model_00-model_states.pt... 0: [2022-11-28 18:50:38,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_10-model_00-model_states.pt. 0: [2022-11-28 18:50:38,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_11-model_00-model_states.pt... 0: [2022-11-28 18:50:38,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_11-model_00-model_states.pt. 0: [2022-11-28 18:50:38,925] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_12-model_00-model_states.pt... 0: [2022-11-28 18:50:38,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_12-model_00-model_states.pt. 0: [2022-11-28 18:50:38,948] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_13-model_00-model_states.pt... 
0: [2022-11-28 18:50:38,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_13-model_00-model_states.pt. 0: [2022-11-28 18:50:38,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_14-model_00-model_states.pt... 0: [2022-11-28 18:50:38,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_14-model_00-model_states.pt. 0: [2022-11-28 18:50:38,992] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_15-model_00-model_states.pt... 0: [2022-11-28 18:50:39,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_15-model_00-model_states.pt. 0: [2022-11-28 18:50:39,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_16-model_00-model_states.pt... 0: [2022-11-28 18:50:39,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_16-model_00-model_states.pt. 0: [2022-11-28 18:50:39,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_17-model_00-model_states.pt... 0: [2022-11-28 18:50:39,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_17-model_00-model_states.pt. 0: [2022-11-28 18:50:39,063] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_18-model_00-model_states.pt... 0: [2022-11-28 18:50:39,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_18-model_00-model_states.pt. 0: [2022-11-28 18:50:39,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_19-model_00-model_states.pt... 
0: [2022-11-28 18:50:39,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_19-model_00-model_states.pt. 0: [2022-11-28 18:50:39,111] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_20-model_00-model_states.pt... 0: [2022-11-28 18:50:39,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_20-model_00-model_states.pt. 0: [2022-11-28 18:50:39,133] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/layer_22-model_00-model_states.pt... 0: [2022-11-28 18:50:39,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/layer_22-model_00-model_states.pt. 0: [2022-11-28 18:50:39,138] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step49000/mp_rank_00_model_states.pt 0: [2022-11-28 18:50:39,138] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/mp_rank_00_model_states.pt... 0: [2022-11-28 18:50:39,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/mp_rank_00_model_states.pt. 0: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
0: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
4: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
1: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:50:39,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step49000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:50:39,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:50:39,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 18:50:39,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2022-11-28 18:50:39,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:50:39,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 18:50:39,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2022-11-28 18:50:39,209] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-28 18:50:39,209] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 18:50:39,209] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2022-11-28 18:50:39,209] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:50:39,210] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 18:50:39,210] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2022-11-28 18:50:39,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:50:39,215] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 18:50:39,215] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2022-11-28 18:50:39,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:50:39,215] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 18:50:39,215] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2022-11-28 18:50:39,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:50:39,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 18:50:39,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:50:39,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2022-11-28 18:50:39,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 18:50:39,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2022-11-28 18:50:39,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:50:39,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 18:50:39,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2022-11-28 18:50:39,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:50:39,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:50:39,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:50:39,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2022-11-28 18:50:39,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 18:50:39,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 1: [2022-11-28 18:50:39,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2022-11-28 18:50:39,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2022-11-28 18:50:39,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2022-11-28 18:50:39,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:50:39,218] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 18:50:39,218] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:50:39,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
2: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2022-11-28 18:50:39,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 2: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:50:39,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 18:50:39,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 1: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2022-11-28 18:50:39,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 
0: [2022-11-28 18:50:39,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:50:39,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:50:39,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 18:50:39,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2022-11-28 18:50:39,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:50:39,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 18:50:39,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 7: [2022-11-28 18:50:39,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:50:39,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 18:50:39,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2022-11-28 18:50:39,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-28 18:50:39,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 18:50:39,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 1: [2022-11-28 18:50:39,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:50:39,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 18:50:39,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2022-11-28 18:50:39,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:50:39,224] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 18:50:39,224] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2022-11-28 18:50:39,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:50:39,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 18:50:39,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2022-11-28 18:50:39,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 18:50:39,225] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 18:50:39,225] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 2: [2022-11-28 18:50:39,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:50:39,226] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 18:50:39,226] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2022-11-28 18:50:39,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:50:39,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 18:50:39,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2022-11-28 18:50:39,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:50:39,236] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 18:50:39,236] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
3: [2022-11-28 18:50:39,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:50:39,215] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:50:39,219] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 18:50:39,215] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 18:50:39,215] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 3: [2022-11-28 18:50:39,215] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2022-11-28 18:50:39,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:50:39,215] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2022-11-28 18:50:39,220] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 18:50:39,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:50:39,220] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 
3: [2022-11-28 18:50:39,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 18:50:39,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:50:39,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:50:39,221] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 18:50:39,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 3: [2022-11-28 18:50:39,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:50:39,221] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 3: [2022-11-28 18:50:39,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 18:50:39,232] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:50:39,223] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 18:50:39,232] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 18:50:39,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 
3: [2022-11-28 18:50:39,223] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2022-11-28 18:50:39,232] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2022-11-28 18:50:39,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:50:39,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 18:50:39,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2022-11-28 18:50:39,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:50:39,241] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 18:50:39,241] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2022-11-28 18:50:39,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:50:39,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 18:50:39,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 6: [2022-11-28 18:50:39,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-28 18:50:39,242] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 18:50:39,242] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2022-11-28 18:50:39,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:50:39,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:50:39,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:50:39,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:50:39,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:50:39,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 18:50:39,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 18:50:39,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2022-11-28 18:50:39,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 18:50:39,249] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 
0: [2022-11-28 18:50:39,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:50:39,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 18:50:39,249] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2022-11-28 18:50:39,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2022-11-28 18:50:39,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:50:39,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:50:39,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:50:39,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:50:39,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
5: [2022-11-28 18:50:39,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 18:50:39,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 18:50:39,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:50:39,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 18:50:39,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2022-11-28 18:50:39,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 
4: [2022-11-28 18:50:39,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 18:50:39,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 18:50:39,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 18:50:39,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 18:50:39,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 5: [2022-11-28 18:50:39,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2022-11-28 18:50:39,250] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 5: [2022-11-28 18:50:39,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:50:39,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 
5: [2022-11-28 18:50:39,247] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 5: [2022-11-28 18:50:39,247] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 4: [2022-11-28 18:50:39,250] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 3: [2022-11-28 18:50:39,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:50:39,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 18:50:39,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 18:50:39,248] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 18:50:39,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 
3: [2022-11-28 18:50:39,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 3: [2022-11-28 18:50:39,248] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: [2022-11-28 18:50:39,275] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step49000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 18:50:39,275] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step49000 is ready now! 0: successfully saved checkpoint at iteration 49000 to checkpoints_221m 7: time (ms) | save-checkpoint: 707.08 7: iteration 49010/ 115203 | consumed samples: 12546560 | consumed tokens: 25695354880 | elapsed time per iteration (s): 0.52 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 2.298933E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 489.258 | TFLOPs: 25.67 | 7: iteration 49020/ 115203 | consumed samples: 12549120 | consumed tokens: 25700597760 | elapsed time per iteration (s): 0.43 | learning rate: 1.325E-04 | global batch size: 256 | lm loss: 2.307461E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.453 | TFLOPs: 31.29 | 7: iteration 49030/ 115203 | consumed samples: 12551680 | consumed tokens: 25705840640 | elapsed time per iteration (s): 0.43 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 2.321621E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.790 | TFLOPs: 31.21 | 7: iteration 49040/ 115203 | consumed samples: 12554240 | consumed tokens: 25711083520 | elapsed time per iteration (s): 0.43 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 2.278279E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 592.746 | TFLOPs: 31.10 | 7: iteration 49050/ 115203 | consumed samples: 12556800 | consumed tokens: 25716326400 | elapsed time per iteration (s): 0.43 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 2.317612E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.627 | TFLOPs: 30.99 | 7: iteration 49060/ 115203 | consumed samples: 12559360 | consumed tokens: 25721569280 | elapsed time per iteration (s): 0.43 | learning rate: 1.324E-04 | global batch size: 256 | lm loss: 2.309369E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.687 | TFLOPs: 31.31 | 7: iteration 49070/ 115203 | consumed samples: 12561920 | consumed tokens: 25726812160 | elapsed time per iteration (s): 0.60 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 2.271340E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 429.757 | TFLOPs: 22.55 | 7: iteration 49080/ 115203 | consumed samples: 12564480 | consumed tokens: 25732055040 | elapsed time per iteration (s): 0.43 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 2.317222E+00 | grad norm: 0.189 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.922 | TFLOPs: 31.32 | 7: iteration 49090/ 115203 | consumed samples: 12567040 | consumed tokens: 25737297920 | elapsed time per iteration (s): 0.44 | learning rate: 1.323E-04 | global batch size: 256 | lm loss: 2.340315E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.799 | TFLOPs: 30.79 | 7: iteration 49100/ 115203 | consumed samples: 12569600 | consumed tokens: 25742540800 | elapsed time per iteration (s): 0.43 | learning rate: 1.323E-04 | global batch size: 256 | 
lm loss: 2.311907E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.408 | TFLOPs: 31.45 | 7: iteration 49110/ 115203 | consumed samples: 12572160 | consumed tokens: 25747783680 | elapsed time per iteration (s): 0.43 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.279143E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.273 | TFLOPs: 31.13 | 7: iteration 49120/ 115203 | consumed samples: 12574720 | consumed tokens: 25753026560 | elapsed time per iteration (s): 0.44 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.296032E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.218 | TFLOPs: 30.34 | 7: iteration 49130/ 115203 | consumed samples: 12577280 | consumed tokens: 25758269440 | elapsed time per iteration (s): 0.46 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.322654E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.189 | TFLOPs: 29.50 | 7: iteration 49140/ 115203 | consumed samples: 12579840 | consumed tokens: 25763512320 | elapsed time per iteration (s): 0.43 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.278228E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.465 | TFLOPs: 31.03 | 7: iteration 49150/ 115203 | consumed samples: 12582400 | consumed tokens: 25768755200 | elapsed time per iteration (s): 0.43 | learning rate: 1.322E-04 | global batch size: 256 | lm loss: 2.313228E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.446 | TFLOPs: 31.35 | 7: iteration 49160/ 115203 | consumed samples: 12584960 | consumed tokens: 
25773998080 | elapsed time per iteration (s): 0.43 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 2.299957E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.292 | TFLOPs: 31.18 | 7: iteration 49170/ 115203 | consumed samples: 12587520 | consumed tokens: 25779240960 | elapsed time per iteration (s): 0.43 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 2.293678E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.459 | TFLOPs: 31.35 | 7: iteration 49180/ 115203 | consumed samples: 12590080 | consumed tokens: 25784483840 | elapsed time per iteration (s): 0.44 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 2.322725E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.540 | TFLOPs: 30.83 | 7: iteration 49190/ 115203 | consumed samples: 12592640 | consumed tokens: 25789726720 | elapsed time per iteration (s): 0.43 | learning rate: 1.321E-04 | global batch size: 256 | lm loss: 2.295048E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.818 | TFLOPs: 31.21 | 7: iteration 49200/ 115203 | consumed samples: 12595200 | consumed tokens: 25794969600 | elapsed time per iteration (s): 0.43 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 2.302915E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.114 | TFLOPs: 30.96 | 7: iteration 49210/ 115203 | consumed samples: 12597760 | consumed tokens: 25800212480 | elapsed time per iteration (s): 0.43 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 2.329164E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
589.024 | TFLOPs: 30.91 | 7: iteration 49220/ 115203 | consumed samples: 12600320 | consumed tokens: 25805455360 | elapsed time per iteration (s): 0.43 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 2.272352E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.749 | TFLOPs: 31.26 | 7: iteration 49230/ 115203 | consumed samples: 12602880 | consumed tokens: 25810698240 | elapsed time per iteration (s): 0.43 | learning rate: 1.320E-04 | global batch size: 256 | lm loss: 2.323042E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.542 | TFLOPs: 30.88 | 7: iteration 49240/ 115203 | consumed samples: 12605440 | consumed tokens: 25815941120 | elapsed time per iteration (s): 0.42 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 2.301290E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.702 | TFLOPs: 31.68 | 7: iteration 49250/ 115203 | consumed samples: 12608000 | consumed tokens: 25821184000 | elapsed time per iteration (s): 0.44 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 2.284616E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.400 | TFLOPs: 30.71 | 7: iteration 49260/ 115203 | consumed samples: 12610560 | consumed tokens: 25826426880 | elapsed time per iteration (s): 0.44 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 2.296236E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.882 | TFLOPs: 30.69 | 7: iteration 49270/ 115203 | consumed samples: 12613120 | consumed tokens: 25831669760 | elapsed time per iteration (s): 0.43 | learning rate: 1.319E-04 | global batch size: 256 | lm loss: 2.256665E+00 | grad norm: 0.213 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.076 | TFLOPs: 30.91 | 7: iteration 49280/ 115203 | consumed samples: 12615680 | consumed tokens: 25836912640 | elapsed time per iteration (s): 0.43 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 2.330648E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.012 | TFLOPs: 30.96 | 7: iteration 49290/ 115203 | consumed samples: 12618240 | consumed tokens: 25842155520 | elapsed time per iteration (s): 0.43 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 2.303020E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.474 | TFLOPs: 31.14 | 7: iteration 49300/ 115203 | consumed samples: 12620800 | consumed tokens: 25847398400 | elapsed time per iteration (s): 0.43 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 2.325378E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.407 | TFLOPs: 31.03 | 7: iteration 49310/ 115203 | consumed samples: 12623360 | consumed tokens: 25852641280 | elapsed time per iteration (s): 0.43 | learning rate: 1.318E-04 | global batch size: 256 | lm loss: 2.300419E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.334 | TFLOPs: 31.55 | 7: iteration 49320/ 115203 | consumed samples: 12625920 | consumed tokens: 25857884160 | elapsed time per iteration (s): 0.43 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 2.326668E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.507 | TFLOPs: 30.98 | 7: iteration 49330/ 115203 | consumed samples: 12628480 | consumed tokens: 25863127040 | elapsed time per iteration (s): 
0.43 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 2.311910E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.087 | TFLOPs: 31.17 | 7: iteration 49340/ 115203 | consumed samples: 12631040 | consumed tokens: 25868369920 | elapsed time per iteration (s): 0.46 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 2.297451E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.964 | TFLOPs: 29.38 | 7: iteration 49350/ 115203 | consumed samples: 12633600 | consumed tokens: 25873612800 | elapsed time per iteration (s): 0.43 | learning rate: 1.317E-04 | global batch size: 256 | lm loss: 2.302803E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.988 | TFLOPs: 31.17 | 7: iteration 49360/ 115203 | consumed samples: 12636160 | consumed tokens: 25878855680 | elapsed time per iteration (s): 0.44 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 2.324561E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.354 | TFLOPs: 30.56 | 7: iteration 49370/ 115203 | consumed samples: 12638720 | consumed tokens: 25884098560 | elapsed time per iteration (s): 0.44 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 2.311153E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.500 | TFLOPs: 30.46 | 7: iteration 49380/ 115203 | consumed samples: 12641280 | consumed tokens: 25889341440 | elapsed time per iteration (s): 0.43 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 2.324634E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.357 | TFLOPs: 30.92 | 7: iteration 49390/ 
115203 | consumed samples: 12643840 | consumed tokens: 25894584320 | elapsed time per iteration (s): 0.44 | learning rate: 1.316E-04 | global batch size: 256 | lm loss: 2.319097E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.317 | TFLOPs: 30.71 | 7: iteration 49400/ 115203 | consumed samples: 12646400 | consumed tokens: 25899827200 | elapsed time per iteration (s): 0.43 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.303934E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.367 | TFLOPs: 31.03 | 7: iteration 49410/ 115203 | consumed samples: 12648960 | consumed tokens: 25905070080 | elapsed time per iteration (s): 0.44 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.304646E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.953 | TFLOPs: 30.69 | 7: iteration 49420/ 115203 | consumed samples: 12651520 | consumed tokens: 25910312960 | elapsed time per iteration (s): 0.44 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.270557E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.728 | TFLOPs: 30.84 | 7: iteration 49430/ 115203 | consumed samples: 12654080 | consumed tokens: 25915555840 | elapsed time per iteration (s): 0.43 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.334212E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.355 | TFLOPs: 31.03 | 7: iteration 49440/ 115203 | consumed samples: 12656640 | consumed tokens: 25920798720 | elapsed time per iteration (s): 0.43 | learning rate: 1.315E-04 | global batch size: 256 | lm loss: 2.330144E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 594.693 | TFLOPs: 31.20 | 7: iteration 49450/ 115203 | consumed samples: 12659200 | consumed tokens: 25926041600 | elapsed time per iteration (s): 0.43 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 2.296040E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.487 | TFLOPs: 30.93 | 7: iteration 49460/ 115203 | consumed samples: 12661760 | consumed tokens: 25931284480 | elapsed time per iteration (s): 0.42 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 2.300667E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.328 | TFLOPs: 32.13 | 7: iteration 49470/ 115203 | consumed samples: 12664320 | consumed tokens: 25936527360 | elapsed time per iteration (s): 0.44 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 2.312470E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.806 | TFLOPs: 30.79 | 7: iteration 49480/ 115203 | consumed samples: 12666880 | consumed tokens: 25941770240 | elapsed time per iteration (s): 0.45 | learning rate: 1.314E-04 | global batch size: 256 | lm loss: 2.296381E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.767 | TFLOPs: 30.05 | 7: iteration 49490/ 115203 | consumed samples: 12669440 | consumed tokens: 25947013120 | elapsed time per iteration (s): 0.42 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 2.305431E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.215 | TFLOPs: 31.86 | 7: iteration 49500/ 115203 | consumed samples: 12672000 | consumed tokens: 25952256000 | elapsed time per iteration (s): 0.43 | learning rate: 1.313E-04 | global batch 
size: 256 | lm loss: 2.326676E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.909 | TFLOPs: 31.16 | 7: iteration 49510/ 115203 | consumed samples: 12674560 | consumed tokens: 25957498880 | elapsed time per iteration (s): 0.43 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 2.331449E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.509 | TFLOPs: 31.09 | 7: iteration 49520/ 115203 | consumed samples: 12677120 | consumed tokens: 25962741760 | elapsed time per iteration (s): 0.43 | learning rate: 1.313E-04 | global batch size: 256 | lm loss: 2.304872E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.689 | TFLOPs: 31.15 | 7: iteration 49530/ 115203 | consumed samples: 12679680 | consumed tokens: 25967984640 | elapsed time per iteration (s): 0.43 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 2.311176E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.645 | TFLOPs: 31.41 | 7: iteration 49540/ 115203 | consumed samples: 12682240 | consumed tokens: 25973227520 | elapsed time per iteration (s): 0.43 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 2.306276E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.940 | TFLOPs: 31.58 | 7: iteration 49550/ 115203 | consumed samples: 12684800 | consumed tokens: 25978470400 | elapsed time per iteration (s): 0.43 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 2.316852E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.790 | TFLOPs: 31.21 | 7: iteration 49560/ 115203 | consumed samples: 12687360 | consumed 
tokens: 25983713280 | elapsed time per iteration (s): 0.43 | learning rate: 1.312E-04 | global batch size: 256 | lm loss: 2.315673E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.868 | TFLOPs: 31.16 | 7: iteration 49570/ 115203 | consumed samples: 12689920 | consumed tokens: 25988956160 | elapsed time per iteration (s): 0.43 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 2.285731E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.288 | TFLOPs: 30.92 | 7: iteration 49580/ 115203 | consumed samples: 12692480 | consumed tokens: 25994199040 | elapsed time per iteration (s): 0.43 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 2.318797E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.403 | TFLOPs: 31.13 | 7: iteration 49590/ 115203 | consumed samples: 12695040 | consumed tokens: 25999441920 | elapsed time per iteration (s): 0.43 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 2.295596E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.487 | TFLOPs: 31.09 | 7: iteration 49600/ 115203 | consumed samples: 12697600 | consumed tokens: 26004684800 | elapsed time per iteration (s): 0.42 | learning rate: 1.311E-04 | global batch size: 256 | lm loss: 2.315674E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.282 | TFLOPs: 31.97 | 7: iteration 49610/ 115203 | consumed samples: 12700160 | consumed tokens: 26009927680 | elapsed time per iteration (s): 0.43 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 2.303275E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 588.634 | TFLOPs: 30.88 | 7: iteration 49620/ 115203 | consumed samples: 12702720 | consumed tokens: 26015170560 | elapsed time per iteration (s): 0.42 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 2.283788E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.463 | TFLOPs: 31.77 | 7: iteration 49630/ 115203 | consumed samples: 12705280 | consumed tokens: 26020413440 | elapsed time per iteration (s): 0.43 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 2.334262E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.576 | TFLOPs: 31.20 | 7: iteration 49640/ 115203 | consumed samples: 12707840 | consumed tokens: 26025656320 | elapsed time per iteration (s): 0.42 | learning rate: 1.310E-04 | global batch size: 256 | lm loss: 2.291533E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.966 | TFLOPs: 31.69 | 7: iteration 49650/ 115203 | consumed samples: 12710400 | consumed tokens: 26030899200 | elapsed time per iteration (s): 0.43 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 2.342069E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.326 | TFLOPs: 31.13 | 7: iteration 49660/ 115203 | consumed samples: 12712960 | consumed tokens: 26036142080 | elapsed time per iteration (s): 0.43 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 2.309910E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.530 | TFLOPs: 30.88 | 7: iteration 49670/ 115203 | consumed samples: 12715520 | consumed tokens: 26041384960 | elapsed time per iteration (s): 0.43 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 2.306100E+00 | grad norm: 
0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.841 | TFLOPs: 30.90 | 7: iteration 49680/ 115203 | consumed samples: 12718080 | consumed tokens: 26046627840 | elapsed time per iteration (s): 0.42 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 2.312766E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.747 | TFLOPs: 31.78 | 7: iteration 49690/ 115203 | consumed samples: 12720640 | consumed tokens: 26051870720 | elapsed time per iteration (s): 0.43 | learning rate: 1.309E-04 | global batch size: 256 | lm loss: 2.306430E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.109 | TFLOPs: 30.96 | 7: iteration 49700/ 115203 | consumed samples: 12723200 | consumed tokens: 26057113600 | elapsed time per iteration (s): 0.43 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 2.290257E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.707 | TFLOPs: 31.36 | 7: iteration 49710/ 115203 | consumed samples: 12725760 | consumed tokens: 26062356480 | elapsed time per iteration (s): 0.43 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 2.295524E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.241 | TFLOPs: 31.49 | 7: iteration 49720/ 115203 | consumed samples: 12728320 | consumed tokens: 26067599360 | elapsed time per iteration (s): 0.42 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 2.328482E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.681 | TFLOPs: 31.78 | 7: iteration 49730/ 115203 | consumed samples: 12730880 | consumed tokens: 26072842240 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.308E-04 | global batch size: 256 | lm loss: 2.271067E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.870 | TFLOPs: 31.21 | 7: iteration 49740/ 115203 | consumed samples: 12733440 | consumed tokens: 26078085120 | elapsed time per iteration (s): 0.43 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.269898E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.164 | TFLOPs: 31.44 | 7: iteration 49750/ 115203 | consumed samples: 12736000 | consumed tokens: 26083328000 | elapsed time per iteration (s): 0.42 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.313450E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.788 | TFLOPs: 31.78 | 7: iteration 49760/ 115203 | consumed samples: 12738560 | consumed tokens: 26088570880 | elapsed time per iteration (s): 0.44 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.303578E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.499 | TFLOPs: 30.72 | 7: iteration 49770/ 115203 | consumed samples: 12741120 | consumed tokens: 26093813760 | elapsed time per iteration (s): 0.44 | learning rate: 1.307E-04 | global batch size: 256 | lm loss: 2.296029E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.714 | TFLOPs: 30.73 | 7: iteration 49780/ 115203 | consumed samples: 12743680 | consumed tokens: 26099056640 | elapsed time per iteration (s): 0.43 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 2.309642E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.945 | TFLOPs: 31.27 | 7: 
iteration 49790/ 115203 | consumed samples: 12746240 | consumed tokens: 26104299520 | elapsed time per iteration (s): 0.43 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 2.352488E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.686 | TFLOPs: 31.15 | 7: iteration 49800/ 115203 | consumed samples: 12748800 | consumed tokens: 26109542400 | elapsed time per iteration (s): 0.43 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 2.272293E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.457 | TFLOPs: 31.35 | 7: iteration 49810/ 115203 | consumed samples: 12751360 | consumed tokens: 26114785280 | elapsed time per iteration (s): 0.44 | learning rate: 1.306E-04 | global batch size: 256 | lm loss: 2.311657E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.089 | TFLOPs: 30.65 | 7: iteration 49820/ 115203 | consumed samples: 12753920 | consumed tokens: 26120028160 | elapsed time per iteration (s): 0.42 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 2.313139E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.967 | TFLOPs: 31.79 | 7: iteration 49830/ 115203 | consumed samples: 12756480 | consumed tokens: 26125271040 | elapsed time per iteration (s): 0.45 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 2.302294E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.517 | TFLOPs: 29.67 | 7: iteration 49840/ 115203 | consumed samples: 12759040 | consumed tokens: 26130513920 | elapsed time per iteration (s): 0.43 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 2.319651E+00 | grad norm: 0.215 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.987 | TFLOPs: 31.11 | 7: iteration 49850/ 115203 | consumed samples: 12761600 | consumed tokens: 26135756800 | elapsed time per iteration (s): 0.43 | learning rate: 1.305E-04 | global batch size: 256 | lm loss: 2.296202E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.479 | TFLOPs: 31.30 | 7: iteration 49860/ 115203 | consumed samples: 12764160 | consumed tokens: 26140999680 | elapsed time per iteration (s): 0.43 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 2.297156E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.775 | TFLOPs: 30.94 | 7: iteration 49870/ 115203 | consumed samples: 12766720 | consumed tokens: 26146242560 | elapsed time per iteration (s): 0.44 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 2.328227E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.473 | TFLOPs: 30.67 | 7: iteration 49880/ 115203 | consumed samples: 12769280 | consumed tokens: 26151485440 | elapsed time per iteration (s): 0.44 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 2.294141E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.620 | TFLOPs: 30.41 | 7: iteration 49890/ 115203 | consumed samples: 12771840 | consumed tokens: 26156728320 | elapsed time per iteration (s): 0.44 | learning rate: 1.304E-04 | global batch size: 256 | lm loss: 2.281314E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.859 | TFLOPs: 30.58 | 7: iteration 49900/ 115203 | consumed samples: 12774400 | consumed tokens: 26161971200 | elapsed time per iteration (s): 0.43 | learning rate: 
1.303E-04 | global batch size: 256 | lm loss: 2.306278E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.441 | TFLOPs: 31.19 | 7: iteration 49910/ 115203 | consumed samples: 12776960 | consumed tokens: 26167214080 | elapsed time per iteration (s): 0.43 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 2.327616E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.114 | TFLOPs: 31.07 | 7: iteration 49920/ 115203 | consumed samples: 12779520 | consumed tokens: 26172456960 | elapsed time per iteration (s): 0.43 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 2.293497E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.145 | TFLOPs: 31.33 | 7: iteration 49930/ 115203 | consumed samples: 12782080 | consumed tokens: 26177699840 | elapsed time per iteration (s): 0.43 | learning rate: 1.303E-04 | global batch size: 256 | lm loss: 2.272080E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.554 | TFLOPs: 31.20 | 7: iteration 49940/ 115203 | consumed samples: 12784640 | consumed tokens: 26182942720 | elapsed time per iteration (s): 0.45 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.302531E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.037 | TFLOPs: 30.07 | 7: iteration 49950/ 115203 | consumed samples: 12787200 | consumed tokens: 26188185600 | elapsed time per iteration (s): 0.44 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.316874E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.048 | TFLOPs: 30.38 | 7: iteration 49960/ 115203 | consumed 
samples: 12789760 | consumed tokens: 26193428480 | elapsed time per iteration (s): 0.42 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.339974E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.737 | TFLOPs: 31.62 | 7: iteration 49970/ 115203 | consumed samples: 12792320 | consumed tokens: 26198671360 | elapsed time per iteration (s): 0.43 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.318864E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.724 | TFLOPs: 31.41 | 7: iteration 49980/ 115203 | consumed samples: 12794880 | consumed tokens: 26203914240 | elapsed time per iteration (s): 0.43 | learning rate: 1.302E-04 | global batch size: 256 | lm loss: 2.297004E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.316 | TFLOPs: 31.55 | 7: iteration 49990/ 115203 | consumed samples: 12797440 | consumed tokens: 26209157120 | elapsed time per iteration (s): 0.44 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 2.312353E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.349 | TFLOPs: 30.82 | 0: [2022-11-28 18:57:53,593] [INFO] [logging.py:68:log_dist] [Rank 0] step=50000, skipped=0, lr=[0.00013010274525760026, 0.00013010274525760026, 0.00013010274525760026], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 50000/ 115203 | consumed samples: 12800000 | consumed tokens: 26214400000 | elapsed time per iteration (s): 0.44 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 2.295712E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.309 | TFLOPs: 30.29 | 0: steps: 50000 loss: 2.3177 iter time (s): 0.432 samples/sec: 592.658 
7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 50000 | lm loss value: 2.231603E+00 | lm loss PPL: 9.314783E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 50000 to checkpoints_221m 0: [2022-11-28 18:57:53,770] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step50000 is begin to save! 0: [2022-11-28 18:57:53,786] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_01-model_00-model_states.pt... 0: [2022-11-28 18:57:53,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_01-model_00-model_states.pt. 0: [2022-11-28 18:57:53,893] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_03-model_00-model_states.pt... 0: [2022-11-28 18:57:53,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_03-model_00-model_states.pt. 0: [2022-11-28 18:57:53,914] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_04-model_00-model_states.pt... 0: [2022-11-28 18:57:53,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_04-model_00-model_states.pt. 0: [2022-11-28 18:57:53,938] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_05-model_00-model_states.pt... 0: [2022-11-28 18:57:53,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_05-model_00-model_states.pt. 0: [2022-11-28 18:57:53,961] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_06-model_00-model_states.pt... 
0: [2022-11-28 18:57:53,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_06-model_00-model_states.pt. 0: [2022-11-28 18:57:53,984] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_07-model_00-model_states.pt... 0: [2022-11-28 18:57:54,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_07-model_00-model_states.pt. 0: [2022-11-28 18:57:54,009] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_08-model_00-model_states.pt... 0: [2022-11-28 18:57:54,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_08-model_00-model_states.pt. 0: [2022-11-28 18:57:54,033] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_09-model_00-model_states.pt... 0: [2022-11-28 18:57:54,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_09-model_00-model_states.pt. 0: [2022-11-28 18:57:54,054] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_10-model_00-model_states.pt... 0: [2022-11-28 18:57:54,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_10-model_00-model_states.pt. 0: [2022-11-28 18:57:54,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_11-model_00-model_states.pt... 0: [2022-11-28 18:57:54,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_11-model_00-model_states.pt. 0: [2022-11-28 18:57:54,100] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_12-model_00-model_states.pt... 
0: [2022-11-28 18:57:54,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_12-model_00-model_states.pt. 0: [2022-11-28 18:57:54,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_13-model_00-model_states.pt... 0: [2022-11-28 18:57:54,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_13-model_00-model_states.pt. 0: [2022-11-28 18:57:54,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_14-model_00-model_states.pt... 0: [2022-11-28 18:57:54,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_14-model_00-model_states.pt. 0: [2022-11-28 18:57:54,170] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_15-model_00-model_states.pt... 0: [2022-11-28 18:57:54,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_15-model_00-model_states.pt. 0: [2022-11-28 18:57:54,194] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_16-model_00-model_states.pt... 0: [2022-11-28 18:57:54,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_16-model_00-model_states.pt. 0: [2022-11-28 18:57:54,217] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_17-model_00-model_states.pt... 0: [2022-11-28 18:57:54,239] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_17-model_00-model_states.pt. 0: [2022-11-28 18:57:54,240] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_18-model_00-model_states.pt... 
0: [2022-11-28 18:57:54,262] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_18-model_00-model_states.pt. 0: [2022-11-28 18:57:54,263] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_19-model_00-model_states.pt... 0: [2022-11-28 18:57:54,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_19-model_00-model_states.pt. 0: [2022-11-28 18:57:54,286] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_20-model_00-model_states.pt... 0: [2022-11-28 18:57:54,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_20-model_00-model_states.pt. 0: [2022-11-28 18:57:54,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/layer_22-model_00-model_states.pt... 0: [2022-11-28 18:57:54,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/layer_22-model_00-model_states.pt. 0: [2022-11-28 18:57:54,315] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step50000/mp_rank_00_model_states.pt 0: [2022-11-28 18:57:54,315] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/mp_rank_00_model_states.pt... 0: [2022-11-28 18:57:54,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/mp_rank_00_model_states.pt. 0: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
0: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
0: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 7: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 6: [2022-11-28 18:57:54,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 2: [2022-11-28 18:57:54,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:57:54,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 18:57:54,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2022-11-28 18:57:54,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 18:57:54,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 18:57:54,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2022-11-28 18:57:54,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:57:54,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 18:57:54,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2022-11-28 18:57:54,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:57:54,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:57:54,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 18:57:54,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 18:57:54,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2022-11-28 18:57:54,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2022-11-28 18:57:54,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 18:57:54,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 18:57:54,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2022-11-28 18:57:54,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:57:54,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 18:57:54,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2022-11-28 18:57:54,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:57:54,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 18:57:54,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2022-11-28 18:57:54,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:57:54,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 18:57:54,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2022-11-28 18:57:54,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 18:57:54,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 18:57:54,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:57:54,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:57:54,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 18:57:54,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 18:57:54,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:57:54,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:57:54,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 3: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 7: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2022-11-28 18:57:54,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 7: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
7: [2022-11-28 18:57:54,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2022-11-28 18:57:54,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 7: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:57:54,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:57:54,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 18:57:54,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 2: [2022-11-28 18:57:54,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:57:54,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 18:57:54,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
2: [2022-11-28 18:57:54,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:57:54,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 0: [2022-11-28 18:57:54,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:57:54,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2022-11-28 18:57:54,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 1: [2022-11-28 18:57:54,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:57:54,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2022-11-28 18:57:54,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 18:57:54,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2022-11-28 18:57:54,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:57:54,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 18:57:54,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
6: [2022-11-28 18:57:54,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:57:54,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:57:54,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:57:54,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 18:57:54,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 18:57:54,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2022-11-28 18:57:54,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2022-11-28 18:57:54,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 18:57:54,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2022-11-28 18:57:54,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:57:54,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 18:57:54,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
2: [2022-11-28 18:57:54,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 18:57:54,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 18:57:54,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2022-11-28 18:57:54,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:57:54,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 18:57:54,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2022-11-28 18:57:54,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:57:54,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 18:57:54,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2022-11-28 18:57:54,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:57:54,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:57:54,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 18:57:54,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 18:57:54,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 18:57:54,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 18:57:54,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 18:57:54,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 18:57:54,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2022-11-28 18:57:54,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2022-11-28 18:57:54,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 6: [2022-11-28 18:57:54,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2022-11-28 18:57:54,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:57:54,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 18:57:54,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
4: [2022-11-28 18:57:54,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:57:54,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 18:57:54,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2022-11-28 18:57:54,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:57:54,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 18:57:54,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2022-11-28 18:57:54,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:57:54,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:57:54,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 18:57:54,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 18:57:54,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 1: [2022-11-28 18:57:54,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
1: [2022-11-28 18:57:54,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 18:57:54,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 18:57:54,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2022-11-28 18:57:54,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:57:54,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:57:54,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:57:54,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 18:57:54,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 18:57:54,410] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 18:57:54,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2022-11-28 18:57:54,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2022-11-28 18:57:54,410] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
4: [2022-11-28 18:57:54,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:57:54,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 18:57:54,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2022-11-28 18:57:54,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:57:54,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 18:57:54,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 4: [2022-11-28 18:57:54,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 18:57:54,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 18:57:54,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:57:54,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2022-11-28 18:57:54,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 18:57:54,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2022-11-28 18:57:54,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:57:54,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 18:57:54,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:57:54,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2022-11-28 18:57:54,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 18:57:54,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 18:57:54,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 18:57:54,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 18:57:54,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 18:57:54,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2022-11-28 18:57:54,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 3: [2022-11-28 18:57:54,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2022-11-28 18:57:54,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 18:57:54,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 18:57:54,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 0: [2022-11-28 18:57:54,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 18:57:54,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2022-11-28 18:57:54,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-28 18:57:54,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:57:54,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:57:54,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:57:54,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 18:57:54,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 18:57:54,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 18:57:54,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2022-11-28 18:57:54,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2022-11-28 18:57:54,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2022-11-28 18:57:54,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 18:57:54,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2022-11-28 18:57:54,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 18:57:54,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:57:54,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:57:54,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 18:57:54,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 18:57:54,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 18:57:54,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 18:57:54,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step50000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 18:57:54,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2022-11-28 18:57:54,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2022-11-28 18:57:54,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 5: [2022-11-28 18:57:54,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step50000 is ready now! 
0: successfully saved checkpoint at iteration 50000 to checkpoints_221m 7: time (ms) | save-checkpoint: 766.98 7: iteration 50010/ 115203 | consumed samples: 12802560 | consumed tokens: 26219642880 | elapsed time per iteration (s): 0.53 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 2.308498E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 485.523 | TFLOPs: 25.47 | 7: iteration 50020/ 115203 | consumed samples: 12805120 | consumed tokens: 26224885760 | elapsed time per iteration (s): 0.43 | learning rate: 1.301E-04 | global batch size: 256 | lm loss: 2.316868E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.580 | TFLOPs: 31.51 | 7: iteration 50030/ 115203 | consumed samples: 12807680 | consumed tokens: 26230128640 | elapsed time per iteration (s): 0.43 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 2.287089E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.852 | TFLOPs: 31.42 | 7: iteration 50040/ 115203 | consumed samples: 12810240 | consumed tokens: 26235371520 | elapsed time per iteration (s): 0.43 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 2.293211E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.972 | TFLOPs: 31.48 | 7: iteration 50050/ 115203 | consumed samples: 12812800 | consumed tokens: 26240614400 | elapsed time per iteration (s): 0.43 | learning rate: 1.300E-04 | global batch size: 256 | lm loss: 2.284753E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.685 | TFLOPs: 31.46 | 7: iteration 50060/ 115203 | consumed samples: 12815360 | consumed tokens: 26245857280 | elapsed time per iteration (s): 0.42 | learning 
rate: 1.300E-04 | global batch size: 256 | lm loss: 2.320065E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.363 | TFLOPs: 31.87 | 7: iteration 50070/ 115203 | consumed samples: 12817920 | consumed tokens: 26251100160 | elapsed time per iteration (s): 0.63 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 2.286672E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 407.261 | TFLOPs: 21.37 | 7: iteration 50080/ 115203 | consumed samples: 12820480 | consumed tokens: 26256343040 | elapsed time per iteration (s): 0.43 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 2.286696E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.604 | TFLOPs: 31.30 | 7: iteration 50090/ 115203 | consumed samples: 12823040 | consumed tokens: 26261585920 | elapsed time per iteration (s): 0.43 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 2.355160E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.495 | TFLOPs: 31.30 | 7: iteration 50100/ 115203 | consumed samples: 12825600 | consumed tokens: 26266828800 | elapsed time per iteration (s): 0.43 | learning rate: 1.299E-04 | global batch size: 256 | lm loss: 2.305118E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.917 | TFLOPs: 31.58 | 7: iteration 50110/ 115203 | consumed samples: 12828160 | consumed tokens: 26272071680 | elapsed time per iteration (s): 0.43 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.272335E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.312 | TFLOPs: 31.03 | 7: iteration 50120/ 115203 | 
consumed samples: 12830720 | consumed tokens: 26277314560 | elapsed time per iteration (s): 0.43 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.285744E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.824 | TFLOPs: 30.89 | 7: iteration 50130/ 115203 | consumed samples: 12833280 | consumed tokens: 26282557440 | elapsed time per iteration (s): 0.43 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.327192E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.273 | TFLOPs: 31.60 | 7: iteration 50140/ 115203 | consumed samples: 12835840 | consumed tokens: 26287800320 | elapsed time per iteration (s): 0.42 | learning rate: 1.298E-04 | global batch size: 256 | lm loss: 2.298103E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.207 | TFLOPs: 31.70 | 7: iteration 50150/ 115203 | consumed samples: 12838400 | consumed tokens: 26293043200 | elapsed time per iteration (s): 0.42 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 2.312156E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.325 | TFLOPs: 31.71 | 7: iteration 50160/ 115203 | consumed samples: 12840960 | consumed tokens: 26298286080 | elapsed time per iteration (s): 0.43 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 2.295763E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.851 | TFLOPs: 31.11 | 7: iteration 50170/ 115203 | consumed samples: 12843520 | consumed tokens: 26303528960 | elapsed time per iteration (s): 0.44 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 2.274837E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 577.027 | TFLOPs: 30.28 | 7: iteration 50180/ 115203 | consumed samples: 12846080 | consumed tokens: 26308771840 | elapsed time per iteration (s): 0.42 | learning rate: 1.297E-04 | global batch size: 256 | lm loss: 2.294348E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.139 | TFLOPs: 31.65 | 7: iteration 50190/ 115203 | consumed samples: 12848640 | consumed tokens: 26314014720 | elapsed time per iteration (s): 0.43 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 2.279013E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.160 | TFLOPs: 31.38 | 7: iteration 50200/ 115203 | consumed samples: 12851200 | consumed tokens: 26319257600 | elapsed time per iteration (s): 0.42 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 2.305078E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.835 | TFLOPs: 31.68 | 7: iteration 50210/ 115203 | consumed samples: 12853760 | consumed tokens: 26324500480 | elapsed time per iteration (s): 0.43 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 2.298482E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.705 | TFLOPs: 31.20 | 7: iteration 50220/ 115203 | consumed samples: 12856320 | consumed tokens: 26329743360 | elapsed time per iteration (s): 0.42 | learning rate: 1.296E-04 | global batch size: 256 | lm loss: 2.289584E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.880 | TFLOPs: 31.63 | 7: iteration 50230/ 115203 | consumed samples: 12858880 | consumed tokens: 26334986240 | elapsed time per iteration (s): 0.46 | learning rate: 1.295E-04 | global batch size: 
256 | lm loss: 2.316490E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 555.125 | TFLOPs: 29.13 | 7: iteration 50240/ 115203 | consumed samples: 12861440 | consumed tokens: 26340229120 | elapsed time per iteration (s): 0.43 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 2.301661E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.900 | TFLOPs: 31.42 | 7: iteration 50250/ 115203 | consumed samples: 12864000 | consumed tokens: 26345472000 | elapsed time per iteration (s): 0.45 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 2.311626E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.176 | TFLOPs: 30.13 | 7: iteration 50260/ 115203 | consumed samples: 12866560 | consumed tokens: 26350714880 | elapsed time per iteration (s): 0.43 | learning rate: 1.295E-04 | global batch size: 256 | lm loss: 2.282137E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.789 | TFLOPs: 31.52 | 7: iteration 50270/ 115203 | consumed samples: 12869120 | consumed tokens: 26355957760 | elapsed time per iteration (s): 0.43 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.307793E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.246 | TFLOPs: 30.92 | 7: iteration 50280/ 115203 | consumed samples: 12871680 | consumed tokens: 26361200640 | elapsed time per iteration (s): 0.42 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.319576E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.352 | TFLOPs: 31.87 | 7: iteration 50290/ 115203 | consumed samples: 12874240 | consumed 
tokens: 26366443520 | elapsed time per iteration (s): 0.43 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.299878E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.139 | TFLOPs: 31.54 | 7: iteration 50300/ 115203 | consumed samples: 12876800 | consumed tokens: 26371686400 | elapsed time per iteration (s): 0.44 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.277017E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.620 | TFLOPs: 30.46 | 7: iteration 50310/ 115203 | consumed samples: 12879360 | consumed tokens: 26376929280 | elapsed time per iteration (s): 0.42 | learning rate: 1.294E-04 | global batch size: 256 | lm loss: 2.306842E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.326 | TFLOPs: 31.76 | 7: iteration 50320/ 115203 | consumed samples: 12881920 | consumed tokens: 26382172160 | elapsed time per iteration (s): 0.43 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 2.316910E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.705 | TFLOPs: 31.52 | 7: iteration 50330/ 115203 | consumed samples: 12884480 | consumed tokens: 26387415040 | elapsed time per iteration (s): 0.43 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 2.277554E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.512 | TFLOPs: 30.93 | 7: iteration 50340/ 115203 | consumed samples: 12887040 | consumed tokens: 26392657920 | elapsed time per iteration (s): 0.45 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 2.287168E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 570.585 | TFLOPs: 29.94 | 7: iteration 50350/ 115203 | consumed samples: 12889600 | consumed tokens: 26397900800 | elapsed time per iteration (s): 0.43 | learning rate: 1.293E-04 | global batch size: 256 | lm loss: 2.282928E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.003 | TFLOPs: 30.90 | 7: iteration 50360/ 115203 | consumed samples: 12892160 | consumed tokens: 26403143680 | elapsed time per iteration (s): 0.44 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 2.337055E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.197 | TFLOPs: 30.81 | 7: iteration 50370/ 115203 | consumed samples: 12894720 | consumed tokens: 26408386560 | elapsed time per iteration (s): 0.43 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 2.305693E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.385 | TFLOPs: 30.98 | 7: iteration 50380/ 115203 | consumed samples: 12897280 | consumed tokens: 26413629440 | elapsed time per iteration (s): 0.44 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 2.341365E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.789 | TFLOPs: 30.68 | 7: iteration 50390/ 115203 | consumed samples: 12899840 | consumed tokens: 26418872320 | elapsed time per iteration (s): 0.43 | learning rate: 1.292E-04 | global batch size: 256 | lm loss: 2.288378E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.213 | TFLOPs: 31.28 | 7: iteration 50400/ 115203 | consumed samples: 12902400 | consumed tokens: 26424115200 | elapsed time per iteration (s): 0.43 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 2.318155E+00 | grad norm: 
0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.472 | TFLOPs: 31.35 | 7: iteration 50410/ 115203 | consumed samples: 12904960 | consumed tokens: 26429358080 | elapsed time per iteration (s): 0.43 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 2.309974E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.811 | TFLOPs: 31.31 | 7: iteration 50420/ 115203 | consumed samples: 12907520 | consumed tokens: 26434600960 | elapsed time per iteration (s): 0.43 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 2.314743E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.191 | TFLOPs: 31.07 | 7: iteration 50430/ 115203 | consumed samples: 12910080 | consumed tokens: 26439843840 | elapsed time per iteration (s): 0.43 | learning rate: 1.291E-04 | global batch size: 256 | lm loss: 2.319259E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.776 | TFLOPs: 30.94 | 7: iteration 50440/ 115203 | consumed samples: 12912640 | consumed tokens: 26445086720 | elapsed time per iteration (s): 0.42 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 2.301580E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.364 | TFLOPs: 31.71 | 7: iteration 50450/ 115203 | consumed samples: 12915200 | consumed tokens: 26450329600 | elapsed time per iteration (s): 0.43 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 2.294803E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.938 | TFLOPs: 31.32 | 7: iteration 50460/ 115203 | consumed samples: 12917760 | consumed tokens: 26455572480 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 2.322544E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.568 | TFLOPs: 30.93 | 7: iteration 50470/ 115203 | consumed samples: 12920320 | consumed tokens: 26460815360 | elapsed time per iteration (s): 0.44 | learning rate: 1.290E-04 | global batch size: 256 | lm loss: 2.328123E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.487 | TFLOPs: 30.82 | 7: iteration 50480/ 115203 | consumed samples: 12922880 | consumed tokens: 26466058240 | elapsed time per iteration (s): 0.42 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 2.325486E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.758 | TFLOPs: 31.68 | 7: iteration 50490/ 115203 | consumed samples: 12925440 | consumed tokens: 26471301120 | elapsed time per iteration (s): 0.43 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 2.290014E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.109 | TFLOPs: 31.38 | 7: iteration 50500/ 115203 | consumed samples: 12928000 | consumed tokens: 26476544000 | elapsed time per iteration (s): 0.46 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 2.286926E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.633 | TFLOPs: 29.26 | 7: iteration 50510/ 115203 | consumed samples: 12930560 | consumed tokens: 26481786880 | elapsed time per iteration (s): 0.42 | learning rate: 1.289E-04 | global batch size: 256 | lm loss: 2.305928E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.459 | TFLOPs: 31.82 | 7: 
iteration 50520/ 115203 | consumed samples: 12933120 | consumed tokens: 26487029760 | elapsed time per iteration (s): 0.44 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 2.311869E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.594 | TFLOPs: 30.31 | 7: iteration 50530/ 115203 | consumed samples: 12935680 | consumed tokens: 26492272640 | elapsed time per iteration (s): 0.43 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 2.332788E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.054 | TFLOPs: 31.48 | 7: iteration 50540/ 115203 | consumed samples: 12938240 | consumed tokens: 26497515520 | elapsed time per iteration (s): 0.43 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 2.289710E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.282 | TFLOPs: 31.18 | 7: iteration 50550/ 115203 | consumed samples: 12940800 | consumed tokens: 26502758400 | elapsed time per iteration (s): 0.43 | learning rate: 1.288E-04 | global batch size: 256 | lm loss: 2.279495E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.862 | TFLOPs: 31.11 | 7: iteration 50560/ 115203 | consumed samples: 12943360 | consumed tokens: 26508001280 | elapsed time per iteration (s): 0.42 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 2.311047E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.606 | TFLOPs: 31.62 | 7: iteration 50570/ 115203 | consumed samples: 12945920 | consumed tokens: 26513244160 | elapsed time per iteration (s): 0.43 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 2.308788E+00 | grad norm: 0.212 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.453 | TFLOPs: 31.35 | 7: iteration 50580/ 115203 | consumed samples: 12948480 | consumed tokens: 26518487040 | elapsed time per iteration (s): 0.43 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 2.296245E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.304 | TFLOPs: 31.34 | 7: iteration 50590/ 115203 | consumed samples: 12951040 | consumed tokens: 26523729920 | elapsed time per iteration (s): 0.44 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 2.316831E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.410 | TFLOPs: 30.82 | 7: iteration 50600/ 115203 | consumed samples: 12953600 | consumed tokens: 26528972800 | elapsed time per iteration (s): 0.43 | learning rate: 1.287E-04 | global batch size: 256 | lm loss: 2.323820E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.675 | TFLOPs: 31.15 | 7: iteration 50610/ 115203 | consumed samples: 12956160 | consumed tokens: 26534215680 | elapsed time per iteration (s): 0.44 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 2.292453E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.604 | TFLOPs: 30.83 | 7: iteration 50620/ 115203 | consumed samples: 12958720 | consumed tokens: 26539458560 | elapsed time per iteration (s): 0.43 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 2.305146E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.323 | TFLOPs: 31.45 | 7: iteration 50630/ 115203 | consumed samples: 12961280 | consumed tokens: 26544701440 | elapsed time per iteration (s): 0.44 | learning rate: 
1.286E-04 | global batch size: 256 | lm loss: 2.305958E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.395 | TFLOPs: 30.66 | 7: iteration 50640/ 115203 | consumed samples: 12963840 | consumed tokens: 26549944320 | elapsed time per iteration (s): 0.43 | learning rate: 1.286E-04 | global batch size: 256 | lm loss: 2.319804E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.973 | TFLOPs: 31.58 | 7: iteration 50650/ 115203 | consumed samples: 12966400 | consumed tokens: 26555187200 | elapsed time per iteration (s): 0.43 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 2.309167E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.642 | TFLOPs: 31.36 | 7: iteration 50660/ 115203 | consumed samples: 12968960 | consumed tokens: 26560430080 | elapsed time per iteration (s): 0.43 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 2.289947E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.728 | TFLOPs: 31.47 | 7: iteration 50670/ 115203 | consumed samples: 12971520 | consumed tokens: 26565672960 | elapsed time per iteration (s): 0.43 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 2.299041E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.896 | TFLOPs: 31.37 | 7: iteration 50680/ 115203 | consumed samples: 12974080 | consumed tokens: 26570915840 | elapsed time per iteration (s): 0.43 | learning rate: 1.285E-04 | global batch size: 256 | lm loss: 2.297462E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.078 | TFLOPs: 31.54 | 7: iteration 50690/ 115203 | consumed 
samples: 12976640 | consumed tokens: 26576158720 | elapsed time per iteration (s): 0.42 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.274516E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.899 | TFLOPs: 31.95 | 7: iteration 50700/ 115203 | consumed samples: 12979200 | consumed tokens: 26581401600 | elapsed time per iteration (s): 0.44 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.287197E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.330 | TFLOPs: 30.76 | 7: iteration 50710/ 115203 | consumed samples: 12981760 | consumed tokens: 26586644480 | elapsed time per iteration (s): 0.44 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.294952E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.168 | TFLOPs: 30.34 | 7: iteration 50720/ 115203 | consumed samples: 12984320 | consumed tokens: 26591887360 | elapsed time per iteration (s): 0.44 | learning rate: 1.284E-04 | global batch size: 256 | lm loss: 2.309291E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.822 | TFLOPs: 30.84 | 7: iteration 50730/ 115203 | consumed samples: 12986880 | consumed tokens: 26597130240 | elapsed time per iteration (s): 0.46 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 2.285157E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.492 | TFLOPs: 29.09 | 7: iteration 50740/ 115203 | consumed samples: 12989440 | consumed tokens: 26602373120 | elapsed time per iteration (s): 0.43 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 2.302277E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 589.656 | TFLOPs: 30.94 | 7: iteration 50750/ 115203 | consumed samples: 12992000 | consumed tokens: 26607616000 | elapsed time per iteration (s): 0.42 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 2.283774E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.845 | TFLOPs: 31.63 | 7: iteration 50760/ 115203 | consumed samples: 12994560 | consumed tokens: 26612858880 | elapsed time per iteration (s): 0.42 | learning rate: 1.283E-04 | global batch size: 256 | lm loss: 2.320886E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.912 | TFLOPs: 31.74 | 7: iteration 50770/ 115203 | consumed samples: 12997120 | consumed tokens: 26618101760 | elapsed time per iteration (s): 0.43 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 2.325633E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.115 | TFLOPs: 31.49 | 7: iteration 50780/ 115203 | consumed samples: 12999680 | consumed tokens: 26623344640 | elapsed time per iteration (s): 0.42 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 2.293888E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.990 | TFLOPs: 31.74 | 7: iteration 50790/ 115203 | consumed samples: 13002240 | consumed tokens: 26628587520 | elapsed time per iteration (s): 0.43 | learning rate: 1.282E-04 | global batch size: 256 | lm loss: 2.303449E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.308 | TFLOPs: 31.44 | 7: iteration 50800/ 115203 | consumed samples: 13004800 | consumed tokens: 26633830400 | elapsed time per iteration (s): 0.42 | learning rate: 1.282E-04 | global batch size: 256 | lm 
loss: 2.308081E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.077 | TFLOPs: 31.80 | 7: iteration 50810/ 115203 | consumed samples: 13007360 | consumed tokens: 26639073280 | elapsed time per iteration (s): 0.43 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 2.316764E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.307 | TFLOPs: 31.39 | 7: iteration 50820/ 115203 | consumed samples: 13009920 | consumed tokens: 26644316160 | elapsed time per iteration (s): 0.42 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 2.296180E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.494 | TFLOPs: 31.82 | 7: iteration 50830/ 115203 | consumed samples: 13012480 | consumed tokens: 26649559040 | elapsed time per iteration (s): 0.43 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 2.309340E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.803 | TFLOPs: 31.47 | 7: iteration 50840/ 115203 | consumed samples: 13015040 | consumed tokens: 26654801920 | elapsed time per iteration (s): 0.42 | learning rate: 1.281E-04 | global batch size: 256 | lm loss: 2.312196E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.098 | TFLOPs: 31.70 | 7: iteration 50850/ 115203 | consumed samples: 13017600 | consumed tokens: 26660044800 | elapsed time per iteration (s): 0.43 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 2.301967E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.447 | TFLOPs: 31.29 | 7: iteration 50860/ 115203 | consumed samples: 13020160 | consumed tokens: 
26665287680 | elapsed time per iteration (s): 0.43 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 2.310159E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.831 | TFLOPs: 31.58 | 7: iteration 50870/ 115203 | consumed samples: 13022720 | consumed tokens: 26670530560 | elapsed time per iteration (s): 0.44 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 2.301923E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.547 | TFLOPs: 30.30 | 7: iteration 50880/ 115203 | consumed samples: 13025280 | consumed tokens: 26675773440 | elapsed time per iteration (s): 0.43 | learning rate: 1.280E-04 | global batch size: 256 | lm loss: 2.298497E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.402 | TFLOPs: 31.03 | 7: iteration 50890/ 115203 | consumed samples: 13027840 | consumed tokens: 26681016320 | elapsed time per iteration (s): 0.44 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.269582E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.681 | TFLOPs: 30.83 | 7: iteration 50900/ 115203 | consumed samples: 13030400 | consumed tokens: 26686259200 | elapsed time per iteration (s): 0.42 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.264590E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.005 | TFLOPs: 31.74 | 7: iteration 50910/ 115203 | consumed samples: 13032960 | consumed tokens: 26691502080 | elapsed time per iteration (s): 0.43 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.291247E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
600.119 | TFLOPs: 31.49 | 7: iteration 50920/ 115203 | consumed samples: 13035520 | consumed tokens: 26696744960 | elapsed time per iteration (s): 0.43 | learning rate: 1.279E-04 | global batch size: 256 | lm loss: 2.297348E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.047 | TFLOPs: 31.22 | 7: iteration 50930/ 115203 | consumed samples: 13038080 | consumed tokens: 26701987840 | elapsed time per iteration (s): 0.43 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 2.319006E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.192 | TFLOPs: 31.33 | 7: iteration 50940/ 115203 | consumed samples: 13040640 | consumed tokens: 26707230720 | elapsed time per iteration (s): 0.42 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 2.340618E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.253 | TFLOPs: 31.65 | 7: iteration 50950/ 115203 | consumed samples: 13043200 | consumed tokens: 26712473600 | elapsed time per iteration (s): 0.44 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 2.304295E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.492 | TFLOPs: 30.30 | 7: iteration 50960/ 115203 | consumed samples: 13045760 | consumed tokens: 26717716480 | elapsed time per iteration (s): 0.43 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 2.292438E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.705 | TFLOPs: 31.36 | 7: iteration 50970/ 115203 | consumed samples: 13048320 | consumed tokens: 26722959360 | elapsed time per iteration (s): 0.43 | learning rate: 1.278E-04 | global batch size: 256 | lm loss: 2.328935E+00 | grad norm: 0.204 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.527 | TFLOPs: 31.19 | 7: iteration 50980/ 115203 | consumed samples: 13050880 | consumed tokens: 26728202240 | elapsed time per iteration (s): 0.42 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 2.325348E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.313 | TFLOPs: 31.81 | 7: iteration 50990/ 115203 | consumed samples: 13053440 | consumed tokens: 26733445120 | elapsed time per iteration (s): 0.42 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 2.275532E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.442 | TFLOPs: 31.66 | 7: iteration 51000/ 115203 | consumed samples: 13056000 | consumed tokens: 26738688000 | elapsed time per iteration (s): 0.43 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 2.289977E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.289 | TFLOPs: 31.08 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 51000 | lm loss value: 2.167996E+00 | lm loss PPL: 8.740747E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 51000 to checkpoints_221m 0: [2022-11-28 19:05:07,348] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step51000 is begin to save! 0: [2022-11-28 19:05:07,361] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_01-model_00-model_states.pt... 0: [2022-11-28 19:05:07,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 19:05:07,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_03-model_00-model_states.pt... 0: [2022-11-28 19:05:07,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_03-model_00-model_states.pt. 0: [2022-11-28 19:05:07,488] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_04-model_00-model_states.pt... 0: [2022-11-28 19:05:07,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_04-model_00-model_states.pt. 0: [2022-11-28 19:05:07,518] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_05-model_00-model_states.pt... 0: [2022-11-28 19:05:07,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_05-model_00-model_states.pt. 0: [2022-11-28 19:05:07,541] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_06-model_00-model_states.pt... 0: [2022-11-28 19:05:07,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_06-model_00-model_states.pt. 0: [2022-11-28 19:05:07,564] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_07-model_00-model_states.pt... 0: [2022-11-28 19:05:07,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_07-model_00-model_states.pt. 0: [2022-11-28 19:05:07,588] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_08-model_00-model_states.pt... 0: [2022-11-28 19:05:07,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 19:05:07,610] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_09-model_00-model_states.pt... 0: [2022-11-28 19:05:07,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_09-model_00-model_states.pt. 0: [2022-11-28 19:05:07,633] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_10-model_00-model_states.pt... 0: [2022-11-28 19:05:07,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_10-model_00-model_states.pt. 0: [2022-11-28 19:05:07,656] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_11-model_00-model_states.pt... 0: [2022-11-28 19:05:07,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_11-model_00-model_states.pt. 0: [2022-11-28 19:05:07,679] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_12-model_00-model_states.pt... 0: [2022-11-28 19:05:07,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_12-model_00-model_states.pt. 0: [2022-11-28 19:05:07,704] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_13-model_00-model_states.pt... 0: [2022-11-28 19:05:07,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_13-model_00-model_states.pt. 0: [2022-11-28 19:05:07,728] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_14-model_00-model_states.pt... 0: [2022-11-28 19:05:07,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 19:05:07,752] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_15-model_00-model_states.pt... 0: [2022-11-28 19:05:07,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_15-model_00-model_states.pt. 0: [2022-11-28 19:05:07,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_16-model_00-model_states.pt... 0: [2022-11-28 19:05:07,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_16-model_00-model_states.pt. 0: [2022-11-28 19:05:07,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_17-model_00-model_states.pt... 0: [2022-11-28 19:05:07,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_17-model_00-model_states.pt. 0: [2022-11-28 19:05:07,824] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_18-model_00-model_states.pt... 0: [2022-11-28 19:05:07,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_18-model_00-model_states.pt. 0: [2022-11-28 19:05:07,849] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_19-model_00-model_states.pt... 0: [2022-11-28 19:05:07,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_19-model_00-model_states.pt. 0: [2022-11-28 19:05:07,874] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_20-model_00-model_states.pt... 0: [2022-11-28 19:05:07,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 19:05:07,898] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/layer_22-model_00-model_states.pt... 0: [2022-11-28 19:05:07,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/layer_22-model_00-model_states.pt. 0: [2022-11-28 19:05:07,903] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step51000/mp_rank_00_model_states.pt 0: [2022-11-28 19:05:07,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/mp_rank_00_model_states.pt... 0: [2022-11-28 19:05:07,905] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/mp_rank_00_model_states.pt. 0: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
7: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:05:07,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step51000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:05:07,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:05:07,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:05:07,970] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 19:05:07,970] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 1: [2022-11-28 19:05:07,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:05:07,971] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 19:05:07,971] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2022-11-28 19:05:07,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:05:07,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 19:05:07,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 
2: [2022-11-28 19:05:07,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:05:07,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 19:05:07,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2022-11-28 19:05:07,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:05:07,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-28 19:05:07,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 19:05:07,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 19:05:07,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 19:05:07,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 0: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:05:07,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 0: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2022-11-28 19:05:07,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 19:05:07,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2022-11-28 19:05:07,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:05:07,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:05:07,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:05:07,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 19:05:07,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 0: [2022-11-28 19:05:07,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 6: [2022-11-28 19:05:07,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2022-11-28 19:05:07,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 0: [2022-11-28 19:05:07,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2022-11-28 19:05:07,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 19:05:07,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 19:05:07,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:05:07,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 6: [2022-11-28 19:05:07,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 19:05:07,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2022-11-28 19:05:07,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:05:07,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:05:07,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:05:07,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 19:05:07,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:05:07,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 
3: [2022-11-28 19:05:07,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 19:05:07,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 2: [2022-11-28 19:05:07,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2022-11-28 19:05:07,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 3: [2022-11-28 19:05:07,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2022-11-28 19:05:07,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2022-11-28 19:05:07,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:05:07,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 19:05:07,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:05:07,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:05:07,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 
2: [2022-11-28 19:05:07,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 19:05:07,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 19:05:07,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 2: [2022-11-28 19:05:07,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 1: [2022-11-28 19:05:07,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:05:07,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 19:05:07,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 1: [2022-11-28 19:05:07,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:05:07,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:05:07,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 19:05:07,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 19:05:07,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now! 
1: [2022-11-28 19:05:07,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
1: [2022-11-28 19:05:07,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt.
1: [2022-11-28 19:05:07,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt
1: [2022-11-28 19:05:07,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
1: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt.
7: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt.
1: [2022-11-28 19:05:07,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt
7: [2022-11-28 19:05:07,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt
1: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt.
1: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt.
7: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
1: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
1: [2022-11-28 19:05:07,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt
1: [2022-11-28 19:05:07,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt
1: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
1: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
7: [2022-11-28 19:05:07,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt.
7: [2022-11-28 19:05:07,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt
7: [2022-11-28 19:05:07,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
7: [2022-11-28 19:05:07,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt.
7: [2022-11-28 19:05:07,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt
7: [2022-11-28 19:05:07,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
7: [2022-11-28 19:05:07,990] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt.
7: [2022-11-28 19:05:07,990] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt
7: [2022-11-28 19:05:07,990] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
7: [2022-11-28 19:05:07,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt.
7: [2022-11-28 19:05:07,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt.
7: [2022-11-28 19:05:07,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt
7: [2022-11-28 19:05:07,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt
7: [2022-11-28 19:05:07,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
7: [2022-11-28 19:05:07,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
7: [2022-11-28 19:05:07,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt.
7: [2022-11-28 19:05:07,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt
7: [2022-11-28 19:05:07,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
3: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt.
3: [2022-11-28 19:05:07,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt
3: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt.
3: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
3: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt.
3: [2022-11-28 19:05:07,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt
3: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt.
3: [2022-11-28 19:05:07,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt
3: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
3: [2022-11-28 19:05:07,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt
3: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
3: [2022-11-28 19:05:07,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
3: [2022-11-28 19:05:07,986] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt.
3: [2022-11-28 19:05:07,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt
3: [2022-11-28 19:05:07,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
3: [2022-11-28 19:05:07,987] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt.
3: [2022-11-28 19:05:07,987] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt
3: [2022-11-28 19:05:07,987] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
0: [2022-11-28 19:05:07,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt.
0: [2022-11-28 19:05:07,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt
0: [2022-11-28 19:05:07,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
0: [2022-11-28 19:05:07,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt.
0: [2022-11-28 19:05:07,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt
0: [2022-11-28 19:05:07,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt.
0: [2022-11-28 19:05:07,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
0: [2022-11-28 19:05:07,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt
0: [2022-11-28 19:05:07,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt.
0: [2022-11-28 19:05:07,999] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
0: [2022-11-28 19:05:07,999] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt
0: [2022-11-28 19:05:07,999] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt.
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt.
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt.
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt.
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt.
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt.
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt.
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt.
5: [2022-11-28 19:05:08,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt
5: [2022-11-28 19:05:08,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt
5: [2022-11-28 19:05:08,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt
5: [2022-11-28 19:05:08,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt
5: [2022-11-28 19:05:08,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt
5: [2022-11-28 19:05:08,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt
5: [2022-11-28 19:05:08,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt
5: [2022-11-28 19:05:08,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
5: [2022-11-28 19:05:08,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
0: [2022-11-28 19:05:08,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
0: [2022-11-28 19:05:08,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
4: [2022-11-28 19:05:08,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt.
4: [2022-11-28 19:05:08,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt.
4: [2022-11-28 19:05:08,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt.
4: [2022-11-28 19:05:08,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt
4: [2022-11-28 19:05:08,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt
4: [2022-11-28 19:05:08,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt
4: [2022-11-28 19:05:08,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
4: [2022-11-28 19:05:08,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
4: [2022-11-28 19:05:08,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
4: [2022-11-28 19:05:08,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt.
4: [2022-11-28 19:05:08,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt
4: [2022-11-28 19:05:08,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
4: [2022-11-28 19:05:08,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt.
4: [2022-11-28 19:05:08,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt.
4: [2022-11-28 19:05:08,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt.
4: [2022-11-28 19:05:08,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt
4: [2022-11-28 19:05:08,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt
4: [2022-11-28 19:05:08,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt.
4: [2022-11-28 19:05:08,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt
4: [2022-11-28 19:05:08,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
4: [2022-11-28 19:05:08,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
4: [2022-11-28 19:05:08,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
4: [2022-11-28 19:05:08,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step51000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt
4: [2022-11-28 19:05:08,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step51000 is ready now!
0: successfully saved checkpoint at iteration 51000 to checkpoints_221m
7: time (ms) | save-checkpoint: 848.41
7: iteration 51010/ 115203 | consumed samples: 13058560 | consumed tokens: 26743930880 | elapsed time per iteration (s): 0.53 | learning rate: 1.277E-04 | global batch size: 256 | lm loss: 2.297359E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 483.177 | TFLOPs: 25.35 |
7: iteration 51020/ 115203 | consumed samples: 13061120 | consumed tokens: 26749173760 | elapsed time per iteration (s): 0.42 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 2.337912E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.424 | TFLOPs: 31.71 |
7: iteration 51030/ 115203 | consumed samples: 13063680 | consumed tokens: 26754416640 | elapsed time per iteration (s): 0.43 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 2.277718E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.211 | TFLOPs: 31.18 |
7: iteration 51040/ 115203 | consumed samples: 13066240 | consumed tokens: 26759659520 | elapsed time per iteration (s): 0.43 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 2.313381E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.568 | TFLOPs: 31.09 |
7: iteration 51050/ 115203 | consumed samples: 13068800 | consumed tokens: 26764902400 | elapsed time per iteration (s): 0.45 | learning rate: 1.276E-04 | global batch size: 256 | lm loss: 2.305285E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.235 | TFLOPs: 30.18 |
7: iteration 51060/ 115203 | consumed samples: 13071360 | consumed tokens: 26770145280 | elapsed time per iteration (s): 0.46 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 2.270813E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.415 | TFLOPs: 29.40 |
7: iteration 51070/ 115203 | consumed samples: 13073920 | consumed tokens: 26775388160 | elapsed time per iteration (s): 0.43 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 2.306486E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.463 | TFLOPs: 31.51 |
7: iteration 51080/ 115203 | consumed samples: 13076480 | consumed tokens: 26780631040 | elapsed time per iteration (s): 0.43 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 2.304015E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.654 | TFLOPs: 31.25 |
7: iteration 51090/ 115203 | consumed samples: 13079040 | consumed tokens: 26785873920 | elapsed time per iteration (s): 0.43 | learning rate: 1.275E-04 | global batch size: 256 | lm loss: 2.307826E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.529 | TFLOPs: 30.98 |
7: iteration 51100/ 115203 | consumed samples: 13081600 | consumed tokens: 26791116800 | elapsed time per iteration (s): 0.43 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.276586E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.314 | TFLOPs: 31.39 |
7: iteration 51110/ 115203 | consumed samples: 13084160 | consumed tokens: 26796359680 | elapsed time per iteration (s): 0.43 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.310770E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.807 | TFLOPs: 31.21 |
7: iteration 51120/ 115203 | consumed samples: 13086720 | consumed tokens: 26801602560 | elapsed time per iteration (s): 0.43 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.299015E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.526 | TFLOPs: 31.30 |
7: iteration 51130/ 115203 | consumed samples: 13089280 | consumed tokens: 26806845440 | elapsed time per iteration (s): 0.44 | learning rate: 1.274E-04 | global batch size: 256 | lm loss: 2.302665E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.948 | TFLOPs: 30.64 |
7: iteration 51140/ 115203 | consumed samples: 13091840 | consumed tokens: 26812088320 | elapsed time per iteration (s): 0.45 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 2.305904E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.188 | TFLOPs: 29.97 |
7: iteration 51150/ 115203 | consumed samples: 13094400 | consumed tokens: 26817331200 | elapsed time per iteration (s): 0.44 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 2.344307E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.650 | TFLOPs: 30.68 |
7: iteration 51160/ 115203 | consumed samples: 13096960 | consumed tokens: 26822574080 | elapsed time per iteration (s): 0.42 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 2.331885E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.483 | TFLOPs: 31.72 |
7: iteration 51170/ 115203 | consumed samples: 13099520 | consumed tokens: 26827816960 | elapsed time per iteration (s): 0.42 | learning rate: 1.273E-04 | global batch size: 256 | lm loss: 2.278157E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.099 | TFLOPs: 31.70 |
7: iteration 51180/ 115203 | consumed samples: 13102080 | consumed tokens: 26833059840 | elapsed time per iteration (s): 0.43 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 2.289949E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.151 | TFLOPs: 31.59 |
7: iteration 51190/ 115203 | consumed samples: 13104640 | consumed tokens: 26838302720 | elapsed time per iteration (s): 0.42 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 2.309291E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.538 | TFLOPs: 31.72 |
7: iteration 51200/ 115203 | consumed samples: 13107200 | consumed tokens: 26843545600 | elapsed time per iteration (s): 0.43 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 2.282257E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.970 | TFLOPs: 31.37 |
7: iteration 51210/ 115203 | consumed samples: 13109760 | consumed tokens: 26848788480 | elapsed time per iteration (s): 0.43 | learning rate: 1.272E-04 | global batch size: 256 | lm loss: 2.292798E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.079 | TFLOPs: 31.01 |
7: iteration 51220/ 115203 | consumed samples: 13112320 | consumed tokens: 26854031360 | elapsed time per iteration (s): 0.60 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 2.309249E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 428.736 | TFLOPs: 22.50 |
7: iteration 51230/ 115203 | consumed samples: 13114880 | consumed tokens: 26859274240 | elapsed time per iteration (s): 0.42 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 2.262563E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.011 | TFLOPs: 31.74 |
7: iteration 51240/ 115203 | consumed samples: 13117440 | consumed tokens: 26864517120 | elapsed time per iteration (s): 0.56 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 2.304794E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 458.209 | TFLOPs: 24.04 |
7: iteration 51250/ 115203 | consumed samples: 13120000 | consumed tokens: 26869760000 | elapsed time per iteration (s): 0.43 | learning rate: 1.271E-04 | global batch size: 256 | lm loss: 2.290555E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.785 | TFLOPs: 31.26 |
7: iteration 51260/ 115203 | consumed samples: 13122560 | consumed tokens: 26875002880 | elapsed time per iteration (s): 0.43 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 2.290665E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.532 | TFLOPs: 31.46 |
7: iteration 51270/ 115203 | consumed samples: 13125120 | consumed tokens: 26880245760 | elapsed time per iteration (s): 0.42 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 2.272191E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.609 | TFLOPs: 31.72 |
7: iteration 51280/ 115203 | consumed samples: 13127680 | consumed tokens: 26885488640 | elapsed time per iteration (s): 0.43 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 2.292891E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.806 | TFLOPs: 31.21 |
7: iteration 51290/ 115203 | consumed samples: 13130240 | consumed tokens: 26890731520 | elapsed time per iteration (s): 0.43 | learning rate: 1.270E-04 | global batch size: 256 | lm loss: 2.295254E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.797 | TFLOPs: 31.00 |
7: iteration 51300/ 115203 | consumed samples: 13132800 | consumed tokens: 26895974400 | elapsed time per iteration (s): 0.43 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.338574E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.634 | TFLOPs: 30.99 |
7: iteration 51310/ 115203 | consumed samples: 13135360 | consumed tokens: 26901217280 | elapsed time per iteration (s): 0.43 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.299668E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.370 | TFLOPs: 31.55 |
7: iteration 51320/ 115203 | consumed samples: 13137920 | consumed tokens: 26906460160 | elapsed time per iteration (s): 0.42 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.293214E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.426 | TFLOPs: 31.87 |
7: iteration 51330/ 115203 | consumed samples: 13140480 | consumed tokens: 26911703040 | elapsed time per iteration (s): 0.44 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.336819E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.496 | TFLOPs: 30.41 |
7: iteration 51340/ 115203 | consumed samples: 13143040 | consumed tokens: 26916945920 | elapsed time per iteration (s): 0.43 | learning rate: 1.269E-04 | global batch size: 256 | lm loss: 2.311117E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.851 | TFLOPs: 31.00 |
7: iteration 51350/ 115203 | consumed samples: 13145600 | consumed tokens: 26922188800 | elapsed time per iteration (s): 0.43 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 2.270092E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.730 | TFLOPs: 31.10 |
7: iteration 51360/ 115203 | consumed samples: 13148160 | consumed tokens: 26927431680 | elapsed time per iteration (s): 0.43 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 2.317683E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.793 | TFLOPs: 31.42 |
7: iteration 51370/ 115203 | consumed samples: 13150720 | consumed tokens: 26932674560 | elapsed time per iteration (s): 0.43 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 2.302291E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.659 | TFLOPs: 31.15 |
7: iteration 51380/ 115203 | consumed samples: 13153280 | consumed tokens: 26937917440 | elapsed time per iteration (s): 0.43 | learning rate: 1.268E-04 | global batch size: 256 | lm loss: 2.284945E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.605 | TFLOPs: 31.25 |
7: iteration 51390/ 115203 | consumed samples: 13155840 | consumed tokens: 26943160320 | elapsed time per iteration (s): 0.43 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 2.320235E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.405 | TFLOPs: 31.24 |
7: iteration 51400/ 115203 | consumed samples: 13158400 | consumed tokens: 26948403200 | elapsed time per iteration (s): 0.44 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 2.282427E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.743 | TFLOPs: 30.26 |
7: iteration 51410/ 115203 | consumed samples: 13160960 | consumed tokens: 26953646080 | elapsed time per iteration (s): 0.43 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 2.292024E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.595 | TFLOPs: 31.41 |
7: iteration 51420/ 115203 | consumed samples: 13163520 | consumed tokens: 26958888960 | elapsed time per iteration (s): 0.44 | learning rate: 1.267E-04 | global batch size: 256 | lm loss: 2.301350E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.159 | TFLOPs: 30.28 |
7: iteration 51430/ 115203 | consumed samples: 13166080 | consumed tokens: 26964131840 | elapsed time per iteration (s): 0.43 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 2.315585E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.131 | TFLOPs: 31.23 |
7: iteration 51440/ 115203 | consumed samples: 13168640 | consumed tokens: 26969374720 | elapsed time per iteration (s): 0.43 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 2.292485E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.861 | TFLOPs: 31.26 |
7: iteration 51450/ 115203 | consumed samples: 13171200 | consumed tokens: 26974617600 | elapsed time per iteration (s): 0.42 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 2.304379E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 616.307 | TFLOPs: 32.34 |
7: iteration 51460/ 115203 | consumed samples: 13173760 | consumed tokens: 26979860480 | elapsed time per iteration (s): 0.44 | learning rate: 1.266E-04 | global batch size: 256 | lm loss: 2.259212E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.116 | TFLOPs: 30.54 |
7: iteration 51470/ 115203 | consumed samples: 13176320 | consumed tokens: 26985103360 | elapsed time per iteration (s): 0.43 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 2.298570E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.259 | TFLOPs: 31.60 |
7: iteration 51480/ 115203 | consumed samples: 13178880 | consumed tokens: 26990346240 | elapsed time per iteration (s): 0.42 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 2.282387E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.781 | TFLOPs: 31.63 |
7: iteration 51490/ 115203 | consumed samples: 13181440 | consumed tokens: 26995589120 | elapsed time per iteration (s): 0.43 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 2.274626E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.315 | TFLOPs: 31.08 |
7: iteration 51500/ 115203 | consumed samples: 13184000 | consumed tokens: 27000832000 | elapsed time per iteration (s): 0.43 | learning rate: 1.265E-04 | global batch size: 256 | lm loss: 2.298229E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.864 | TFLOPs: 31.58 |
7: iteration 51510/ 115203 | consumed samples: 13186560 | consumed tokens: 27006074880 | elapsed time per iteration (s): 0.44 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 2.280732E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.300 | TFLOPs: 30.71 |
7: iteration 51520/ 115203 | consumed samples: 13189120 | consumed tokens: 27011317760 | elapsed time per iteration (s): 0.42 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 2.283997E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.906 | TFLOPs: 31.90 |
7: iteration 51530/ 115203 | consumed samples: 13191680 | consumed tokens: 27016560640 | elapsed time per iteration (s): 0.42 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 2.285820E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.613 | TFLOPs: 32.30 |
7: iteration 51540/ 115203 | consumed samples: 13194240 | consumed tokens: 27021803520 | elapsed time per iteration (s): 0.44 | learning rate: 1.264E-04 | global batch size: 256 | lm loss: 2.305456E+00 | grad norm: 0.192 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.064 | TFLOPs: 30.49 |
7: iteration 51550/ 115203 | consumed samples: 13196800 | consumed tokens: 27027046400 | elapsed time per iteration (s): 0.43 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.297016E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.807 | TFLOPs: 31.58 |
7: iteration 51560/ 115203 | consumed samples: 13199360 | consumed tokens: 27032289280 | elapsed time per iteration (s): 0.43 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.319557E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.880 | TFLOPs: 31.58 |
7: iteration 51570/ 115203 | consumed samples: 13201920 | consumed tokens: 27037532160 | elapsed time per iteration (s): 0.42 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.307412E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.117 | TFLOPs: 31.91 |
7: iteration 51580/ 115203 | consumed samples: 13204480 | consumed tokens: 27042775040 | elapsed time per iteration (s): 0.42 | learning rate: 1.263E-04 | global batch size: 256 | lm loss: 2.290686E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.333 | TFLOPs: 31.81 |
7: iteration 51590/ 115203 | consumed samples: 13207040 | consumed tokens: 27048017920 | elapsed time per iteration (s): 0.42 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 2.276659E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.404 | TFLOPs: 31.71 |
7: iteration 51600/ 115203 | consumed samples: 13209600 | consumed tokens: 27053260800 | elapsed time per iteration (s): 0.43 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 2.280777E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.384 | TFLOPs: 31.50 |
7: iteration 51610/ 115203 | consumed samples: 13212160 | consumed tokens: 27058503680 | elapsed time per iteration (s): 0.43 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 2.317814E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.908 | TFLOPs: 31.32 |
7: iteration 51620/ 115203 | consumed samples: 13214720 | consumed tokens: 27063746560 | elapsed time per iteration (s): 0.42 | learning rate: 1.262E-04 | global batch size: 256 | lm loss: 2.304547E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.464 | TFLOPs: 31.61 |
7: iteration 51630/ 115203 | consumed samples: 13217280 | consumed tokens: 27068989440 | elapsed time per iteration (s): 0.43 | learning rate:
1.261E-04 | global batch size: 256 | lm loss: 2.291571E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.819 | TFLOPs: 31.21 | 7: iteration 51640/ 115203 | consumed samples: 13219840 | consumed tokens: 27074232320 | elapsed time per iteration (s): 0.43 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 2.291267E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.731 | TFLOPs: 31.47 | 7: iteration 51650/ 115203 | consumed samples: 13222400 | consumed tokens: 27079475200 | elapsed time per iteration (s): 0.43 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 2.319052E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.116 | TFLOPs: 31.12 | 7: iteration 51660/ 115203 | consumed samples: 13224960 | consumed tokens: 27084718080 | elapsed time per iteration (s): 0.42 | learning rate: 1.261E-04 | global batch size: 256 | lm loss: 2.282598E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.055 | TFLOPs: 31.96 | 7: iteration 51670/ 115203 | consumed samples: 13227520 | consumed tokens: 27089960960 | elapsed time per iteration (s): 0.43 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 2.288832E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.051 | TFLOPs: 31.33 | 7: iteration 51680/ 115203 | consumed samples: 13230080 | consumed tokens: 27095203840 | elapsed time per iteration (s): 0.42 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 2.251295E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.178 | TFLOPs: 31.65 | 7: iteration 51690/ 115203 | consumed 
samples: 13232640 | consumed tokens: 27100446720 | elapsed time per iteration (s): 0.44 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 2.309624E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.014 | TFLOPs: 30.64 | 7: iteration 51700/ 115203 | consumed samples: 13235200 | consumed tokens: 27105689600 | elapsed time per iteration (s): 0.43 | learning rate: 1.260E-04 | global batch size: 256 | lm loss: 2.271147E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.881 | TFLOPs: 31.37 | 7: iteration 51710/ 115203 | consumed samples: 13237760 | consumed tokens: 27110932480 | elapsed time per iteration (s): 0.42 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.281397E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.524 | TFLOPs: 31.67 | 7: iteration 51720/ 115203 | consumed samples: 13240320 | consumed tokens: 27116175360 | elapsed time per iteration (s): 0.42 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.297266E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.176 | TFLOPs: 31.86 | 7: iteration 51730/ 115203 | consumed samples: 13242880 | consumed tokens: 27121418240 | elapsed time per iteration (s): 0.42 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.297973E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.109 | TFLOPs: 31.91 | 7: iteration 51740/ 115203 | consumed samples: 13245440 | consumed tokens: 27126661120 | elapsed time per iteration (s): 0.42 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.267603E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 603.106 | TFLOPs: 31.64 | 7: iteration 51750/ 115203 | consumed samples: 13248000 | consumed tokens: 27131904000 | elapsed time per iteration (s): 0.45 | learning rate: 1.259E-04 | global batch size: 256 | lm loss: 2.276163E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.641 | TFLOPs: 30.15 | 7: iteration 51760/ 115203 | consumed samples: 13250560 | consumed tokens: 27137146880 | elapsed time per iteration (s): 0.44 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 2.281494E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.530 | TFLOPs: 30.46 | 7: iteration 51770/ 115203 | consumed samples: 13253120 | consumed tokens: 27142389760 | elapsed time per iteration (s): 0.43 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 2.309904E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.190 | TFLOPs: 31.39 | 7: iteration 51780/ 115203 | consumed samples: 13255680 | consumed tokens: 27147632640 | elapsed time per iteration (s): 0.42 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 2.309238E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.617 | TFLOPs: 31.78 | 7: iteration 51790/ 115203 | consumed samples: 13258240 | consumed tokens: 27152875520 | elapsed time per iteration (s): 0.42 | learning rate: 1.258E-04 | global batch size: 256 | lm loss: 2.303164E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.932 | TFLOPs: 31.84 | 7: iteration 51800/ 115203 | consumed samples: 13260800 | consumed tokens: 27158118400 | elapsed time per iteration (s): 0.43 | learning rate: 1.257E-04 | global batch size: 256 | lm 
loss: 2.289564E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.287 | TFLOPs: 31.34 | 7: iteration 51810/ 115203 | consumed samples: 13263360 | consumed tokens: 27163361280 | elapsed time per iteration (s): 0.43 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 2.302416E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.752 | TFLOPs: 31.42 | 7: iteration 51820/ 115203 | consumed samples: 13265920 | consumed tokens: 27168604160 | elapsed time per iteration (s): 0.43 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 2.313620E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.583 | TFLOPs: 30.88 | 7: iteration 51830/ 115203 | consumed samples: 13268480 | consumed tokens: 27173847040 | elapsed time per iteration (s): 0.43 | learning rate: 1.257E-04 | global batch size: 256 | lm loss: 2.287345E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.838 | TFLOPs: 31.47 | 7: iteration 51840/ 115203 | consumed samples: 13271040 | consumed tokens: 27179089920 | elapsed time per iteration (s): 0.43 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 2.293033E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.729 | TFLOPs: 30.99 | 7: iteration 51850/ 115203 | consumed samples: 13273600 | consumed tokens: 27184332800 | elapsed time per iteration (s): 0.43 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 2.281626E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.851 | TFLOPs: 31.16 | 7: iteration 51860/ 115203 | consumed samples: 13276160 | consumed tokens: 
27189575680 | elapsed time per iteration (s): 0.44 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 2.278732E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.278 | TFLOPs: 30.87 | 7: iteration 51870/ 115203 | consumed samples: 13278720 | consumed tokens: 27194818560 | elapsed time per iteration (s): 0.43 | learning rate: 1.256E-04 | global batch size: 256 | lm loss: 2.301918E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.938 | TFLOPs: 31.58 | 7: iteration 51880/ 115203 | consumed samples: 13281280 | consumed tokens: 27200061440 | elapsed time per iteration (s): 0.42 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 2.278452E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.500 | TFLOPs: 32.24 | 7: iteration 51890/ 115203 | consumed samples: 13283840 | consumed tokens: 27205304320 | elapsed time per iteration (s): 0.43 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 2.288389E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.137 | TFLOPs: 31.02 | 7: iteration 51900/ 115203 | consumed samples: 13286400 | consumed tokens: 27210547200 | elapsed time per iteration (s): 0.43 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 2.280604E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.021 | TFLOPs: 30.90 | 7: iteration 51910/ 115203 | consumed samples: 13288960 | consumed tokens: 27215790080 | elapsed time per iteration (s): 0.43 | learning rate: 1.255E-04 | global batch size: 256 | lm loss: 2.276108E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
597.762 | TFLOPs: 31.36 | 7: iteration 51920/ 115203 | consumed samples: 13291520 | consumed tokens: 27221032960 | elapsed time per iteration (s): 0.43 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 2.274012E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.133 | TFLOPs: 31.33 | 7: iteration 51930/ 115203 | consumed samples: 13294080 | consumed tokens: 27226275840 | elapsed time per iteration (s): 0.42 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 2.275828E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.884 | TFLOPs: 31.68 | 7: iteration 51940/ 115203 | consumed samples: 13296640 | consumed tokens: 27231518720 | elapsed time per iteration (s): 0.42 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 2.292231E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.854 | TFLOPs: 31.89 | 7: iteration 51950/ 115203 | consumed samples: 13299200 | consumed tokens: 27236761600 | elapsed time per iteration (s): 0.42 | learning rate: 1.254E-04 | global batch size: 256 | lm loss: 2.293401E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.954 | TFLOPs: 32.11 | 7: iteration 51960/ 115203 | consumed samples: 13301760 | consumed tokens: 27242004480 | elapsed time per iteration (s): 0.42 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 2.297952E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.282 | TFLOPs: 31.65 | 7: iteration 51970/ 115203 | consumed samples: 13304320 | consumed tokens: 27247247360 | elapsed time per iteration (s): 0.43 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 2.299155E+00 | grad norm: 0.220 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.083 | TFLOPs: 31.54 | 7: iteration 51980/ 115203 | consumed samples: 13306880 | consumed tokens: 27252490240 | elapsed time per iteration (s): 0.42 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 2.279384E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.708 | TFLOPs: 31.89 | 7: iteration 51990/ 115203 | consumed samples: 13309440 | consumed tokens: 27257733120 | elapsed time per iteration (s): 0.44 | learning rate: 1.253E-04 | global batch size: 256 | lm loss: 2.295034E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.824 | TFLOPs: 30.79 | 0: [2022-11-28 19:12:20,398] [INFO] [logging.py:68:log_dist] [Rank 0] step=52000, skipped=0, lr=[0.00012524180298737348, 0.00012524180298737348, 0.00012524180298737348], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 52000/ 115203 | consumed samples: 13312000 | consumed tokens: 27262976000 | elapsed time per iteration (s): 0.43 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 2.302899E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.549 | TFLOPs: 30.99 | 0: steps: 52000 loss: 2.2118 iter time (s): 0.431 samples/sec: 594.405 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 52000 | lm loss value: 2.236824E+00 | lm loss PPL: 9.363546E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 52000 to checkpoints_221m 0: [2022-11-28 19:12:20,570] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step52000 is begin to save! 
0: [2022-11-28 19:12:20,594] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_01-model_00-model_states.pt... 0: [2022-11-28 19:12:20,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_01-model_00-model_states.pt. 0: [2022-11-28 19:12:20,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_03-model_00-model_states.pt... 0: [2022-11-28 19:12:20,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_03-model_00-model_states.pt. 0: [2022-11-28 19:12:20,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_04-model_00-model_states.pt... 0: [2022-11-28 19:12:20,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_04-model_00-model_states.pt. 0: [2022-11-28 19:12:20,737] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_05-model_00-model_states.pt... 0: [2022-11-28 19:12:20,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_05-model_00-model_states.pt. 0: [2022-11-28 19:12:20,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_06-model_00-model_states.pt... 0: [2022-11-28 19:12:20,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_06-model_00-model_states.pt. 0: [2022-11-28 19:12:20,783] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_07-model_00-model_states.pt... 0: [2022-11-28 19:12:20,807] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 19:12:20,807] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_08-model_00-model_states.pt... 0: [2022-11-28 19:12:20,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_08-model_00-model_states.pt. 0: [2022-11-28 19:12:20,935] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_09-model_00-model_states.pt... 0: [2022-11-28 19:12:20,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_09-model_00-model_states.pt. 0: [2022-11-28 19:12:20,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_10-model_00-model_states.pt... 0: [2022-11-28 19:12:20,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_10-model_00-model_states.pt. 0: [2022-11-28 19:12:20,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_11-model_00-model_states.pt... 0: [2022-11-28 19:12:21,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_11-model_00-model_states.pt. 0: [2022-11-28 19:12:21,000] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_12-model_00-model_states.pt... 0: [2022-11-28 19:12:21,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_12-model_00-model_states.pt. 0: [2022-11-28 19:12:21,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_13-model_00-model_states.pt... 0: [2022-11-28 19:12:21,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 19:12:21,047] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_14-model_00-model_states.pt... 0: [2022-11-28 19:12:21,070] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_14-model_00-model_states.pt. 0: [2022-11-28 19:12:21,071] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_15-model_00-model_states.pt... 0: [2022-11-28 19:12:21,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_15-model_00-model_states.pt. 0: [2022-11-28 19:12:21,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_16-model_00-model_states.pt... 0: [2022-11-28 19:12:21,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_16-model_00-model_states.pt. 0: [2022-11-28 19:12:21,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_17-model_00-model_states.pt... 0: [2022-11-28 19:12:21,137] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_17-model_00-model_states.pt. 0: [2022-11-28 19:12:21,137] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_18-model_00-model_states.pt... 0: [2022-11-28 19:12:21,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_18-model_00-model_states.pt. 0: [2022-11-28 19:12:21,162] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_19-model_00-model_states.pt... 0: [2022-11-28 19:12:21,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 19:12:21,185] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_20-model_00-model_states.pt... 0: [2022-11-28 19:12:21,209] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_20-model_00-model_states.pt. 0: [2022-11-28 19:12:21,209] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/layer_22-model_00-model_states.pt... 0: [2022-11-28 19:12:21,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/layer_22-model_00-model_states.pt. 0: [2022-11-28 19:12:21,214] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step52000/mp_rank_00_model_states.pt 0: [2022-11-28 19:12:21,214] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/mp_rank_00_model_states.pt... 0: [2022-11-28 19:12:21,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/mp_rank_00_model_states.pt. 0: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
0: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:12:21,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step52000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:12:21,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:12:21,288] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 19:12:21,288] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2022-11-28 19:12:21,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:12:21,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 19:12:21,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2022-11-28 19:12:21,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:12:21,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2022-11-28 19:12:21,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 19:12:21,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2022-11-28 19:12:21,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:12:21,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 0: [2022-11-28 19:12:21,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:12:21,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2022-11-28 19:12:21,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 19:12:21,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2022-11-28 19:12:21,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:12:21,290] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 19:12:21,290] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2022-11-28 19:12:21,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-28 19:12:21,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 19:12:21,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2022-11-28 19:12:21,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:12:21,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:12:21,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 19:12:21,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 19:12:21,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2022-11-28 19:12:21,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2022-11-28 19:12:21,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:12:21,292] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 19:12:21,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:12:21,292] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 
4: [2022-11-28 19:12:21,292] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 19:12:21,292] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2022-11-28 19:12:21,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:12:21,292] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 19:12:21,292] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2022-11-28 19:12:21,293] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:12:21,293] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 19:12:21,293] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 1: [2022-11-28 19:12:21,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:12:21,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:12:21,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-28 19:12:21,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 19:12:21,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 19:12:21,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 1: [2022-11-28 19:12:21,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 1: [2022-11-28 19:12:21,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 19:12:21,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2022-11-28 19:12:21,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:12:21,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 19:12:21,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 1: [2022-11-28 19:12:21,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:12:21,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 19:12:21,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 
1: [2022-11-28 19:12:21,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:12:21,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 19:12:21,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2022-11-28 19:12:21,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:12:21,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 19:12:21,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 1: [2022-11-28 19:12:21,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:12:21,295] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 19:12:21,295] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2022-11-28 19:12:21,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:12:21,297] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 19:12:21,298] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 
6: [2022-11-28 19:12:21,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:12:21,298] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 19:12:21,298] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2022-11-28 19:12:21,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:12:21,298] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 19:12:21,298] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2022-11-28 19:12:21,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:12:21,298] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 19:12:21,298] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 6: [2022-11-28 19:12:21,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:12:21,299] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 19:12:21,299] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 
1: [2022-11-28 19:12:21,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:12:21,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 19:12:21,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 1: [2022-11-28 19:12:21,304] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:12:21,304] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 19:12:21,304] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 4: [2022-11-28 19:12:21,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:12:21,305] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 19:12:21,305] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2022-11-28 19:12:21,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:12:21,290] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:12:21,285] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 19:12:21,285] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2022-11-28 19:12:21,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:12:21,291] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 19:12:21,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 19:12:21,291] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2022-11-28 19:12:21,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2022-11-28 19:12:21,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:12:21,289] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:12:21,298] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 19:12:21,289] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 19:12:21,298] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 
5: [2022-11-28 19:12:21,289] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2022-11-28 19:12:21,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:12:21,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:12:21,298] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 19:12:21,294] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 19:12:21,298] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2022-11-28 19:12:21,294] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2022-11-28 19:12:21,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:12:21,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:12:21,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 19:12:21,300] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 19:12:21,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 
3: [2022-11-28 19:12:21,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:12:21,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:12:21,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 19:12:21,300] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2022-11-28 19:12:21,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:12:21,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2022-11-28 19:12:21,300] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 19:12:21,307] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 19:12:21,300] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2022-11-28 19:12:21,307] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2022-11-28 19:12:21,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:12:21,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:12:21,301] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 19:12:21,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 19:12:21,300] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:12:21,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 5: [2022-11-28 19:12:21,301] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2022-11-28 19:12:21,309] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:12:21,301] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 19:12:21,309] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 19:12:21,301] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 3: [2022-11-28 19:12:21,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2022-11-28 19:12:21,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:12:21,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 19:12:21,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 19:12:21,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 19:12:21,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2022-11-28 19:12:21,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 19:12:21,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 19:12:21,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 19:12:21,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 19:12:21,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 19:12:21,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 19:12:21,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 19:12:21,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 19:12:21,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 
2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 2: [2022-11-28 19:12:21,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: [2022-11-28 19:12:21,334] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 19:12:21,334] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 19:12:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 19:12:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 19:12:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 19:12:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 19:12:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2022-11-28 19:12:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2022-11-28 19:12:21,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 
7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2022-11-28 19:12:21,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 7: [2022-11-28 19:12:21,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:12:21,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step52000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 19:12:21,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step52000 is ready now! 0: successfully saved checkpoint at iteration 52000 to checkpoints_221m 7: time (ms) | save-checkpoint: 965.33 7: iteration 52010/ 115203 | consumed samples: 13314560 | consumed tokens: 27268218880 | elapsed time per iteration (s): 0.54 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 2.272058E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 473.264 | TFLOPs: 24.83 | 7: iteration 52020/ 115203 | consumed samples: 13317120 | consumed tokens: 27273461760 | elapsed time per iteration (s): 0.43 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 2.292963E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.403 | TFLOPs: 31.40 | 7: iteration 52030/ 115203 | consumed samples: 13319680 | consumed tokens: 27278704640 | elapsed time per iteration (s): 0.42 | learning rate: 1.252E-04 | global batch size: 256 | lm loss: 2.268012E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.997 | TFLOPs: 31.90 | 7: iteration 52040/ 115203 | consumed samples: 13322240 | consumed tokens: 27283947520 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 2.266630E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.322 | TFLOPs: 31.45 | 7: iteration 52050/ 115203 | consumed samples: 13324800 | consumed tokens: 27289190400 | elapsed time per iteration (s): 0.43 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 2.270958E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.116 | TFLOPs: 31.28 | 7: iteration 52060/ 115203 | consumed samples: 13327360 | consumed tokens: 27294433280 | elapsed time per iteration (s): 0.43 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 2.285969E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.911 | TFLOPs: 31.16 | 7: iteration 52070/ 115203 | consumed samples: 13329920 | consumed tokens: 27299676160 | elapsed time per iteration (s): 0.43 | learning rate: 1.251E-04 | global batch size: 256 | lm loss: 2.317689E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.961 | TFLOPs: 31.37 | 7: iteration 52080/ 115203 | consumed samples: 13332480 | consumed tokens: 27304919040 | elapsed time per iteration (s): 0.43 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 2.263363E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.641 | TFLOPs: 31.41 | 7: iteration 52090/ 115203 | consumed samples: 13335040 | consumed tokens: 27310161920 | elapsed time per iteration (s): 0.43 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 2.281843E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.375 | TFLOPs: 31.03 | 7: 
iteration 52100/ 115203 | consumed samples: 13337600 | consumed tokens: 27315404800 | elapsed time per iteration (s): 0.43 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 2.261960E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.634 | TFLOPs: 31.20 | 7: iteration 52110/ 115203 | consumed samples: 13340160 | consumed tokens: 27320647680 | elapsed time per iteration (s): 0.42 | learning rate: 1.250E-04 | global batch size: 256 | lm loss: 2.325723E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.808 | TFLOPs: 31.89 | 7: iteration 52120/ 115203 | consumed samples: 13342720 | consumed tokens: 27325890560 | elapsed time per iteration (s): 0.42 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 2.286079E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.671 | TFLOPs: 31.99 | 7: iteration 52130/ 115203 | consumed samples: 13345280 | consumed tokens: 27331133440 | elapsed time per iteration (s): 0.42 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 2.306894E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.455 | TFLOPs: 31.61 | 7: iteration 52140/ 115203 | consumed samples: 13347840 | consumed tokens: 27336376320 | elapsed time per iteration (s): 0.44 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 2.307772E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.217 | TFLOPs: 30.44 | 7: iteration 52150/ 115203 | consumed samples: 13350400 | consumed tokens: 27341619200 | elapsed time per iteration (s): 0.43 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 2.325804E+00 | grad norm: 0.213 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.249 | TFLOPs: 31.49 | 7: iteration 52160/ 115203 | consumed samples: 13352960 | consumed tokens: 27346862080 | elapsed time per iteration (s): 0.43 | learning rate: 1.249E-04 | global batch size: 256 | lm loss: 2.302372E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.016 | TFLOPs: 31.53 | 7: iteration 52170/ 115203 | consumed samples: 13355520 | consumed tokens: 27352104960 | elapsed time per iteration (s): 0.43 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 2.315151E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.683 | TFLOPs: 31.20 | 7: iteration 52180/ 115203 | consumed samples: 13358080 | consumed tokens: 27357347840 | elapsed time per iteration (s): 0.42 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 2.309932E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.792 | TFLOPs: 31.99 | 7: iteration 52190/ 115203 | consumed samples: 13360640 | consumed tokens: 27362590720 | elapsed time per iteration (s): 0.42 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 2.302383E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.827 | TFLOPs: 32.05 | 7: iteration 52200/ 115203 | consumed samples: 13363200 | consumed tokens: 27367833600 | elapsed time per iteration (s): 0.43 | learning rate: 1.248E-04 | global batch size: 256 | lm loss: 2.283779E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.149 | TFLOPs: 31.12 | 7: iteration 52210/ 115203 | consumed samples: 13365760 | consumed tokens: 27373076480 | elapsed time per iteration (s): 0.43 | learning rate: 
1.247E-04 | global batch size: 256 | lm loss: 2.334421E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.624 | TFLOPs: 31.46 | 7: iteration 52220/ 115203 | consumed samples: 13368320 | consumed tokens: 27378319360 | elapsed time per iteration (s): 0.42 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 2.302621E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.281 | TFLOPs: 31.65 | 7: iteration 52230/ 115203 | consumed samples: 13370880 | consumed tokens: 27383562240 | elapsed time per iteration (s): 0.43 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 2.294266E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.565 | TFLOPs: 30.93 | 7: iteration 52240/ 115203 | consumed samples: 13373440 | consumed tokens: 27388805120 | elapsed time per iteration (s): 0.44 | learning rate: 1.247E-04 | global batch size: 256 | lm loss: 2.255367E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.917 | TFLOPs: 30.85 | 7: iteration 52250/ 115203 | consumed samples: 13376000 | consumed tokens: 27394048000 | elapsed time per iteration (s): 0.45 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 2.296260E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.116 | TFLOPs: 30.18 | 7: iteration 52260/ 115203 | consumed samples: 13378560 | consumed tokens: 27399290880 | elapsed time per iteration (s): 0.43 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 2.294090E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.214 | TFLOPs: 31.49 | 7: iteration 52270/ 115203 | consumed 
samples: 13381120 | consumed tokens: 27404533760 | elapsed time per iteration (s): 0.43 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 2.284268E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.597 | TFLOPs: 31.56 | 7: iteration 52280/ 115203 | consumed samples: 13383680 | consumed tokens: 27409776640 | elapsed time per iteration (s): 0.44 | learning rate: 1.246E-04 | global batch size: 256 | lm loss: 2.329178E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.895 | TFLOPs: 30.74 | 7: iteration 52290/ 115203 | consumed samples: 13386240 | consumed tokens: 27415019520 | elapsed time per iteration (s): 0.43 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 2.309505E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.755 | TFLOPs: 30.89 | 7: iteration 52300/ 115203 | consumed samples: 13388800 | consumed tokens: 27420262400 | elapsed time per iteration (s): 0.43 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 2.311641E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.769 | TFLOPs: 31.47 | 7: iteration 52310/ 115203 | consumed samples: 13391360 | consumed tokens: 27425505280 | elapsed time per iteration (s): 0.43 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 2.296697E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.805 | TFLOPs: 31.52 | 7: iteration 52320/ 115203 | consumed samples: 13393920 | consumed tokens: 27430748160 | elapsed time per iteration (s): 0.43 | learning rate: 1.245E-04 | global batch size: 256 | lm loss: 2.289380E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 594.342 | TFLOPs: 31.18 | 7: iteration 52330/ 115203 | consumed samples: 13396480 | consumed tokens: 27435991040 | elapsed time per iteration (s): 0.43 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 2.285152E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.152 | TFLOPs: 31.59 | 7: iteration 52340/ 115203 | consumed samples: 13399040 | consumed tokens: 27441233920 | elapsed time per iteration (s): 0.43 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 2.300281E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.557 | TFLOPs: 31.35 | 7: iteration 52350/ 115203 | consumed samples: 13401600 | consumed tokens: 27446476800 | elapsed time per iteration (s): 0.42 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 2.283086E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.297 | TFLOPs: 31.65 | 7: iteration 52360/ 115203 | consumed samples: 13404160 | consumed tokens: 27451719680 | elapsed time per iteration (s): 0.44 | learning rate: 1.244E-04 | global batch size: 256 | lm loss: 2.272399E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.530 | TFLOPs: 30.72 | 7: iteration 52370/ 115203 | consumed samples: 13406720 | consumed tokens: 27456962560 | elapsed time per iteration (s): 0.42 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 2.314992E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.277 | TFLOPs: 31.71 | 7: iteration 52380/ 115203 | consumed samples: 13409280 | consumed tokens: 27462205440 | elapsed time per iteration (s): 0.43 | learning rate: 1.243E-04 | global batch size: 256 | lm 
loss: 2.290145E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.626 | TFLOPs: 31.46 | 7: iteration 52390/ 115203 | consumed samples: 13411840 | consumed tokens: 27467448320 | elapsed time per iteration (s): 0.46 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 2.279909E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.885 | TFLOPs: 29.27 | 7: iteration 52400/ 115203 | consumed samples: 13414400 | consumed tokens: 27472691200 | elapsed time per iteration (s): 0.43 | learning rate: 1.243E-04 | global batch size: 256 | lm loss: 2.287858E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.837 | TFLOPs: 30.95 | 7: iteration 52410/ 115203 | consumed samples: 13416960 | consumed tokens: 27477934080 | elapsed time per iteration (s): 0.43 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 2.317365E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.279 | TFLOPs: 31.34 | 7: iteration 52420/ 115203 | consumed samples: 13419520 | consumed tokens: 27483176960 | elapsed time per iteration (s): 0.43 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 2.310883E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.139 | TFLOPs: 31.12 | 7: iteration 52430/ 115203 | consumed samples: 13422080 | consumed tokens: 27488419840 | elapsed time per iteration (s): 0.42 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 2.285382E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.357 | TFLOPs: 31.71 | 7: iteration 52440/ 115203 | consumed samples: 13424640 | consumed tokens: 
27493662720 | elapsed time per iteration (s): 0.43 | learning rate: 1.242E-04 | global batch size: 256 | lm loss: 2.292328E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.274 | TFLOPs: 31.18 | 7: iteration 52450/ 115203 | consumed samples: 13427200 | consumed tokens: 27498905600 | elapsed time per iteration (s): 0.42 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 2.271998E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.860 | TFLOPs: 31.84 | 7: iteration 52460/ 115203 | consumed samples: 13429760 | consumed tokens: 27504148480 | elapsed time per iteration (s): 0.42 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 2.296187E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.714 | TFLOPs: 31.89 | 7: iteration 52470/ 115203 | consumed samples: 13432320 | consumed tokens: 27509391360 | elapsed time per iteration (s): 0.44 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 2.320598E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.243 | TFLOPs: 30.81 | 7: iteration 52480/ 115203 | consumed samples: 13434880 | consumed tokens: 27514634240 | elapsed time per iteration (s): 0.43 | learning rate: 1.241E-04 | global batch size: 256 | lm loss: 2.274366E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.673 | TFLOPs: 31.46 | 7: iteration 52490/ 115203 | consumed samples: 13437440 | consumed tokens: 27519877120 | elapsed time per iteration (s): 0.42 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 2.290675E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
607.844 | TFLOPs: 31.89 | 7: iteration 52500/ 115203 | consumed samples: 13440000 | consumed tokens: 27525120000 | elapsed time per iteration (s): 0.42 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 2.268261E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.321 | TFLOPs: 31.87 | 7: iteration 52510/ 115203 | consumed samples: 13442560 | consumed tokens: 27530362880 | elapsed time per iteration (s): 0.43 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 2.267561E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.146 | TFLOPs: 31.23 | 7: iteration 52520/ 115203 | consumed samples: 13445120 | consumed tokens: 27535605760 | elapsed time per iteration (s): 0.44 | learning rate: 1.240E-04 | global batch size: 256 | lm loss: 2.263939E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.168 | TFLOPs: 30.81 | 7: iteration 52530/ 115203 | consumed samples: 13447680 | consumed tokens: 27540848640 | elapsed time per iteration (s): 0.43 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 2.263504E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.058 | TFLOPs: 31.54 | 7: iteration 52540/ 115203 | consumed samples: 13450240 | consumed tokens: 27546091520 | elapsed time per iteration (s): 0.43 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 2.308905E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.032 | TFLOPs: 31.01 | 7: iteration 52550/ 115203 | consumed samples: 13452800 | consumed tokens: 27551334400 | elapsed time per iteration (s): 0.43 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 2.296906E+00 | grad norm: 0.222 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.698 | TFLOPs: 31.36 | 7: iteration 52560/ 115203 | consumed samples: 13455360 | consumed tokens: 27556577280 | elapsed time per iteration (s): 0.42 | learning rate: 1.239E-04 | global batch size: 256 | lm loss: 2.300716E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.448 | TFLOPs: 31.82 | 7: iteration 52570/ 115203 | consumed samples: 13457920 | consumed tokens: 27561820160 | elapsed time per iteration (s): 0.43 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 2.293742E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.272 | TFLOPs: 31.60 | 7: iteration 52580/ 115203 | consumed samples: 13460480 | consumed tokens: 27567063040 | elapsed time per iteration (s): 0.44 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 2.281363E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.622 | TFLOPs: 30.67 | 7: iteration 52590/ 115203 | consumed samples: 13463040 | consumed tokens: 27572305920 | elapsed time per iteration (s): 0.43 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 2.295368E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.996 | TFLOPs: 31.17 | 7: iteration 52600/ 115203 | consumed samples: 13465600 | consumed tokens: 27577548800 | elapsed time per iteration (s): 0.42 | learning rate: 1.238E-04 | global batch size: 256 | lm loss: 2.301017E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.103 | TFLOPs: 31.64 | 7: iteration 52610/ 115203 | consumed samples: 13468160 | consumed tokens: 27582791680 | elapsed time per iteration (s): 
0.42 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 2.288738E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.956 | TFLOPs: 32.06 | 7: iteration 52620/ 115203 | consumed samples: 13470720 | consumed tokens: 27588034560 | elapsed time per iteration (s): 0.43 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 2.260729E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.067 | TFLOPs: 31.22 | 7: iteration 52630/ 115203 | consumed samples: 13473280 | consumed tokens: 27593277440 | elapsed time per iteration (s): 0.42 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 2.291047E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.767 | TFLOPs: 31.73 | 7: iteration 52640/ 115203 | consumed samples: 13475840 | consumed tokens: 27598520320 | elapsed time per iteration (s): 0.43 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 2.312832E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.041 | TFLOPs: 31.43 | 7: iteration 52650/ 115203 | consumed samples: 13478400 | consumed tokens: 27603763200 | elapsed time per iteration (s): 0.42 | learning rate: 1.237E-04 | global batch size: 256 | lm loss: 2.300772E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.031 | TFLOPs: 31.80 | 7: iteration 52660/ 115203 | consumed samples: 13480960 | consumed tokens: 27609006080 | elapsed time per iteration (s): 0.43 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 2.273098E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.056 | TFLOPs: 31.01 | 7: iteration 52670/ 
115203 | consumed samples: 13483520 | consumed tokens: 27614248960 | elapsed time per iteration (s): 0.42 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 2.288872E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.239 | TFLOPs: 32.02 | 7: iteration 52680/ 115203 | consumed samples: 13486080 | consumed tokens: 27619491840 | elapsed time per iteration (s): 0.43 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 2.300126E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.750 | TFLOPs: 31.57 | 7: iteration 52690/ 115203 | consumed samples: 13488640 | consumed tokens: 27624734720 | elapsed time per iteration (s): 0.43 | learning rate: 1.236E-04 | global batch size: 256 | lm loss: 2.293723E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.220 | TFLOPs: 31.49 | 7: iteration 52700/ 115203 | consumed samples: 13491200 | consumed tokens: 27629977600 | elapsed time per iteration (s): 0.42 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 2.283899E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.110 | TFLOPs: 32.01 | 7: iteration 52710/ 115203 | consumed samples: 13493760 | consumed tokens: 27635220480 | elapsed time per iteration (s): 0.43 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 2.279601E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.355 | TFLOPs: 31.29 | 7: iteration 52720/ 115203 | consumed samples: 13496320 | consumed tokens: 27640463360 | elapsed time per iteration (s): 0.43 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 2.326359E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 598.274 | TFLOPs: 31.39 | 7: iteration 52730/ 115203 | consumed samples: 13498880 | consumed tokens: 27645706240 | elapsed time per iteration (s): 0.42 | learning rate: 1.235E-04 | global batch size: 256 | lm loss: 2.306900E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.847 | TFLOPs: 31.63 | 7: iteration 52740/ 115203 | consumed samples: 13501440 | consumed tokens: 27650949120 | elapsed time per iteration (s): 0.42 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 2.295835E+00 | grad norm: 0.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.686 | TFLOPs: 31.78 | 7: iteration 52750/ 115203 | consumed samples: 13504000 | consumed tokens: 27656192000 | elapsed time per iteration (s): 0.42 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 2.312897E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.822 | TFLOPs: 31.94 | 7: iteration 52760/ 115203 | consumed samples: 13506560 | consumed tokens: 27661434880 | elapsed time per iteration (s): 0.42 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 2.325784E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.809 | TFLOPs: 31.89 | 7: iteration 52770/ 115203 | consumed samples: 13509120 | consumed tokens: 27666677760 | elapsed time per iteration (s): 0.43 | learning rate: 1.234E-04 | global batch size: 256 | lm loss: 2.302615E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.291 | TFLOPs: 31.02 | 7: iteration 52780/ 115203 | consumed samples: 13511680 | consumed tokens: 27671920640 | elapsed time per iteration (s): 0.43 | learning rate: 1.233E-04 | global batch 
size: 256 | lm loss: 2.309392E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.164 | TFLOPs: 31.12 | 7: iteration 52790/ 115203 | consumed samples: 13514240 | consumed tokens: 27677163520 | elapsed time per iteration (s): 0.42 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 2.275352E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.416 | TFLOPs: 31.66 | 7: iteration 52800/ 115203 | consumed samples: 13516800 | consumed tokens: 27682406400 | elapsed time per iteration (s): 0.42 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 2.261416E+00 | grad norm: 0.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.397 | TFLOPs: 31.92 | 7: iteration 52810/ 115203 | consumed samples: 13519360 | consumed tokens: 27687649280 | elapsed time per iteration (s): 0.43 | learning rate: 1.233E-04 | global batch size: 256 | lm loss: 2.290096E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.451 | TFLOPs: 31.35 | 7: iteration 52820/ 115203 | consumed samples: 13521920 | consumed tokens: 27692892160 | elapsed time per iteration (s): 0.43 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.301889E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.035 | TFLOPs: 31.43 | 7: iteration 52830/ 115203 | consumed samples: 13524480 | consumed tokens: 27698135040 | elapsed time per iteration (s): 0.42 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.297680E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.421 | TFLOPs: 31.61 | 7: iteration 52840/ 115203 | consumed samples: 13527040 | consumed 
tokens: 27703377920 | elapsed time per iteration (s): 0.42 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.279612E+00 | grad norm: 0.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.088 | TFLOPs: 31.70 | 7: iteration 52850/ 115203 | consumed samples: 13529600 | consumed tokens: 27708620800 | elapsed time per iteration (s): 0.43 | learning rate: 1.232E-04 | global batch size: 256 | lm loss: 2.294457E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.255 | TFLOPs: 31.55 | 7: iteration 52860/ 115203 | consumed samples: 13532160 | consumed tokens: 27713863680 | elapsed time per iteration (s): 0.43 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 2.278356E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.768 | TFLOPs: 31.52 | 7: iteration 52870/ 115203 | consumed samples: 13534720 | consumed tokens: 27719106560 | elapsed time per iteration (s): 0.43 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 2.331948E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.173 | TFLOPs: 31.49 | 7: iteration 52880/ 115203 | consumed samples: 13537280 | consumed tokens: 27724349440 | elapsed time per iteration (s): 0.43 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 2.286263E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.282 | TFLOPs: 31.34 | 7: iteration 52890/ 115203 | consumed samples: 13539840 | consumed tokens: 27729592320 | elapsed time per iteration (s): 0.42 | learning rate: 1.231E-04 | global batch size: 256 | lm loss: 2.308809E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 613.735 | TFLOPs: 32.20 | 7: iteration 52900/ 115203 | consumed samples: 13542400 | consumed tokens: 27734835200 | elapsed time per iteration (s): 0.42 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 2.283942E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.569 | TFLOPs: 31.93 | 7: iteration 52910/ 115203 | consumed samples: 13544960 | consumed tokens: 27740078080 | elapsed time per iteration (s): 0.42 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 2.278113E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.571 | TFLOPs: 31.88 | 7: iteration 52920/ 115203 | consumed samples: 13547520 | consumed tokens: 27745320960 | elapsed time per iteration (s): 0.42 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 2.260087E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.244 | TFLOPs: 31.70 | 7: iteration 52930/ 115203 | consumed samples: 13550080 | consumed tokens: 27750563840 | elapsed time per iteration (s): 0.42 | learning rate: 1.230E-04 | global batch size: 256 | lm loss: 2.287249E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.871 | TFLOPs: 31.74 | 7: iteration 52940/ 115203 | consumed samples: 13552640 | consumed tokens: 27755806720 | elapsed time per iteration (s): 0.43 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 2.274083E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.660 | TFLOPs: 30.94 | 7: iteration 52950/ 115203 | consumed samples: 13555200 | consumed tokens: 27761049600 | elapsed time per iteration (s): 0.43 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 2.231808E+00 | grad norm: 
0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.849 | TFLOPs: 31.47 | 7: iteration 52960/ 115203 | consumed samples: 13557760 | consumed tokens: 27766292480 | elapsed time per iteration (s): 0.42 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 2.297303E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.282 | TFLOPs: 31.71 | 7: iteration 52970/ 115203 | consumed samples: 13560320 | consumed tokens: 27771535360 | elapsed time per iteration (s): 0.42 | learning rate: 1.229E-04 | global batch size: 256 | lm loss: 2.299591E+00 | grad norm: 0.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.953 | TFLOPs: 32.11 | 7: iteration 52980/ 115203 | consumed samples: 13562880 | consumed tokens: 27776778240 | elapsed time per iteration (s): 0.42 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 2.302535E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.164 | TFLOPs: 31.75 | 7: iteration 52990/ 115203 | consumed samples: 13565440 | consumed tokens: 27782021120 | elapsed time per iteration (s): 0.42 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 2.282463E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.400 | TFLOPs: 31.61 | 7: iteration 53000/ 115203 | consumed samples: 13568000 | consumed tokens: 27787264000 | elapsed time per iteration (s): 0.42 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 2.295112E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.015 | TFLOPs: 31.95 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at 
iteration 53000 | lm loss value: 2.318784E+00 | lm loss PPL: 1.016331E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 53000 to checkpoints_221m 0: [2022-11-28 19:19:28,962] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step53000 is begin to save! 0: [2022-11-28 19:19:28,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_01-model_00-model_states.pt... 0: [2022-11-28 19:19:29,078] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_01-model_00-model_states.pt. 0: [2022-11-28 19:19:29,078] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_03-model_00-model_states.pt... 0: [2022-11-28 19:19:29,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_03-model_00-model_states.pt. 0: [2022-11-28 19:19:29,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_04-model_00-model_states.pt... 0: [2022-11-28 19:19:29,125] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_04-model_00-model_states.pt. 0: [2022-11-28 19:19:29,126] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_05-model_00-model_states.pt... 0: [2022-11-28 19:19:29,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_05-model_00-model_states.pt. 0: [2022-11-28 19:19:29,149] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_06-model_00-model_states.pt... 0: [2022-11-28 19:19:29,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_06-model_00-model_states.pt. 
0: [2022-11-28 19:19:29,174] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_07-model_00-model_states.pt... 0: [2022-11-28 19:19:29,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_07-model_00-model_states.pt. 0: [2022-11-28 19:19:29,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_08-model_00-model_states.pt... 0: [2022-11-28 19:19:29,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_08-model_00-model_states.pt. 0: [2022-11-28 19:19:29,221] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_09-model_00-model_states.pt... 0: [2022-11-28 19:19:29,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_09-model_00-model_states.pt. 0: [2022-11-28 19:19:29,245] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_10-model_00-model_states.pt... 0: [2022-11-28 19:19:29,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_10-model_00-model_states.pt. 0: [2022-11-28 19:19:29,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_11-model_00-model_states.pt... 0: [2022-11-28 19:19:29,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_11-model_00-model_states.pt. 0: [2022-11-28 19:19:29,294] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_12-model_00-model_states.pt... 0: [2022-11-28 19:19:29,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_12-model_00-model_states.pt. 
0: [2022-11-28 19:19:29,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_13-model_00-model_states.pt... 0: [2022-11-28 19:19:29,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_13-model_00-model_states.pt. 0: [2022-11-28 19:19:29,343] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_14-model_00-model_states.pt... 0: [2022-11-28 19:19:29,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_14-model_00-model_states.pt. 0: [2022-11-28 19:19:29,368] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_15-model_00-model_states.pt... 0: [2022-11-28 19:19:29,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_15-model_00-model_states.pt. 0: [2022-11-28 19:19:29,392] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_16-model_00-model_states.pt... 0: [2022-11-28 19:19:29,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_16-model_00-model_states.pt. 0: [2022-11-28 19:19:29,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_17-model_00-model_states.pt... 0: [2022-11-28 19:19:29,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_17-model_00-model_states.pt. 0: [2022-11-28 19:19:29,440] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_18-model_00-model_states.pt... 0: [2022-11-28 19:19:29,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_18-model_00-model_states.pt. 
0: [2022-11-28 19:19:29,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_19-model_00-model_states.pt... 0: [2022-11-28 19:19:29,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_19-model_00-model_states.pt. 0: [2022-11-28 19:19:29,488] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_20-model_00-model_states.pt... 0: [2022-11-28 19:19:29,512] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_20-model_00-model_states.pt. 0: [2022-11-28 19:19:29,512] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/layer_22-model_00-model_states.pt... 0: [2022-11-28 19:19:29,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/layer_22-model_00-model_states.pt. 0: [2022-11-28 19:19:29,517] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step53000/mp_rank_00_model_states.pt 0: [2022-11-28 19:19:29,517] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/mp_rank_00_model_states.pt... 0: [2022-11-28 19:19:29,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/mp_rank_00_model_states.pt. 0: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
5: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:19:29,537] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step53000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:19:29,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:19:29,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 19:19:29,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2022-11-28 19:19:29,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:19:29,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 19:19:29,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 
4: [2022-11-28 19:19:29,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:19:29,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 19:19:29,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 1: [2022-11-28 19:19:29,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:19:29,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 19:19:29,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2022-11-28 19:19:29,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:19:29,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:19:29,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 19:19:29,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 19:19:29,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2022-11-28 19:19:29,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 
7: [2022-11-28 19:19:29,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:19:29,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 19:19:29,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2022-11-28 19:19:29,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:19:29,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 19:19:29,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2022-11-28 19:19:29,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:19:29,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 1: [2022-11-28 19:19:29,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:19:29,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 1: [2022-11-28 19:19:29,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 19:19:29,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 
1: [2022-11-28 19:19:29,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:19:29,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 19:19:29,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2022-11-28 19:19:29,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:19:29,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 19:19:29,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2022-11-28 19:19:29,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:19:29,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 19:19:29,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2022-11-28 19:19:29,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:19:29,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 19:19:29,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 19:19:29,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2022-11-28 19:19:29,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 19:19:29,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2022-11-28 19:19:29,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:19:29,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 19:19:29,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2022-11-28 19:19:29,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:19:29,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 19:19:29,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2022-11-28 19:19:29,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:19:29,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 19:19:29,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 
1: [2022-11-28 19:19:29,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:19:29,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:19:29,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 19:19:29,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 19:19:29,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 1: [2022-11-28 19:19:29,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 1: [2022-11-28 19:19:29,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:19:29,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 19:19:29,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2022-11-28 19:19:29,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:19:29,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2022-11-28 19:19:29,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:19:29,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:19:29,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:19:29,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2022-11-28 19:19:29,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 19:19:29,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 19:19:29,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 19:19:29,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2022-11-28 19:19:29,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2022-11-28 19:19:29,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2022-11-28 19:19:29,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:19:29,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:19:29,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 19:19:29,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 19:19:29,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2022-11-28 19:19:29,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2022-11-28 19:19:29,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:19:29,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 19:19:29,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2022-11-28 19:19:29,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:19:29,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 19:19:29,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2022-11-28 19:19:29,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:19:29,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 19:19:29,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 5: [2022-11-28 19:19:29,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:19:29,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 19:19:29,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2022-11-28 19:19:29,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:19:29,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 19:19:29,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2022-11-28 19:19:29,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:19:29,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 19:19:29,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2022-11-28 19:19:29,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 19:19:29,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 19:19:29,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2022-11-28 19:19:29,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:19:29,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 19:19:29,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 2: [2022-11-28 19:19:29,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:19:29,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 19:19:29,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2022-11-28 19:19:29,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:19:29,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 19:19:29,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 19:19:29,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 19:19:29,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2022-11-28 19:19:29,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2022-11-28 19:19:29,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:19:29,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 19:19:29,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2022-11-28 19:19:29,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:19:29,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:19:29,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 19:19:29,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 19:19:29,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 19:19:29,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2022-11-28 19:19:29,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 4: [2022-11-28 19:19:29,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 19:19:29,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 7: [2022-11-28 19:19:29,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:19:29,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 3: [2022-11-28 19:19:29,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:19:29,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2022-11-28 19:19:29,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 19:19:29,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2022-11-28 19:19:29,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:19:29,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 19:19:29,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 
3: [2022-11-28 19:19:29,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:19:29,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:19:29,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 19:19:29,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 19:19:29,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2022-11-28 19:19:29,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2022-11-28 19:19:29,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:19:29,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 19:19:29,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 1: [2022-11-28 19:19:29,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:19:29,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 19:19:29,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 
0: [2022-11-28 19:19:29,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:19:29,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:19:29,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:19:29,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:19:29,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 19:19:29,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 19:19:29,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2022-11-28 19:19:29,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2022-11-28 19:19:29,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 19:19:29,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2022-11-28 19:19:29,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:19:29,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 19:19:29,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 19:19:29,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 19:19:29,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2022-11-28 19:19:29,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 3: [2022-11-28 19:19:29,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:19:29,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 19:19:29,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 0: [2022-11-28 19:19:29,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 19:19:29,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2022-11-28 19:19:29,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:19:29,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 19:19:29,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 19:19:29,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 19:19:29,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2022-11-28 19:19:29,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2022-11-28 19:19:29,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:19:29,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 19:19:29,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2022-11-28 19:19:29,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:19:29,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 19:19:29,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2022-11-28 19:19:29,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:19:29,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-28 19:19:29,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:19:29,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:19:29,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 19:19:29,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 19:19:29,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 19:19:29,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step53000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 19:19:29,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2022-11-28 19:19:29,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2022-11-28 19:19:29,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 6: [2022-11-28 19:19:29,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step53000 is ready now! 
0: successfully saved checkpoint at iteration 53000 to checkpoints_221m 7: time (ms) | save-checkpoint: 768.40 7: iteration 53010/ 115203 | consumed samples: 13570560 | consumed tokens: 27792506880 | elapsed time per iteration (s): 0.51 | learning rate: 1.228E-04 | global batch size: 256 | lm loss: 2.294524E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 501.902 | TFLOPs: 26.33 | 7: iteration 53020/ 115203 | consumed samples: 13573120 | consumed tokens: 27797749760 | elapsed time per iteration (s): 0.42 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 2.306164E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.509 | TFLOPs: 32.29 | 7: iteration 53030/ 115203 | consumed samples: 13575680 | consumed tokens: 27802992640 | elapsed time per iteration (s): 0.42 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 2.297003E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.686 | TFLOPs: 31.67 | 7: iteration 53040/ 115203 | consumed samples: 13578240 | consumed tokens: 27808235520 | elapsed time per iteration (s): 0.53 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 2.327616E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 480.371 | TFLOPs: 25.20 | 7: iteration 53050/ 115203 | consumed samples: 13580800 | consumed tokens: 27813478400 | elapsed time per iteration (s): 0.43 | learning rate: 1.227E-04 | global batch size: 256 | lm loss: 2.287267E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.071 | TFLOPs: 31.01 | 7: iteration 53060/ 115203 | consumed samples: 13583360 | consumed tokens: 27818721280 | elapsed time per iteration (s): 0.42 | learning 
rate: 1.226E-04 | global batch size: 256 | lm loss: 2.303675E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.606 | TFLOPs: 31.78 | 7: iteration 53070/ 115203 | consumed samples: 13585920 | consumed tokens: 27823964160 | elapsed time per iteration (s): 0.43 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 2.310495E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.856 | TFLOPs: 31.53 | 7: iteration 53080/ 115203 | consumed samples: 13588480 | consumed tokens: 27829207040 | elapsed time per iteration (s): 0.42 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 2.283903E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.590 | TFLOPs: 31.88 | 7: iteration 53090/ 115203 | consumed samples: 13591040 | consumed tokens: 27834449920 | elapsed time per iteration (s): 0.43 | learning rate: 1.226E-04 | global batch size: 256 | lm loss: 2.318767E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.763 | TFLOPs: 30.89 | 7: iteration 53100/ 115203 | consumed samples: 13593600 | consumed tokens: 27839692800 | elapsed time per iteration (s): 0.42 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 2.291318E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.495 | TFLOPs: 31.87 | 7: iteration 53110/ 115203 | consumed samples: 13596160 | consumed tokens: 27844935680 | elapsed time per iteration (s): 0.42 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 2.302620E+00 | grad norm: 0.292 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.746 | TFLOPs: 31.73 | 7: iteration 53120/ 115203 | 
consumed samples: 13598720 | consumed tokens: 27850178560 | elapsed time per iteration (s): 0.42 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 2.287077E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.118 | TFLOPs: 31.70 | 7: iteration 53130/ 115203 | consumed samples: 13601280 | consumed tokens: 27855421440 | elapsed time per iteration (s): 0.43 | learning rate: 1.225E-04 | global batch size: 256 | lm loss: 2.278475E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.290 | TFLOPs: 31.18 | 7: iteration 53140/ 115203 | consumed samples: 13603840 | consumed tokens: 27860664320 | elapsed time per iteration (s): 0.42 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.327204E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.294 | TFLOPs: 31.86 | 7: iteration 53150/ 115203 | consumed samples: 13606400 | consumed tokens: 27865907200 | elapsed time per iteration (s): 0.43 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.301571E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.908 | TFLOPs: 31.37 | 7: iteration 53160/ 115203 | consumed samples: 13608960 | consumed tokens: 27871150080 | elapsed time per iteration (s): 0.43 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.289165E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.912 | TFLOPs: 31.48 | 7: iteration 53170/ 115203 | consumed samples: 13611520 | consumed tokens: 27876392960 | elapsed time per iteration (s): 0.44 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.292961E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 584.332 | TFLOPs: 30.66 | 7: iteration 53180/ 115203 | consumed samples: 13614080 | consumed tokens: 27881635840 | elapsed time per iteration (s): 0.43 | learning rate: 1.224E-04 | global batch size: 256 | lm loss: 2.305133E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.065 | TFLOPs: 31.43 | 7: iteration 53190/ 115203 | consumed samples: 13616640 | consumed tokens: 27886878720 | elapsed time per iteration (s): 0.42 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 2.300509E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.938 | TFLOPs: 31.90 | 7: iteration 53200/ 115203 | consumed samples: 13619200 | consumed tokens: 27892121600 | elapsed time per iteration (s): 0.42 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 2.331125E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.415 | TFLOPs: 31.61 | 7: iteration 53210/ 115203 | consumed samples: 13621760 | consumed tokens: 27897364480 | elapsed time per iteration (s): 0.43 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 2.291811E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.015 | TFLOPs: 31.48 | 7: iteration 53220/ 115203 | consumed samples: 13624320 | consumed tokens: 27902607360 | elapsed time per iteration (s): 0.43 | learning rate: 1.223E-04 | global batch size: 256 | lm loss: 2.266936E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.719 | TFLOPs: 31.52 | 7: iteration 53230/ 115203 | consumed samples: 13626880 | consumed tokens: 27907850240 | elapsed time per iteration (s): 0.42 | learning rate: 1.222E-04 | global batch size: 
256 | lm loss: 2.313007E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.869 | TFLOPs: 32.16 | 7: iteration 53240/ 115203 | consumed samples: 13629440 | consumed tokens: 27913093120 | elapsed time per iteration (s): 0.42 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 2.286171E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.600 | TFLOPs: 31.83 | 7: iteration 53250/ 115203 | consumed samples: 13632000 | consumed tokens: 27918336000 | elapsed time per iteration (s): 0.42 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 2.291249E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.282 | TFLOPs: 31.81 | 7: iteration 53260/ 115203 | consumed samples: 13634560 | consumed tokens: 27923578880 | elapsed time per iteration (s): 0.42 | learning rate: 1.222E-04 | global batch size: 256 | lm loss: 2.274811E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.930 | TFLOPs: 31.84 | 7: iteration 53270/ 115203 | consumed samples: 13637120 | consumed tokens: 27928821760 | elapsed time per iteration (s): 0.42 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 2.281818E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.544 | TFLOPs: 31.72 | 7: iteration 53280/ 115203 | consumed samples: 13639680 | consumed tokens: 27934064640 | elapsed time per iteration (s): 0.43 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 2.314245E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.740 | TFLOPs: 31.00 | 7: iteration 53290/ 115203 | consumed samples: 13642240 | consumed 
tokens: 27939307520 | elapsed time per iteration (s): 0.42 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 2.300409E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.336 | TFLOPs: 31.87 | 7: iteration 53300/ 115203 | consumed samples: 13644800 | consumed tokens: 27944550400 | elapsed time per iteration (s): 0.43 | learning rate: 1.221E-04 | global batch size: 256 | lm loss: 2.289649E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.327 | TFLOPs: 31.60 | 7: iteration 53310/ 115203 | consumed samples: 13647360 | consumed tokens: 27949793280 | elapsed time per iteration (s): 0.43 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 2.297395E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.417 | TFLOPs: 31.50 | 7: iteration 53320/ 115203 | consumed samples: 13649920 | consumed tokens: 27955036160 | elapsed time per iteration (s): 0.42 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 2.283733E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.766 | TFLOPs: 31.63 | 7: iteration 53330/ 115203 | consumed samples: 13652480 | consumed tokens: 27960279040 | elapsed time per iteration (s): 0.43 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 2.325666E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.478 | TFLOPs: 31.30 | 7: iteration 53340/ 115203 | consumed samples: 13655040 | consumed tokens: 27965521920 | elapsed time per iteration (s): 0.42 | learning rate: 1.220E-04 | global batch size: 256 | lm loss: 2.332069E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 605.065 | TFLOPs: 31.75 | 7: iteration 53350/ 115203 | consumed samples: 13657600 | consumed tokens: 27970764800 | elapsed time per iteration (s): 0.42 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 2.298837E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.123 | TFLOPs: 31.64 | 7: iteration 53360/ 115203 | consumed samples: 13660160 | consumed tokens: 27976007680 | elapsed time per iteration (s): 0.42 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 2.283433E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.291 | TFLOPs: 31.71 | 7: iteration 53370/ 115203 | consumed samples: 13662720 | consumed tokens: 27981250560 | elapsed time per iteration (s): 0.43 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 2.307633E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.441 | TFLOPs: 31.24 | 7: iteration 53380/ 115203 | consumed samples: 13665280 | consumed tokens: 27986493440 | elapsed time per iteration (s): 0.42 | learning rate: 1.219E-04 | global batch size: 256 | lm loss: 2.312222E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.722 | TFLOPs: 31.89 | 7: iteration 53390/ 115203 | consumed samples: 13667840 | consumed tokens: 27991736320 | elapsed time per iteration (s): 0.43 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 2.271068E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.554 | TFLOPs: 30.99 | 7: iteration 53400/ 115203 | consumed samples: 13670400 | consumed tokens: 27996979200 | elapsed time per iteration (s): 0.43 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 2.293233E+00 | grad norm: 
0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.378 | TFLOPs: 31.29 | 7: iteration 53410/ 115203 | consumed samples: 13672960 | consumed tokens: 28002222080 | elapsed time per iteration (s): 0.42 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 2.311311E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.090 | TFLOPs: 32.06 | 7: iteration 53420/ 115203 | consumed samples: 13675520 | consumed tokens: 28007464960 | elapsed time per iteration (s): 0.42 | learning rate: 1.218E-04 | global batch size: 256 | lm loss: 2.310332E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.262 | TFLOPs: 31.86 | 7: iteration 53430/ 115203 | consumed samples: 13678080 | consumed tokens: 28012707840 | elapsed time per iteration (s): 0.42 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 2.277686E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.177 | TFLOPs: 31.65 | 7: iteration 53440/ 115203 | consumed samples: 13680640 | consumed tokens: 28017950720 | elapsed time per iteration (s): 0.43 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 2.305041E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.044 | TFLOPs: 31.54 | 7: iteration 53450/ 115203 | consumed samples: 13683200 | consumed tokens: 28023193600 | elapsed time per iteration (s): 0.42 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 2.301353E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.511 | TFLOPs: 31.77 | 7: iteration 53460/ 115203 | consumed samples: 13685760 | consumed tokens: 28028436480 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.217E-04 | global batch size: 256 | lm loss: 2.274578E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.750 | TFLOPs: 32.15 | 7: iteration 53470/ 115203 | consumed samples: 13688320 | consumed tokens: 28033679360 | elapsed time per iteration (s): 0.42 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 2.301810E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.105 | TFLOPs: 31.80 | 7: iteration 53480/ 115203 | consumed samples: 13690880 | consumed tokens: 28038922240 | elapsed time per iteration (s): 0.42 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 2.295139E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.262 | TFLOPs: 31.65 | 7: iteration 53490/ 115203 | consumed samples: 13693440 | consumed tokens: 28044165120 | elapsed time per iteration (s): 0.43 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 2.309034E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.844 | TFLOPs: 30.95 | 7: iteration 53500/ 115203 | consumed samples: 13696000 | consumed tokens: 28049408000 | elapsed time per iteration (s): 0.42 | learning rate: 1.216E-04 | global batch size: 256 | lm loss: 2.290553E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.038 | TFLOPs: 31.69 | 7: iteration 53510/ 115203 | consumed samples: 13698560 | consumed tokens: 28054650880 | elapsed time per iteration (s): 0.42 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 2.295228E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.311 | TFLOPs: 31.97 | 7: 
iteration 53520/ 115203 | consumed samples: 13701120 | consumed tokens: 28059893760 | elapsed time per iteration (s): 0.43 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 2.303370E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.962 | TFLOPs: 31.58 | 7: iteration 53530/ 115203 | consumed samples: 13703680 | consumed tokens: 28065136640 | elapsed time per iteration (s): 0.42 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 2.306346E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.340 | TFLOPs: 31.87 | 7: iteration 53540/ 115203 | consumed samples: 13706240 | consumed tokens: 28070379520 | elapsed time per iteration (s): 0.43 | learning rate: 1.215E-04 | global batch size: 256 | lm loss: 2.324398E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.735 | TFLOPs: 31.20 | 7: iteration 53550/ 115203 | consumed samples: 13708800 | consumed tokens: 28075622400 | elapsed time per iteration (s): 0.42 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 2.316086E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.444 | TFLOPs: 31.87 | 7: iteration 53560/ 115203 | consumed samples: 13711360 | consumed tokens: 28080865280 | elapsed time per iteration (s): 0.43 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 2.281759E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.304 | TFLOPs: 31.44 | 7: iteration 53570/ 115203 | consumed samples: 13713920 | consumed tokens: 28086108160 | elapsed time per iteration (s): 0.42 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 2.266968E+00 | grad norm: 0.213 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.374 | TFLOPs: 31.97 | 7: iteration 53580/ 115203 | consumed samples: 13716480 | consumed tokens: 28091351040 | elapsed time per iteration (s): 0.42 | learning rate: 1.214E-04 | global batch size: 256 | lm loss: 2.295712E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.136 | TFLOPs: 31.86 | 7: iteration 53590/ 115203 | consumed samples: 13719040 | consumed tokens: 28096593920 | elapsed time per iteration (s): 0.42 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 2.266147E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.192 | TFLOPs: 31.65 | 7: iteration 53600/ 115203 | consumed samples: 13721600 | consumed tokens: 28101836800 | elapsed time per iteration (s): 0.43 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 2.304395E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.923 | TFLOPs: 31.27 | 7: iteration 53610/ 115203 | consumed samples: 13724160 | consumed tokens: 28107079680 | elapsed time per iteration (s): 0.43 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 2.298235E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.965 | TFLOPs: 31.43 | 7: iteration 53620/ 115203 | consumed samples: 13726720 | consumed tokens: 28112322560 | elapsed time per iteration (s): 0.79 | learning rate: 1.213E-04 | global batch size: 256 | lm loss: 2.289694E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 325.809 | TFLOPs: 17.09 | 7: iteration 53630/ 115203 | consumed samples: 13729280 | consumed tokens: 28117565440 | elapsed time per iteration (s): 0.42 | learning rate: 
1.212E-04 | global batch size: 256 | lm loss: 2.278828E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.482 | TFLOPs: 32.03 | 7: iteration 53640/ 115203 | consumed samples: 13731840 | consumed tokens: 28122808320 | elapsed time per iteration (s): 0.97 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 2.289941E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 263.427 | TFLOPs: 13.82 | 7: iteration 53650/ 115203 | consumed samples: 13734400 | consumed tokens: 28128051200 | elapsed time per iteration (s): 0.61 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 2.315330E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 416.444 | TFLOPs: 21.85 | 7: iteration 53660/ 115203 | consumed samples: 13736960 | consumed tokens: 28133294080 | elapsed time per iteration (s): 0.44 | learning rate: 1.212E-04 | global batch size: 256 | lm loss: 2.305589E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.150 | TFLOPs: 30.39 | 7: iteration 53670/ 115203 | consumed samples: 13739520 | consumed tokens: 28138536960 | elapsed time per iteration (s): 0.43 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 2.319527E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.502 | TFLOPs: 31.04 | 7: iteration 53680/ 115203 | consumed samples: 13742080 | consumed tokens: 28143779840 | elapsed time per iteration (s): 0.44 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 2.286902E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.446 | TFLOPs: 30.77 | 7: iteration 53690/ 115203 | consumed 
samples: 13744640 | consumed tokens: 28149022720 | elapsed time per iteration (s): 0.44 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 2.316035E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.885 | TFLOPs: 30.53 | 7: iteration 53700/ 115203 | consumed samples: 13747200 | consumed tokens: 28154265600 | elapsed time per iteration (s): 0.43 | learning rate: 1.211E-04 | global batch size: 256 | lm loss: 2.328796E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.051 | TFLOPs: 31.01 | 7: iteration 53710/ 115203 | consumed samples: 13749760 | consumed tokens: 28159508480 | elapsed time per iteration (s): 0.43 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 2.334162E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.980 | TFLOPs: 31.48 | 7: iteration 53720/ 115203 | consumed samples: 13752320 | consumed tokens: 28164751360 | elapsed time per iteration (s): 0.43 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 2.288110E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.872 | TFLOPs: 31.11 | 7: iteration 53730/ 115203 | consumed samples: 13754880 | consumed tokens: 28169994240 | elapsed time per iteration (s): 0.43 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 2.303306E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.818 | TFLOPs: 31.21 | 7: iteration 53740/ 115203 | consumed samples: 13757440 | consumed tokens: 28175237120 | elapsed time per iteration (s): 0.43 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 2.302075E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 598.363 | TFLOPs: 31.40 | 7: iteration 53750/ 115203 | consumed samples: 13760000 | consumed tokens: 28180480000 | elapsed time per iteration (s): 0.43 | learning rate: 1.210E-04 | global batch size: 256 | lm loss: 2.295504E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.285 | TFLOPs: 31.13 | 7: iteration 53760/ 115203 | consumed samples: 13762560 | consumed tokens: 28185722880 | elapsed time per iteration (s): 0.43 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.280453E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.901 | TFLOPs: 31.06 | 7: iteration 53770/ 115203 | consumed samples: 13765120 | consumed tokens: 28190965760 | elapsed time per iteration (s): 0.45 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.260579E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.532 | TFLOPs: 29.99 | 7: iteration 53780/ 115203 | consumed samples: 13767680 | consumed tokens: 28196208640 | elapsed time per iteration (s): 0.44 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.292171E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.019 | TFLOPs: 30.85 | 7: iteration 53790/ 115203 | consumed samples: 13770240 | consumed tokens: 28201451520 | elapsed time per iteration (s): 0.43 | learning rate: 1.209E-04 | global batch size: 256 | lm loss: 2.273288E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.859 | TFLOPs: 31.32 | 7: iteration 53800/ 115203 | consumed samples: 13772800 | consumed tokens: 28206694400 | elapsed time per iteration (s): 0.44 | learning rate: 1.208E-04 | global batch size: 256 | lm 
loss: 2.292512E+00 | grad norm: 0.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.166 | TFLOPs: 30.76 | 7: iteration 53810/ 115203 | consumed samples: 13775360 | consumed tokens: 28211937280 | elapsed time per iteration (s): 0.45 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 2.280048E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.581 | TFLOPs: 29.88 | 7: iteration 53820/ 115203 | consumed samples: 13777920 | consumed tokens: 28217180160 | elapsed time per iteration (s): 0.43 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 2.302114E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.599 | TFLOPs: 30.88 | 7: iteration 53830/ 115203 | consumed samples: 13780480 | consumed tokens: 28222423040 | elapsed time per iteration (s): 0.45 | learning rate: 1.208E-04 | global batch size: 256 | lm loss: 2.289797E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.921 | TFLOPs: 29.96 | 7: iteration 53840/ 115203 | consumed samples: 13783040 | consumed tokens: 28227665920 | elapsed time per iteration (s): 0.43 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 2.282718E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.535 | TFLOPs: 31.35 | 7: iteration 53850/ 115203 | consumed samples: 13785600 | consumed tokens: 28232908800 | elapsed time per iteration (s): 0.45 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 2.257512E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.970 | TFLOPs: 29.96 | 7: iteration 53860/ 115203 | consumed samples: 13788160 | consumed tokens: 
28238151680 | elapsed time per iteration (s): 0.43 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 2.293596E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.310 | TFLOPs: 31.34 | 7: iteration 53870/ 115203 | consumed samples: 13790720 | consumed tokens: 28243394560 | elapsed time per iteration (s): 0.43 | learning rate: 1.207E-04 | global batch size: 256 | lm loss: 2.253085E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.002 | TFLOPs: 31.22 | 7: iteration 53880/ 115203 | consumed samples: 13793280 | consumed tokens: 28248637440 | elapsed time per iteration (s): 0.43 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 2.291577E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.922 | TFLOPs: 31.32 | 7: iteration 53890/ 115203 | consumed samples: 13795840 | consumed tokens: 28253880320 | elapsed time per iteration (s): 0.43 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 2.299447E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.269 | TFLOPs: 31.13 | 7: iteration 53900/ 115203 | consumed samples: 13798400 | consumed tokens: 28259123200 | elapsed time per iteration (s): 0.43 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 2.295756E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.312 | TFLOPs: 31.24 | 7: iteration 53910/ 115203 | consumed samples: 13800960 | consumed tokens: 28264366080 | elapsed time per iteration (s): 0.44 | learning rate: 1.206E-04 | global batch size: 256 | lm loss: 2.298212E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
578.930 | TFLOPs: 30.38 | 7: iteration 53920/ 115203 | consumed samples: 13803520 | consumed tokens: 28269608960 | elapsed time per iteration (s): 0.44 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 2.282475E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.392 | TFLOPs: 30.35 | 7: iteration 53930/ 115203 | consumed samples: 13806080 | consumed tokens: 28274851840 | elapsed time per iteration (s): 0.44 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 2.277590E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.940 | TFLOPs: 30.74 | 7: iteration 53940/ 115203 | consumed samples: 13808640 | consumed tokens: 28280094720 | elapsed time per iteration (s): 0.43 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 2.281979E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.114 | TFLOPs: 31.33 | 7: iteration 53950/ 115203 | consumed samples: 13811200 | consumed tokens: 28285337600 | elapsed time per iteration (s): 0.44 | learning rate: 1.205E-04 | global batch size: 256 | lm loss: 2.252320E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.567 | TFLOPs: 30.72 | 7: iteration 53960/ 115203 | consumed samples: 13813760 | consumed tokens: 28290580480 | elapsed time per iteration (s): 0.44 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 2.315002E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.236 | TFLOPs: 30.60 | 7: iteration 53970/ 115203 | consumed samples: 13816320 | consumed tokens: 28295823360 | elapsed time per iteration (s): 0.43 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 2.311528E+00 | grad norm: 0.210 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.926 | TFLOPs: 31.00 | 7: iteration 53980/ 115203 | consumed samples: 13818880 | consumed tokens: 28301066240 | elapsed time per iteration (s): 0.44 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 2.300789E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.585 | TFLOPs: 30.41 | 7: iteration 53990/ 115203 | consumed samples: 13821440 | consumed tokens: 28306309120 | elapsed time per iteration (s): 0.45 | learning rate: 1.204E-04 | global batch size: 256 | lm loss: 2.297685E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.201 | TFLOPs: 29.60 | 0: [2022-11-28 19:26:50,503] [INFO] [logging.py:68:log_dist] [Rank 0] step=54000, skipped=0, lr=[0.00012033461390561511, 0.00012033461390561511, 0.00012033461390561511], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 54000/ 115203 | consumed samples: 13824000 | consumed tokens: 28311552000 | elapsed time per iteration (s): 0.44 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 2.295295E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.017 | TFLOPs: 30.85 | 0: steps: 54000 loss: 2.2815 iter time (s): 0.432 samples/sec: 592.232 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 54000 | lm loss value: 2.227744E+00 | lm loss PPL: 9.278910E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 54000 to checkpoints_221m 0: [2022-11-28 19:26:50,687] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step54000 is begin to save! 
0: [2022-11-28 19:26:50,710] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_01-model_00-model_states.pt... 0: [2022-11-28 19:26:50,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_01-model_00-model_states.pt. 0: [2022-11-28 19:26:50,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_03-model_00-model_states.pt... 0: [2022-11-28 19:26:50,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_03-model_00-model_states.pt. 0: [2022-11-28 19:26:50,843] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_04-model_00-model_states.pt... 0: [2022-11-28 19:26:50,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_04-model_00-model_states.pt. 0: [2022-11-28 19:26:50,867] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_05-model_00-model_states.pt... 0: [2022-11-28 19:26:50,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_05-model_00-model_states.pt. 0: [2022-11-28 19:26:50,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_06-model_00-model_states.pt... 0: [2022-11-28 19:26:50,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_06-model_00-model_states.pt. 0: [2022-11-28 19:26:50,918] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_07-model_00-model_states.pt... 0: [2022-11-28 19:26:50,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 19:26:50,943] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_08-model_00-model_states.pt... 0: [2022-11-28 19:26:50,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_08-model_00-model_states.pt. 0: [2022-11-28 19:26:50,968] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_09-model_00-model_states.pt... 0: [2022-11-28 19:26:50,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_09-model_00-model_states.pt. 0: [2022-11-28 19:26:50,991] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_10-model_00-model_states.pt... 0: [2022-11-28 19:26:51,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_10-model_00-model_states.pt. 0: [2022-11-28 19:26:51,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_11-model_00-model_states.pt... 0: [2022-11-28 19:26:51,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_11-model_00-model_states.pt. 0: [2022-11-28 19:26:51,043] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_12-model_00-model_states.pt... 0: [2022-11-28 19:26:51,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_12-model_00-model_states.pt. 0: [2022-11-28 19:26:51,068] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_13-model_00-model_states.pt... 0: [2022-11-28 19:26:51,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 19:26:51,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_14-model_00-model_states.pt... 0: [2022-11-28 19:26:51,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_14-model_00-model_states.pt. 0: [2022-11-28 19:26:51,117] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_15-model_00-model_states.pt... 0: [2022-11-28 19:26:51,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_15-model_00-model_states.pt. 0: [2022-11-28 19:26:51,142] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_16-model_00-model_states.pt... 0: [2022-11-28 19:26:51,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_16-model_00-model_states.pt. 0: [2022-11-28 19:26:51,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_17-model_00-model_states.pt... 0: [2022-11-28 19:26:51,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_17-model_00-model_states.pt. 0: [2022-11-28 19:26:51,191] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_18-model_00-model_states.pt... 0: [2022-11-28 19:26:51,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_18-model_00-model_states.pt. 0: [2022-11-28 19:26:51,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_19-model_00-model_states.pt... 0: [2022-11-28 19:26:51,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 19:26:51,240] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_20-model_00-model_states.pt... 0: [2022-11-28 19:26:51,265] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_20-model_00-model_states.pt. 0: [2022-11-28 19:26:51,265] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/layer_22-model_00-model_states.pt... 0: [2022-11-28 19:26:51,269] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/layer_22-model_00-model_states.pt. 0: [2022-11-28 19:26:51,270] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step54000/mp_rank_00_model_states.pt 0: [2022-11-28 19:26:51,270] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/mp_rank_00_model_states.pt... 0: [2022-11-28 19:26:51,274] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/mp_rank_00_model_states.pt. 0: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
1: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
5: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:26:51,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step54000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:26:51,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:26:51,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 19:26:51,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2022-11-28 19:26:51,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:26:51,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:26:51,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 19:26:51,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2022-11-28 19:26:51,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 19:26:51,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 19:26:51,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2022-11-28 19:26:51,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:26:51,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:26:51,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2022-11-28 19:26:51,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 1: [2022-11-28 19:26:51,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2022-11-28 19:26:51,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2022-11-28 19:26:51,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:26:51,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
1: [2022-11-28 19:26:51,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 0: [2022-11-28 19:26:51,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2022-11-28 19:26:51,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2022-11-28 19:26:51,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2022-11-28 19:26:51,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:26:51,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 19:26:51,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2022-11-28 19:26:51,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:26:51,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 19:26:51,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2022-11-28 19:26:51,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 19:26:51,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 19:26:51,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2022-11-28 19:26:51,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:26:51,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 19:26:51,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2022-11-28 19:26:51,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:26:51,354] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 19:26:51,354] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2022-11-28 19:26:51,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:26:51,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 19:26:51,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2022-11-28 19:26:51,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 19:26:51,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 19:26:51,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2022-11-28 19:26:51,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:26:51,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:26:51,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 19:26:51,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 19:26:51,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:26:51,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2022-11-28 19:26:51,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2022-11-28 19:26:51,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:26:51,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 19:26:51,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 19:26:51,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2022-11-28 19:26:51,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2022-11-28 19:26:51,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:26:51,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:26:51,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 19:26:51,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 19:26:51,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2022-11-28 19:26:51,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2022-11-28 19:26:51,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:26:51,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:26:51,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 19:26:51,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 19:26:51,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2022-11-28 19:26:51,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2022-11-28 19:26:51,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:26:51,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:26:51,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 19:26:51,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 19:26:51,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2022-11-28 19:26:51,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2022-11-28 19:26:51,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:26:51,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 19:26:51,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2022-11-28 19:26:51,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:26:51,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:26:51,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 19:26:51,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 19:26:51,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 5: [2022-11-28 19:26:51,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2022-11-28 19:26:51,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:26:51,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:26:51,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 19:26:51,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 19:26:51,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 19:26:51,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 19:26:51,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2022-11-28 19:26:51,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2022-11-28 19:26:51,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2022-11-28 19:26:51,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:26:51,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 19:26:51,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:26:51,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2022-11-28 19:26:51,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 19:26:51,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2022-11-28 19:26:51,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2022-11-28 19:26:51,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 19:26:51,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2022-11-28 19:26:51,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:26:51,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 19:26:51,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 4: [2022-11-28 19:26:51,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:26:51,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 19:26:51,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 1: [2022-11-28 19:26:51,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:26:51,364] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 19:26:51,364] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 
3: [2022-11-28 19:26:51,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:26:51,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 19:26:51,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2022-11-28 19:26:51,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:26:51,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 19:26:51,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 3: [2022-11-28 19:26:51,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:26:51,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 19:26:51,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:26:51,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 19:26:51,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 19:26:51,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 19:26:51,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 19:26:51,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 19:26:51,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 19:26:51,387] [INFO] 
[engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 19:26:51,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 7: [2022-11-28 19:26:51,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:26:51,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 19:26:51,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 19:26:51,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 19:26:51,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 19:26:51,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 19:26:51,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 19:26:51,421] [INFO] 
[engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 19:26:51,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 6: [2022-11-28 19:26:51,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 0: [2022-11-28 19:26:51,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 19:26:51,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:26:51,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 19:26:51,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 19:26:51,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 19:26:51,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 19:26:51,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 19:26:51,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 19:26:51,607] [INFO] 
[engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2022-11-28 19:26:51,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 2: [2022-11-28 19:26:51,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:26:51,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step54000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 19:26:51,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step54000 is ready now! 
0: successfully saved checkpoint at iteration 54000 to checkpoints_221m 7: time (ms) | save-checkpoint: 940.59 7: iteration 54010/ 115203 | consumed samples: 13826560 | consumed tokens: 28316794880 | elapsed time per iteration (s): 0.54 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 2.278561E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 469.865 | TFLOPs: 24.65 | 7: iteration 54020/ 115203 | consumed samples: 13829120 | consumed tokens: 28322037760 | elapsed time per iteration (s): 0.43 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 2.254183E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.333 | TFLOPs: 31.24 | 7: iteration 54030/ 115203 | consumed samples: 13831680 | consumed tokens: 28327280640 | elapsed time per iteration (s): 0.43 | learning rate: 1.203E-04 | global batch size: 256 | lm loss: 2.254763E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.478 | TFLOPs: 30.93 | 7: iteration 54040/ 115203 | consumed samples: 13834240 | consumed tokens: 28332523520 | elapsed time per iteration (s): 0.44 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 2.298869E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.647 | TFLOPs: 30.52 | 7: iteration 54050/ 115203 | consumed samples: 13836800 | consumed tokens: 28337766400 | elapsed time per iteration (s): 0.44 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 2.277549E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.999 | TFLOPs: 30.43 | 7: iteration 54060/ 115203 | consumed samples: 13839360 | consumed tokens: 28343009280 | elapsed time per iteration (s): 0.43 | learning 
rate: 1.202E-04 | global batch size: 256 | lm loss: 2.279834E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.380 | TFLOPs: 30.92 | 7: iteration 54070/ 115203 | consumed samples: 13841920 | consumed tokens: 28348252160 | elapsed time per iteration (s): 0.43 | learning rate: 1.202E-04 | global batch size: 256 | lm loss: 2.259437E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.725 | TFLOPs: 31.31 | 7: iteration 54080/ 115203 | consumed samples: 13844480 | consumed tokens: 28353495040 | elapsed time per iteration (s): 0.43 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 2.296740E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.292 | TFLOPs: 30.92 | 7: iteration 54090/ 115203 | consumed samples: 13847040 | consumed tokens: 28358737920 | elapsed time per iteration (s): 0.44 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 2.322385E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.362 | TFLOPs: 30.71 | 7: iteration 54100/ 115203 | consumed samples: 13849600 | consumed tokens: 28363980800 | elapsed time per iteration (s): 0.61 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 2.300062E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 423.134 | TFLOPs: 22.20 | 7: iteration 54110/ 115203 | consumed samples: 13852160 | consumed tokens: 28369223680 | elapsed time per iteration (s): 0.43 | learning rate: 1.201E-04 | global batch size: 256 | lm loss: 2.256129E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.204 | TFLOPs: 30.91 | 7: iteration 54120/ 115203 | 
consumed samples: 13854720 | consumed tokens: 28374466560 | elapsed time per iteration (s): 0.43 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.296662E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.976 | TFLOPs: 31.01 | 7: iteration 54130/ 115203 | consumed samples: 13857280 | consumed tokens: 28379709440 | elapsed time per iteration (s): 0.43 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.322165E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.312 | TFLOPs: 31.55 | 7: iteration 54140/ 115203 | consumed samples: 13859840 | consumed tokens: 28384952320 | elapsed time per iteration (s): 0.44 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.297796E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.884 | TFLOPs: 30.32 | 7: iteration 54150/ 115203 | consumed samples: 13862400 | consumed tokens: 28390195200 | elapsed time per iteration (s): 0.43 | learning rate: 1.200E-04 | global batch size: 256 | lm loss: 2.292095E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.899 | TFLOPs: 31.00 | 7: iteration 54160/ 115203 | consumed samples: 13864960 | consumed tokens: 28395438080 | elapsed time per iteration (s): 0.43 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 2.281596E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.126 | TFLOPs: 30.96 | 7: iteration 54170/ 115203 | consumed samples: 13867520 | consumed tokens: 28400680960 | elapsed time per iteration (s): 0.45 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 2.308326E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 575.080 | TFLOPs: 30.17 | 7: iteration 54180/ 115203 | consumed samples: 13870080 | consumed tokens: 28405923840 | elapsed time per iteration (s): 0.43 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 2.277998E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.845 | TFLOPs: 31.16 | 7: iteration 54190/ 115203 | consumed samples: 13872640 | consumed tokens: 28411166720 | elapsed time per iteration (s): 0.44 | learning rate: 1.199E-04 | global batch size: 256 | lm loss: 2.308348E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.177 | TFLOPs: 30.44 | 7: iteration 54200/ 115203 | consumed samples: 13875200 | consumed tokens: 28416409600 | elapsed time per iteration (s): 0.44 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 2.290367E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.496 | TFLOPs: 30.82 | 7: iteration 54210/ 115203 | consumed samples: 13877760 | consumed tokens: 28421652480 | elapsed time per iteration (s): 0.43 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 2.279387E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.427 | TFLOPs: 31.03 | 7: iteration 54220/ 115203 | consumed samples: 13880320 | consumed tokens: 28426895360 | elapsed time per iteration (s): 0.43 | learning rate: 1.198E-04 | global batch size: 256 | lm loss: 2.273924E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.073 | TFLOPs: 30.96 | 7: iteration 54230/ 115203 | consumed samples: 13882880 | consumed tokens: 28432138240 | elapsed time per iteration (s): 0.44 | learning rate: 1.198E-04 | global batch size: 
256 | lm loss: 2.286607E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.828 | TFLOPs: 30.53 | 7: iteration 54240/ 115203 | consumed samples: 13885440 | consumed tokens: 28437381120 | elapsed time per iteration (s): 0.43 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 2.267486E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.916 | TFLOPs: 31.00 | 7: iteration 54250/ 115203 | consumed samples: 13888000 | consumed tokens: 28442624000 | elapsed time per iteration (s): 0.43 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 2.299776E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.688 | TFLOPs: 30.99 | 7: iteration 54260/ 115203 | consumed samples: 13890560 | consumed tokens: 28447866880 | elapsed time per iteration (s): 0.44 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 2.317261E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.685 | TFLOPs: 30.63 | 7: iteration 54270/ 115203 | consumed samples: 13893120 | consumed tokens: 28453109760 | elapsed time per iteration (s): 0.45 | learning rate: 1.197E-04 | global batch size: 256 | lm loss: 2.306728E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.013 | TFLOPs: 29.91 | 7: iteration 54280/ 115203 | consumed samples: 13895680 | consumed tokens: 28458352640 | elapsed time per iteration (s): 0.43 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 2.341146E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.426 | TFLOPs: 31.50 | 7: iteration 54290/ 115203 | consumed samples: 13898240 | consumed 
tokens: 28463595520 | elapsed time per iteration (s): 0.44 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 2.324557E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.196 | TFLOPs: 30.28 | 7: iteration 54300/ 115203 | consumed samples: 13900800 | consumed tokens: 28468838400 | elapsed time per iteration (s): 0.43 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 2.303785E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.777 | TFLOPs: 31.15 | 7: iteration 54310/ 115203 | consumed samples: 13903360 | consumed tokens: 28474081280 | elapsed time per iteration (s): 0.44 | learning rate: 1.196E-04 | global batch size: 256 | lm loss: 2.314701E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.443 | TFLOPs: 30.77 | 7: iteration 54320/ 115203 | consumed samples: 13905920 | consumed tokens: 28479324160 | elapsed time per iteration (s): 0.45 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 2.260076E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.387 | TFLOPs: 29.72 | 7: iteration 54330/ 115203 | consumed samples: 13908480 | consumed tokens: 28484567040 | elapsed time per iteration (s): 0.43 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 2.280521E+00 | grad norm: 0.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.344 | TFLOPs: 31.55 | 7: iteration 54340/ 115203 | consumed samples: 13911040 | consumed tokens: 28489809920 | elapsed time per iteration (s): 0.43 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 2.283159E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 588.741 | TFLOPs: 30.89 | 7: iteration 54350/ 115203 | consumed samples: 13913600 | consumed tokens: 28495052800 | elapsed time per iteration (s): 0.44 | learning rate: 1.195E-04 | global batch size: 256 | lm loss: 2.306908E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.285 | TFLOPs: 30.45 | 7: iteration 54360/ 115203 | consumed samples: 13916160 | consumed tokens: 28500295680 | elapsed time per iteration (s): 0.43 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 2.319748E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.747 | TFLOPs: 31.52 | 7: iteration 54370/ 115203 | consumed samples: 13918720 | consumed tokens: 28505538560 | elapsed time per iteration (s): 0.43 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 2.304395E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.015 | TFLOPs: 30.96 | 7: iteration 54380/ 115203 | consumed samples: 13921280 | consumed tokens: 28510781440 | elapsed time per iteration (s): 0.42 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 2.270995E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.761 | TFLOPs: 31.63 | 7: iteration 54390/ 115203 | consumed samples: 13923840 | consumed tokens: 28516024320 | elapsed time per iteration (s): 0.42 | learning rate: 1.194E-04 | global batch size: 256 | lm loss: 2.291309E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.405 | TFLOPs: 31.71 | 7: iteration 54400/ 115203 | consumed samples: 13926400 | consumed tokens: 28521267200 | elapsed time per iteration (s): 0.43 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.298066E+00 | grad norm: 
0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.133 | TFLOPs: 31.49 | 7: iteration 54410/ 115203 | consumed samples: 13928960 | consumed tokens: 28526510080 | elapsed time per iteration (s): 0.44 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.253632E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.208 | TFLOPs: 30.55 | 7: iteration 54420/ 115203 | consumed samples: 13931520 | consumed tokens: 28531752960 | elapsed time per iteration (s): 0.43 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.267863E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.037 | TFLOPs: 31.33 | 7: iteration 54430/ 115203 | consumed samples: 13934080 | consumed tokens: 28536995840 | elapsed time per iteration (s): 0.43 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.285740E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.112 | TFLOPs: 31.17 | 7: iteration 54440/ 115203 | consumed samples: 13936640 | consumed tokens: 28542238720 | elapsed time per iteration (s): 0.44 | learning rate: 1.193E-04 | global batch size: 256 | lm loss: 2.317302E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.380 | TFLOPs: 30.40 | 7: iteration 54450/ 115203 | consumed samples: 13939200 | consumed tokens: 28547481600 | elapsed time per iteration (s): 0.43 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 2.316101E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.355 | TFLOPs: 31.45 | 7: iteration 54460/ 115203 | consumed samples: 13941760 | consumed tokens: 28552724480 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 2.279996E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.344 | TFLOPs: 31.34 | 7: iteration 54470/ 115203 | consumed samples: 13944320 | consumed tokens: 28557967360 | elapsed time per iteration (s): 0.44 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 2.320018E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.743 | TFLOPs: 30.58 | 7: iteration 54480/ 115203 | consumed samples: 13946880 | consumed tokens: 28563210240 | elapsed time per iteration (s): 0.43 | learning rate: 1.192E-04 | global batch size: 256 | lm loss: 2.276424E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.659 | TFLOPs: 30.89 | 7: iteration 54490/ 115203 | consumed samples: 13949440 | consumed tokens: 28568453120 | elapsed time per iteration (s): 0.43 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.301530E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.972 | TFLOPs: 31.53 | 7: iteration 54500/ 115203 | consumed samples: 13952000 | consumed tokens: 28573696000 | elapsed time per iteration (s): 0.44 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.275130E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.625 | TFLOPs: 30.25 | 7: iteration 54510/ 115203 | consumed samples: 13954560 | consumed tokens: 28578938880 | elapsed time per iteration (s): 0.43 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.299590E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.977 | TFLOPs: 31.06 | 7: 
iteration 54520/ 115203 | consumed samples: 13957120 | consumed tokens: 28584181760 | elapsed time per iteration (s): 0.43 | learning rate: 1.191E-04 | global batch size: 256 | lm loss: 2.279340E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.527 | TFLOPs: 30.98 | 7: iteration 54530/ 115203 | consumed samples: 13959680 | consumed tokens: 28589424640 | elapsed time per iteration (s): 0.44 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 2.292009E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.164 | TFLOPs: 30.60 | 7: iteration 54540/ 115203 | consumed samples: 13962240 | consumed tokens: 28594667520 | elapsed time per iteration (s): 0.43 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 2.268364E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.712 | TFLOPs: 30.99 | 7: iteration 54550/ 115203 | consumed samples: 13964800 | consumed tokens: 28599910400 | elapsed time per iteration (s): 0.44 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 2.274726E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.679 | TFLOPs: 30.73 | 7: iteration 54560/ 115203 | consumed samples: 13967360 | consumed tokens: 28605153280 | elapsed time per iteration (s): 0.45 | learning rate: 1.190E-04 | global batch size: 256 | lm loss: 2.290666E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.270 | TFLOPs: 29.82 | 7: iteration 54570/ 115203 | consumed samples: 13969920 | consumed tokens: 28610396160 | elapsed time per iteration (s): 0.42 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 2.271782E+00 | grad norm: 0.206 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.945 | TFLOPs: 31.74 | 7: iteration 54580/ 115203 | consumed samples: 13972480 | consumed tokens: 28615639040 | elapsed time per iteration (s): 0.43 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 2.270003E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.288 | TFLOPs: 31.13 | 7: iteration 54590/ 115203 | consumed samples: 13975040 | consumed tokens: 28620881920 | elapsed time per iteration (s): 0.43 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 2.292336E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.648 | TFLOPs: 31.25 | 7: iteration 54600/ 115203 | consumed samples: 13977600 | consumed tokens: 28626124800 | elapsed time per iteration (s): 0.44 | learning rate: 1.189E-04 | global batch size: 256 | lm loss: 2.318390E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.105 | TFLOPs: 30.59 | 7: iteration 54610/ 115203 | consumed samples: 13980160 | consumed tokens: 28631367680 | elapsed time per iteration (s): 0.43 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 2.276562E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.428 | TFLOPs: 31.29 | 7: iteration 54620/ 115203 | consumed samples: 13982720 | consumed tokens: 28636610560 | elapsed time per iteration (s): 0.44 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 2.284519E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.180 | TFLOPs: 30.70 | 7: iteration 54630/ 115203 | consumed samples: 13985280 | consumed tokens: 28641853440 | elapsed time per iteration (s): 0.44 | learning rate: 
1.188E-04 | global batch size: 256 | lm loss: 2.289213E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.029 | TFLOPs: 30.70 | 7: iteration 54640/ 115203 | consumed samples: 13987840 | consumed tokens: 28647096320 | elapsed time per iteration (s): 0.43 | learning rate: 1.188E-04 | global batch size: 256 | lm loss: 2.299365E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.890 | TFLOPs: 31.16 | 7: iteration 54650/ 115203 | consumed samples: 13990400 | consumed tokens: 28652339200 | elapsed time per iteration (s): 0.43 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 2.258136E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.938 | TFLOPs: 30.90 | 7: iteration 54660/ 115203 | consumed samples: 13992960 | consumed tokens: 28657582080 | elapsed time per iteration (s): 0.43 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 2.265447E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.003 | TFLOPs: 31.48 | 7: iteration 54670/ 115203 | consumed samples: 13995520 | consumed tokens: 28662824960 | elapsed time per iteration (s): 0.42 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 2.287197E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.368 | TFLOPs: 31.66 | 7: iteration 54680/ 115203 | consumed samples: 13998080 | consumed tokens: 28668067840 | elapsed time per iteration (s): 0.44 | learning rate: 1.187E-04 | global batch size: 256 | lm loss: 2.284037E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.310 | TFLOPs: 30.87 | 7: iteration 54690/ 115203 | consumed 
samples: 14000640 | consumed tokens: 28673310720 | elapsed time per iteration (s): 0.44 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 2.276596E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.761 | TFLOPs: 30.47 | 7: iteration 54700/ 115203 | consumed samples: 14003200 | consumed tokens: 28678553600 | elapsed time per iteration (s): 0.43 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 2.327258E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.048 | TFLOPs: 31.27 | 7: iteration 54710/ 115203 | consumed samples: 14005760 | consumed tokens: 28683796480 | elapsed time per iteration (s): 0.43 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 2.292092E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.287 | TFLOPs: 31.18 | 7: iteration 54720/ 115203 | consumed samples: 14008320 | consumed tokens: 28689039360 | elapsed time per iteration (s): 0.44 | learning rate: 1.186E-04 | global batch size: 256 | lm loss: 2.288205E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.685 | TFLOPs: 30.78 | 7: iteration 54730/ 115203 | consumed samples: 14010880 | consumed tokens: 28694282240 | elapsed time per iteration (s): 0.43 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 2.300373E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.895 | TFLOPs: 31.16 | 7: iteration 54740/ 115203 | consumed samples: 14013440 | consumed tokens: 28699525120 | elapsed time per iteration (s): 0.44 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 2.301500E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 584.875 | TFLOPs: 30.69 | 7: iteration 54750/ 115203 | consumed samples: 14016000 | consumed tokens: 28704768000 | elapsed time per iteration (s): 0.43 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 2.297218E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.421 | TFLOPs: 31.08 | 7: iteration 54760/ 115203 | consumed samples: 14018560 | consumed tokens: 28710010880 | elapsed time per iteration (s): 0.43 | learning rate: 1.185E-04 | global batch size: 256 | lm loss: 2.253043E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.884 | TFLOPs: 31.42 | 7: iteration 54770/ 115203 | consumed samples: 14021120 | consumed tokens: 28715253760 | elapsed time per iteration (s): 0.44 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 2.320572E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.942 | TFLOPs: 30.85 | 7: iteration 54780/ 115203 | consumed samples: 14023680 | consumed tokens: 28720496640 | elapsed time per iteration (s): 0.44 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 2.301211E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.163 | TFLOPs: 30.49 | 7: iteration 54790/ 115203 | consumed samples: 14026240 | consumed tokens: 28725739520 | elapsed time per iteration (s): 0.43 | learning rate: 1.184E-04 | global batch size: 256 | lm loss: 2.283518E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.522 | TFLOPs: 31.30 | 7: iteration 54800/ 115203 | consumed samples: 14028800 | consumed tokens: 28730982400 | elapsed time per iteration (s): 0.43 | learning rate: 1.184E-04 | global batch size: 256 | lm 
loss: 2.305067E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.795 | TFLOPs: 31.31 | 7: iteration 54810/ 115203 | consumed samples: 14031360 | consumed tokens: 28736225280 | elapsed time per iteration (s): 0.43 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 2.310939E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.280 | TFLOPs: 31.08 | 7: iteration 54820/ 115203 | consumed samples: 14033920 | consumed tokens: 28741468160 | elapsed time per iteration (s): 0.44 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 2.269044E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.585 | TFLOPs: 30.62 | 7: iteration 54830/ 115203 | consumed samples: 14036480 | consumed tokens: 28746711040 | elapsed time per iteration (s): 0.43 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 2.273780E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.447 | TFLOPs: 31.45 | 7: iteration 54840/ 115203 | consumed samples: 14039040 | consumed tokens: 28751953920 | elapsed time per iteration (s): 0.44 | learning rate: 1.183E-04 | global batch size: 256 | lm loss: 2.311747E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.772 | TFLOPs: 30.52 | 7: iteration 54850/ 115203 | consumed samples: 14041600 | consumed tokens: 28757196800 | elapsed time per iteration (s): 0.43 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 2.276229E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.099 | TFLOPs: 30.91 | 7: iteration 54860/ 115203 | consumed samples: 14044160 | consumed tokens: 
28762439680 | elapsed time per iteration (s): 0.44 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 2.299221E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.795 | TFLOPs: 30.79 | 7: iteration 54870/ 115203 | consumed samples: 14046720 | consumed tokens: 28767682560 | elapsed time per iteration (s): 0.43 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 2.287703E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.975 | TFLOPs: 31.27 | 7: iteration 54880/ 115203 | consumed samples: 14049280 | consumed tokens: 28772925440 | elapsed time per iteration (s): 0.43 | learning rate: 1.182E-04 | global batch size: 256 | lm loss: 2.281226E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.188 | TFLOPs: 31.12 | 7: iteration 54890/ 115203 | consumed samples: 14051840 | consumed tokens: 28778168320 | elapsed time per iteration (s): 0.44 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.279012E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.544 | TFLOPs: 30.67 | 7: iteration 54900/ 115203 | consumed samples: 14054400 | consumed tokens: 28783411200 | elapsed time per iteration (s): 0.44 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.274845E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.578 | TFLOPs: 30.83 | 7: iteration 54910/ 115203 | consumed samples: 14056960 | consumed tokens: 28788654080 | elapsed time per iteration (s): 0.43 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.286232E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
597.132 | TFLOPs: 31.33 | 7: iteration 54920/ 115203 | consumed samples: 14059520 | consumed tokens: 28793896960 | elapsed time per iteration (s): 0.43 | learning rate: 1.181E-04 | global batch size: 256 | lm loss: 2.319550E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.057 | TFLOPs: 30.91 | 7: iteration 54930/ 115203 | consumed samples: 14062080 | consumed tokens: 28799139840 | elapsed time per iteration (s): 0.43 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 2.323052E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.511 | TFLOPs: 31.04 | 7: iteration 54940/ 115203 | consumed samples: 14064640 | consumed tokens: 28804382720 | elapsed time per iteration (s): 0.44 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 2.266082E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.417 | TFLOPs: 30.82 | 7: iteration 54950/ 115203 | consumed samples: 14067200 | consumed tokens: 28809625600 | elapsed time per iteration (s): 0.43 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 2.284754E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.820 | TFLOPs: 30.95 | 7: iteration 54960/ 115203 | consumed samples: 14069760 | consumed tokens: 28814868480 | elapsed time per iteration (s): 0.43 | learning rate: 1.180E-04 | global batch size: 256 | lm loss: 2.321358E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.754 | TFLOPs: 31.31 | 7: iteration 54970/ 115203 | consumed samples: 14072320 | consumed tokens: 28820111360 | elapsed time per iteration (s): 0.43 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 2.307872E+00 | grad norm: 0.214 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.082 | TFLOPs: 31.33 |
7: iteration 54980/ 115203 | consumed samples: 14074880 | consumed tokens: 28825354240 | elapsed time per iteration (s): 0.43 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 2.288872E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.107 | TFLOPs: 31.07 |
7: iteration 54990/ 115203 | consumed samples: 14077440 | consumed tokens: 28830597120 | elapsed time per iteration (s): 0.43 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 2.313911E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.105 | TFLOPs: 31.07 |
7: iteration 55000/ 115203 | consumed samples: 14080000 | consumed tokens: 28835840000 | elapsed time per iteration (s): 0.45 | learning rate: 1.179E-04 | global batch size: 256 | lm loss: 2.269126E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.755 | TFLOPs: 29.79 |
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 55000 | lm loss value: 2.187478E+00 | lm loss PPL: 8.912703E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 55000 to checkpoints_221m
0: [2022-11-28 19:34:07,751] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step55000 is begin to save!
0: [2022-11-28 19:34:07,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_01-model_00-model_states.pt...
0: [2022-11-28 19:34:08,218] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_01-model_00-model_states.pt.
0: [2022-11-28 19:34:08,219] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_03-model_00-model_states.pt... 0: [2022-11-28 19:34:08,249] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_03-model_00-model_states.pt. 0: [2022-11-28 19:34:08,249] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_04-model_00-model_states.pt... 0: [2022-11-28 19:34:08,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_04-model_00-model_states.pt. 0: [2022-11-28 19:34:08,281] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_05-model_00-model_states.pt... 0: [2022-11-28 19:34:08,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_05-model_00-model_states.pt. 0: [2022-11-28 19:34:08,313] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_06-model_00-model_states.pt... 0: [2022-11-28 19:34:08,347] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_06-model_00-model_states.pt. 0: [2022-11-28 19:34:08,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_07-model_00-model_states.pt... 0: [2022-11-28 19:34:08,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_07-model_00-model_states.pt. 0: [2022-11-28 19:34:08,379] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_08-model_00-model_states.pt... 0: [2022-11-28 19:34:08,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 19:34:08,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_09-model_00-model_states.pt... 0: [2022-11-28 19:34:08,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_09-model_00-model_states.pt. 0: [2022-11-28 19:34:08,444] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_10-model_00-model_states.pt... 0: [2022-11-28 19:34:08,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_10-model_00-model_states.pt. 0: [2022-11-28 19:34:08,477] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_11-model_00-model_states.pt... 0: [2022-11-28 19:34:08,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_11-model_00-model_states.pt. 0: [2022-11-28 19:34:08,510] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_12-model_00-model_states.pt... 0: [2022-11-28 19:34:08,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_12-model_00-model_states.pt. 0: [2022-11-28 19:34:08,542] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_13-model_00-model_states.pt... 0: [2022-11-28 19:34:08,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_13-model_00-model_states.pt. 0: [2022-11-28 19:34:08,576] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_14-model_00-model_states.pt... 0: [2022-11-28 19:34:08,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 19:34:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_15-model_00-model_states.pt... 0: [2022-11-28 19:34:08,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_15-model_00-model_states.pt. 0: [2022-11-28 19:34:08,642] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_16-model_00-model_states.pt... 0: [2022-11-28 19:34:08,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_16-model_00-model_states.pt. 0: [2022-11-28 19:34:08,675] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_17-model_00-model_states.pt... 0: [2022-11-28 19:34:08,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_17-model_00-model_states.pt. 0: [2022-11-28 19:34:08,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_18-model_00-model_states.pt... 0: [2022-11-28 19:34:08,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_18-model_00-model_states.pt. 0: [2022-11-28 19:34:08,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_19-model_00-model_states.pt... 0: [2022-11-28 19:34:08,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_19-model_00-model_states.pt. 0: [2022-11-28 19:34:08,772] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_20-model_00-model_states.pt... 0: [2022-11-28 19:34:08,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 19:34:08,804] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/layer_22-model_00-model_states.pt... 0: [2022-11-28 19:34:08,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/layer_22-model_00-model_states.pt. 0: [2022-11-28 19:34:08,809] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step55000/mp_rank_00_model_states.pt 0: [2022-11-28 19:34:08,809] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/mp_rank_00_model_states.pt... 0: [2022-11-28 19:34:08,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/mp_rank_00_model_states.pt. 0: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
5: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
5: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
2: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:34:08,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step55000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:34:08,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:34:08,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 19:34:08,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2022-11-28 19:34:08,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:34:08,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:34:08,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 19:34:08,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2022-11-28 19:34:08,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:34:08,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 19:34:08,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
7: [2022-11-28 19:34:08,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:34:08,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 19:34:08,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2022-11-28 19:34:08,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:34:08,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 19:34:08,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2022-11-28 19:34:08,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:34:08,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 19:34:08,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2022-11-28 19:34:08,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:34:08,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:34:08,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:34:08,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 19:34:08,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 19:34:08,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 19:34:08,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2022-11-28 19:34:08,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2022-11-28 19:34:08,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2022-11-28 19:34:08,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:34:08,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 19:34:08,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2022-11-28 19:34:08,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:34:08,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 19:34:08,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 19:34:08,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 19:34:08,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2022-11-28 19:34:08,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2022-11-28 19:34:08,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:34:08,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 19:34:08,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:34:08,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2022-11-28 19:34:08,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 19:34:08,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2022-11-28 19:34:08,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:34:08,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:34:08,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 19:34:08,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:34:08,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 19:34:08,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2022-11-28 19:34:08,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:34:08,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2022-11-28 19:34:08,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2022-11-28 19:34:08,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 19:34:08,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
5: [2022-11-28 19:34:08,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:34:08,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2022-11-28 19:34:08,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 19:34:08,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 3: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 19:34:08,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:34:08,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2022-11-28 19:34:08,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2022-11-28 19:34:08,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:34:08,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 19:34:08,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2022-11-28 19:34:08,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 19:34:08,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 19:34:08,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2022-11-28 19:34:08,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:34:08,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:34:08,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:34:08,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2022-11-28 19:34:08,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 19:34:08,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 19:34:08,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2022-11-28 19:34:08,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2022-11-28 19:34:08,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2022-11-28 19:34:08,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 19:34:08,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 19:34:08,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2022-11-28 19:34:08,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:34:08,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 19:34:08,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2022-11-28 19:34:08,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:34:08,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 5: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:34:08,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2022-11-28 19:34:08,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 6: [2022-11-28 19:34:08,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
5: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2022-11-28 19:34:08,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:34:08,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 19:34:08,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2022-11-28 19:34:08,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:34:08,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 19:34:08,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2022-11-28 19:34:08,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:34:08,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:34:08,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 19:34:08,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 19:34:08,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
0: [2022-11-28 19:34:08,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2022-11-28 19:34:08,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:34:08,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:34:08,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:34:08,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 19:34:08,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 19:34:08,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 19:34:08,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2022-11-28 19:34:08,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2022-11-28 19:34:08,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2022-11-28 19:34:08,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 19:34:08,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 19:34:08,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2022-11-28 19:34:08,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:34:08,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:34:08,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:34:08,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 19:34:08,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 19:34:08,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 19:34:08,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2022-11-28 19:34:08,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 7: [2022-11-28 19:34:08,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2022-11-28 19:34:08,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 19:34:08,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 19:34:08,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2022-11-28 19:34:08,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:34:08,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:34:08,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:34:08,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:34:08,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 19:34:08,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 19:34:08,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2022-11-28 19:34:08,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 19:34:08,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 19:34:08,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
1: [2022-11-28 19:34:08,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2022-11-28 19:34:08,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2022-11-28 19:34:08,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:34:08,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 19:34:08,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 2: [2022-11-28 19:34:08,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:34:08,902] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 19:34:08,902] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 6: [2022-11-28 19:34:08,905] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:34:08,905] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 19:34:08,905] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2022-11-28 19:34:08,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 19:34:08,906] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 19:34:08,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:34:08,906] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2022-11-28 19:34:08,906] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 19:34:08,906] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 1: [2022-11-28 19:34:08,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:34:08,906] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 19:34:08,906] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 5: [2022-11-28 19:34:08,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:34:08,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 19:34:08,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2022-11-28 19:34:08,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2022-11-28 19:34:08,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:34:08,909] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 19:34:08,909] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 19:34:08,909] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2022-11-28 19:34:08,909] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2022-11-28 19:34:08,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:34:08,918] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 19:34:08,918] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2022-11-28 19:34:08,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:34:08,918] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 19:34:08,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:34:08,918] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
4: [2022-11-28 19:34:08,918] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 19:34:08,918] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 4: [2022-11-28 19:34:08,918] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:34:08,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 19:34:08,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 0: [2022-11-28 19:34:08,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step55000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 19:34:08,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step55000 is ready now! 
0: successfully saved checkpoint at iteration 55000 to checkpoints_221m 7: time (ms) | save-checkpoint: 1232.53 7: iteration 55010/ 115203 | consumed samples: 14082560 | consumed tokens: 28841082880 | elapsed time per iteration (s): 0.57 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 2.311520E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 449.048 | TFLOPs: 23.56 | 7: iteration 55020/ 115203 | consumed samples: 14085120 | consumed tokens: 28846325760 | elapsed time per iteration (s): 0.45 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 2.245742E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.168 | TFLOPs: 30.18 | 7: iteration 55030/ 115203 | consumed samples: 14087680 | consumed tokens: 28851568640 | elapsed time per iteration (s): 0.43 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 2.296853E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.802 | TFLOPs: 30.95 | 7: iteration 55040/ 115203 | consumed samples: 14090240 | consumed tokens: 28856811520 | elapsed time per iteration (s): 0.45 | learning rate: 1.178E-04 | global batch size: 256 | lm loss: 2.280555E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.267 | TFLOPs: 29.92 | 7: iteration 55050/ 115203 | consumed samples: 14092800 | consumed tokens: 28862054400 | elapsed time per iteration (s): 0.43 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 2.288864E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.115 | TFLOPs: 31.12 | 7: iteration 55060/ 115203 | consumed samples: 14095360 | consumed tokens: 28867297280 | elapsed time per iteration (s): 0.43 | learning 
rate: 1.177E-04 | global batch size: 256 | lm loss: 2.291550E+00 | grad norm: 0.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.026 | TFLOPs: 31.01 | 7: iteration 55070/ 115203 | consumed samples: 14097920 | consumed tokens: 28872540160 | elapsed time per iteration (s): 0.43 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 2.308385E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.819 | TFLOPs: 30.89 | 7: iteration 55080/ 115203 | consumed samples: 14100480 | consumed tokens: 28877783040 | elapsed time per iteration (s): 0.47 | learning rate: 1.177E-04 | global batch size: 256 | lm loss: 2.280522E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.479 | TFLOPs: 28.31 | 7: iteration 55090/ 115203 | consumed samples: 14103040 | consumed tokens: 28883025920 | elapsed time per iteration (s): 0.43 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 2.323432E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.892 | TFLOPs: 31.42 | 7: iteration 55100/ 115203 | consumed samples: 14105600 | consumed tokens: 28888268800 | elapsed time per iteration (s): 0.43 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 2.291277E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.793 | TFLOPs: 31.31 | 7: iteration 55110/ 115203 | consumed samples: 14108160 | consumed tokens: 28893511680 | elapsed time per iteration (s): 0.44 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 2.337045E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.480 | TFLOPs: 30.67 | 7: iteration 55120/ 115203 | 
consumed samples: 14110720 | consumed tokens: 28898754560 | elapsed time per iteration (s): 0.43 | learning rate: 1.176E-04 | global batch size: 256 | lm loss: 2.280679E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.147 | TFLOPs: 31.12 | 7: iteration 55130/ 115203 | consumed samples: 14113280 | consumed tokens: 28903997440 | elapsed time per iteration (s): 0.43 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 2.273914E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.126 | TFLOPs: 31.33 | 7: iteration 55140/ 115203 | consumed samples: 14115840 | consumed tokens: 28909240320 | elapsed time per iteration (s): 0.45 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 2.269816E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.864 | TFLOPs: 29.79 | 7: iteration 55150/ 115203 | consumed samples: 14118400 | consumed tokens: 28914483200 | elapsed time per iteration (s): 0.44 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 2.286846E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.408 | TFLOPs: 30.35 | 7: iteration 55160/ 115203 | consumed samples: 14120960 | consumed tokens: 28919726080 | elapsed time per iteration (s): 0.43 | learning rate: 1.175E-04 | global batch size: 256 | lm loss: 2.290554E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.820 | TFLOPs: 31.47 | 7: iteration 55170/ 115203 | consumed samples: 14123520 | consumed tokens: 28924968960 | elapsed time per iteration (s): 0.43 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 2.295900E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 597.601 | TFLOPs: 31.36 | 7: iteration 55180/ 115203 | consumed samples: 14126080 | consumed tokens: 28930211840 | elapsed time per iteration (s): 0.43 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 2.304342E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.102 | TFLOPs: 30.96 | 7: iteration 55190/ 115203 | consumed samples: 14128640 | consumed tokens: 28935454720 | elapsed time per iteration (s): 0.44 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 2.289147E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.997 | TFLOPs: 30.64 | 7: iteration 55200/ 115203 | consumed samples: 14131200 | consumed tokens: 28940697600 | elapsed time per iteration (s): 0.44 | learning rate: 1.174E-04 | global batch size: 256 | lm loss: 2.324050E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.491 | TFLOPs: 30.46 | 7: iteration 55210/ 115203 | consumed samples: 14133760 | consumed tokens: 28945940480 | elapsed time per iteration (s): 0.43 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 2.312763E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.575 | TFLOPs: 30.93 | 7: iteration 55220/ 115203 | consumed samples: 14136320 | consumed tokens: 28951183360 | elapsed time per iteration (s): 0.43 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 2.319637E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.473 | TFLOPs: 31.14 | 7: iteration 55230/ 115203 | consumed samples: 14138880 | consumed tokens: 28956426240 | elapsed time per iteration (s): 0.43 | learning rate: 1.173E-04 | global batch size: 
256 | lm loss: 2.327615E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.233 | TFLOPs: 30.92 | 7: iteration 55240/ 115203 | consumed samples: 14141440 | consumed tokens: 28961669120 | elapsed time per iteration (s): 0.43 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 2.274556E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.983 | TFLOPs: 31.32 | 7: iteration 55250/ 115203 | consumed samples: 14144000 | consumed tokens: 28966912000 | elapsed time per iteration (s): 0.43 | learning rate: 1.173E-04 | global batch size: 256 | lm loss: 2.264443E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.419 | TFLOPs: 31.50 | 7: iteration 55260/ 115203 | consumed samples: 14146560 | consumed tokens: 28972154880 | elapsed time per iteration (s): 0.44 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 2.288994E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.678 | TFLOPs: 30.83 | 7: iteration 55270/ 115203 | consumed samples: 14149120 | consumed tokens: 28977397760 | elapsed time per iteration (s): 0.44 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 2.282831E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.138 | TFLOPs: 30.81 | 7: iteration 55280/ 115203 | consumed samples: 14151680 | consumed tokens: 28982640640 | elapsed time per iteration (s): 0.43 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 2.292313E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.565 | TFLOPs: 31.14 | 7: iteration 55290/ 115203 | consumed samples: 14154240 | consumed 
tokens: 28987883520 | elapsed time per iteration (s): 0.43 | learning rate: 1.172E-04 | global batch size: 256 | lm loss: 2.304537E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.730 | TFLOPs: 30.94 | 7: iteration 55300/ 115203 | consumed samples: 14156800 | consumed tokens: 28993126400 | elapsed time per iteration (s): 0.43 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.310021E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.441 | TFLOPs: 31.24 | 7: iteration 55310/ 115203 | consumed samples: 14159360 | consumed tokens: 28998369280 | elapsed time per iteration (s): 0.44 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.263533E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.829 | TFLOPs: 30.21 | 7: iteration 55320/ 115203 | consumed samples: 14161920 | consumed tokens: 29003612160 | elapsed time per iteration (s): 0.44 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.323086E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.512 | TFLOPs: 30.72 | 7: iteration 55330/ 115203 | consumed samples: 14164480 | consumed tokens: 29008855040 | elapsed time per iteration (s): 0.43 | learning rate: 1.171E-04 | global batch size: 256 | lm loss: 2.262445E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.037 | TFLOPs: 31.12 | 7: iteration 55340/ 115203 | consumed samples: 14167040 | consumed tokens: 29014097920 | elapsed time per iteration (s): 0.43 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 2.311827E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 598.730 | TFLOPs: 31.41 | 7: iteration 55350/ 115203 | consumed samples: 14169600 | consumed tokens: 29019340800 | elapsed time per iteration (s): 0.44 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 2.264039E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.718 | TFLOPs: 30.36 | 7: iteration 55360/ 115203 | consumed samples: 14172160 | consumed tokens: 29024583680 | elapsed time per iteration (s): 0.42 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 2.289887E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.625 | TFLOPs: 31.62 | 7: iteration 55370/ 115203 | consumed samples: 14174720 | consumed tokens: 29029826560 | elapsed time per iteration (s): 0.43 | learning rate: 1.170E-04 | global batch size: 256 | lm loss: 2.277953E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.087 | TFLOPs: 30.96 | 7: iteration 55380/ 115203 | consumed samples: 14177280 | consumed tokens: 29035069440 | elapsed time per iteration (s): 0.43 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 2.268717E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.206 | TFLOPs: 31.23 | 7: iteration 55390/ 115203 | consumed samples: 14179840 | consumed tokens: 29040312320 | elapsed time per iteration (s): 0.43 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 2.301065E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.808 | TFLOPs: 31.31 | 7: iteration 55400/ 115203 | consumed samples: 14182400 | consumed tokens: 29045555200 | elapsed time per iteration (s): 0.45 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 2.289031E+00 | grad norm: 
0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.430 | TFLOPs: 30.09 | 7: iteration 55410/ 115203 | consumed samples: 14184960 | consumed tokens: 29050798080 | elapsed time per iteration (s): 0.44 | learning rate: 1.169E-04 | global batch size: 256 | lm loss: 2.274250E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.807 | TFLOPs: 30.84 | 7: iteration 55420/ 115203 | consumed samples: 14187520 | consumed tokens: 29056040960 | elapsed time per iteration (s): 0.43 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 2.317034E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.776 | TFLOPs: 31.10 | 7: iteration 55430/ 115203 | consumed samples: 14190080 | consumed tokens: 29061283840 | elapsed time per iteration (s): 0.42 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 2.281120E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.397 | TFLOPs: 31.82 | 7: iteration 55440/ 115203 | consumed samples: 14192640 | consumed tokens: 29066526720 | elapsed time per iteration (s): 0.43 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 2.307971E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.844 | TFLOPs: 31.26 | 7: iteration 55450/ 115203 | consumed samples: 14195200 | consumed tokens: 29071769600 | elapsed time per iteration (s): 0.44 | learning rate: 1.168E-04 | global batch size: 256 | lm loss: 2.305206E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.154 | TFLOPs: 30.81 | 7: iteration 55460/ 115203 | consumed samples: 14197760 | consumed tokens: 29077012480 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 2.310732E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.800 | TFLOPs: 31.37 | 7: iteration 55470/ 115203 | consumed samples: 14200320 | consumed tokens: 29082255360 | elapsed time per iteration (s): 0.43 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 2.286188E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.144 | TFLOPs: 31.07 | 7: iteration 55480/ 115203 | consumed samples: 14202880 | consumed tokens: 29087498240 | elapsed time per iteration (s): 0.44 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 2.301457E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.842 | TFLOPs: 30.84 | 7: iteration 55490/ 115203 | consumed samples: 14205440 | consumed tokens: 29092741120 | elapsed time per iteration (s): 0.46 | learning rate: 1.167E-04 | global batch size: 256 | lm loss: 2.284686E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.021 | TFLOPs: 29.49 | 7: iteration 55500/ 115203 | consumed samples: 14208000 | consumed tokens: 29097984000 | elapsed time per iteration (s): 0.43 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 2.245319E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.564 | TFLOPs: 31.09 | 7: iteration 55510/ 115203 | consumed samples: 14210560 | consumed tokens: 29103226880 | elapsed time per iteration (s): 0.43 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 2.287025E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.962 | TFLOPs: 31.43 | 7: 
iteration 55520/ 115203 | consumed samples: 14213120 | consumed tokens: 29108469760 | elapsed time per iteration (s): 0.44 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 2.310229E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.036 | TFLOPs: 30.64 | 7: iteration 55530/ 115203 | consumed samples: 14215680 | consumed tokens: 29113712640 | elapsed time per iteration (s): 0.44 | learning rate: 1.166E-04 | global batch size: 256 | lm loss: 2.277361E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.658 | TFLOPs: 30.31 | 7: iteration 55540/ 115203 | consumed samples: 14218240 | consumed tokens: 29118955520 | elapsed time per iteration (s): 0.43 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 2.288360E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.514 | TFLOPs: 30.88 | 7: iteration 55550/ 115203 | consumed samples: 14220800 | consumed tokens: 29124198400 | elapsed time per iteration (s): 0.43 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 2.269127E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.071 | TFLOPs: 31.22 | 7: iteration 55560/ 115203 | consumed samples: 14223360 | consumed tokens: 29129441280 | elapsed time per iteration (s): 0.43 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 2.300180E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.411 | TFLOPs: 31.24 | 7: iteration 55570/ 115203 | consumed samples: 14225920 | consumed tokens: 29134684160 | elapsed time per iteration (s): 0.44 | learning rate: 1.165E-04 | global batch size: 256 | lm loss: 2.297238E+00 | grad norm: 0.203 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.578 | TFLOPs: 30.62 | 7: iteration 55580/ 115203 | consumed samples: 14228480 | consumed tokens: 29139927040 | elapsed time per iteration (s): 0.43 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 2.264854E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.891 | TFLOPs: 30.95 | 7: iteration 55590/ 115203 | consumed samples: 14231040 | consumed tokens: 29145169920 | elapsed time per iteration (s): 0.43 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 2.304116E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.131 | TFLOPs: 31.33 | 7: iteration 55600/ 115203 | consumed samples: 14233600 | consumed tokens: 29150412800 | elapsed time per iteration (s): 0.45 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 2.294158E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.764 | TFLOPs: 30.00 | 7: iteration 55610/ 115203 | consumed samples: 14236160 | consumed tokens: 29155655680 | elapsed time per iteration (s): 0.43 | learning rate: 1.164E-04 | global batch size: 256 | lm loss: 2.318408E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.122 | TFLOPs: 31.54 | 7: iteration 55620/ 115203 | consumed samples: 14238720 | consumed tokens: 29160898560 | elapsed time per iteration (s): 0.43 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 2.298212E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.978 | TFLOPs: 31.22 | 7: iteration 55630/ 115203 | consumed samples: 14241280 | consumed tokens: 29166141440 | elapsed time per iteration (s): 0.43 | learning rate: 
1.163E-04 | global batch size: 256 | lm loss: 2.287881E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.643 | TFLOPs: 30.99 | 7: iteration 55640/ 115203 | consumed samples: 14243840 | consumed tokens: 29171384320 | elapsed time per iteration (s): 0.43 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 2.260582E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.011 | TFLOPs: 31.38 | 7: iteration 55650/ 115203 | consumed samples: 14246400 | consumed tokens: 29176627200 | elapsed time per iteration (s): 0.44 | learning rate: 1.163E-04 | global batch size: 256 | lm loss: 2.324429E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.612 | TFLOPs: 30.57 | 7: iteration 55660/ 115203 | consumed samples: 14248960 | consumed tokens: 29181870080 | elapsed time per iteration (s): 0.44 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 2.287733E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.650 | TFLOPs: 30.73 | 7: iteration 55670/ 115203 | consumed samples: 14251520 | consumed tokens: 29187112960 | elapsed time per iteration (s): 0.43 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 2.294019E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.303 | TFLOPs: 31.60 | 7: iteration 55680/ 115203 | consumed samples: 14254080 | consumed tokens: 29192355840 | elapsed time per iteration (s): 0.44 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 2.277337E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.493 | TFLOPs: 30.35 | 7: iteration 55690/ 115203 | consumed 
samples: 14256640 | consumed tokens: 29197598720 | elapsed time per iteration (s): 0.43 | learning rate: 1.162E-04 | global batch size: 256 | lm loss: 2.280909E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.427 | TFLOPs: 31.14 | 7: iteration 55700/ 115203 | consumed samples: 14259200 | consumed tokens: 29202841600 | elapsed time per iteration (s): 0.44 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 2.281445E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.166 | TFLOPs: 30.76 | 7: iteration 55710/ 115203 | consumed samples: 14261760 | consumed tokens: 29208084480 | elapsed time per iteration (s): 0.44 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 2.268537E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.914 | TFLOPs: 30.58 | 7: iteration 55720/ 115203 | consumed samples: 14264320 | consumed tokens: 29213327360 | elapsed time per iteration (s): 0.46 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 2.274246E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 553.908 | TFLOPs: 29.06 | 7: iteration 55730/ 115203 | consumed samples: 14266880 | consumed tokens: 29218570240 | elapsed time per iteration (s): 0.43 | learning rate: 1.161E-04 | global batch size: 256 | lm loss: 2.258292E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.747 | TFLOPs: 31.15 | 7: iteration 55740/ 115203 | consumed samples: 14269440 | consumed tokens: 29223813120 | elapsed time per iteration (s): 0.43 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.310107E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 594.233 | TFLOPs: 31.18 | 7: iteration 55750/ 115203 | consumed samples: 14272000 | consumed tokens: 29229056000 | elapsed time per iteration (s): 0.43 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.318697E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.646 | TFLOPs: 31.51 | 7: iteration 55760/ 115203 | consumed samples: 14274560 | consumed tokens: 29234298880 | elapsed time per iteration (s): 0.45 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.280278E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.555 | TFLOPs: 30.09 | 7: iteration 55770/ 115203 | consumed samples: 14277120 | consumed tokens: 29239541760 | elapsed time per iteration (s): 0.43 | learning rate: 1.160E-04 | global batch size: 256 | lm loss: 2.283486E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.449 | TFLOPs: 31.08 | 7: iteration 55780/ 115203 | consumed samples: 14279680 | consumed tokens: 29244784640 | elapsed time per iteration (s): 0.43 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 2.326089E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.527 | TFLOPs: 31.25 | 7: iteration 55790/ 115203 | consumed samples: 14282240 | consumed tokens: 29250027520 | elapsed time per iteration (s): 0.43 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 2.298558E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.029 | TFLOPs: 31.01 | 7: iteration 55800/ 115203 | consumed samples: 14284800 | consumed tokens: 29255270400 | elapsed time per iteration (s): 0.44 | learning rate: 1.159E-04 | global batch size: 256 | lm 
loss: 2.302365E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.180 | TFLOPs: 30.44 | 7: iteration 55810/ 115203 | consumed samples: 14287360 | consumed tokens: 29260513280 | elapsed time per iteration (s): 0.43 | learning rate: 1.159E-04 | global batch size: 256 | lm loss: 2.288751E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.751 | TFLOPs: 31.26 | 7: iteration 55820/ 115203 | consumed samples: 14289920 | consumed tokens: 29265756160 | elapsed time per iteration (s): 0.43 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 2.309271E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.822 | TFLOPs: 31.31 | 7: iteration 55830/ 115203 | consumed samples: 14292480 | consumed tokens: 29270999040 | elapsed time per iteration (s): 0.43 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 2.303001E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.039 | TFLOPs: 31.12 | 7: iteration 55840/ 115203 | consumed samples: 14295040 | consumed tokens: 29276241920 | elapsed time per iteration (s): 0.44 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 2.290310E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.597 | TFLOPs: 30.36 | 7: iteration 55850/ 115203 | consumed samples: 14297600 | consumed tokens: 29281484800 | elapsed time per iteration (s): 0.43 | learning rate: 1.158E-04 | global batch size: 256 | lm loss: 2.318443E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.377 | TFLOPs: 31.08 | 7: iteration 55860/ 115203 | consumed samples: 14300160 | consumed tokens: 
29286727680 | elapsed time per iteration (s): 0.44 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 2.278054E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.928 | TFLOPs: 30.74 | 7: iteration 55870/ 115203 | consumed samples: 14302720 | consumed tokens: 29291970560 | elapsed time per iteration (s): 0.43 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 2.275869E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.997 | TFLOPs: 31.53 | 7: iteration 55880/ 115203 | consumed samples: 14305280 | consumed tokens: 29297213440 | elapsed time per iteration (s): 0.43 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 2.293878E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.466 | TFLOPs: 31.14 | 7: iteration 55890/ 115203 | consumed samples: 14307840 | consumed tokens: 29302456320 | elapsed time per iteration (s): 0.43 | learning rate: 1.157E-04 | global batch size: 256 | lm loss: 2.292036E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.497 | TFLOPs: 31.03 | 7: iteration 55900/ 115203 | consumed samples: 14310400 | consumed tokens: 29307699200 | elapsed time per iteration (s): 0.44 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 2.326465E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.546 | TFLOPs: 30.20 | 7: iteration 55910/ 115203 | consumed samples: 14312960 | consumed tokens: 29312942080 | elapsed time per iteration (s): 0.44 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 2.260415E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
579.656 | TFLOPs: 30.41 | 7: iteration 55920/ 115203 | consumed samples: 14315520 | consumed tokens: 29318184960 | elapsed time per iteration (s): 0.43 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 2.270962E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.191 | TFLOPs: 30.97 | 7: iteration 55930/ 115203 | consumed samples: 14318080 | consumed tokens: 29323427840 | elapsed time per iteration (s): 0.42 | learning rate: 1.156E-04 | global batch size: 256 | lm loss: 2.294684E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.248 | TFLOPs: 31.65 | 7: iteration 55940/ 115203 | consumed samples: 14320640 | consumed tokens: 29328670720 | elapsed time per iteration (s): 0.43 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 2.268049E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.937 | TFLOPs: 31.06 | 7: iteration 55950/ 115203 | consumed samples: 14323200 | consumed tokens: 29333913600 | elapsed time per iteration (s): 0.44 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 2.264681E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.974 | TFLOPs: 30.54 | 7: iteration 55960/ 115203 | consumed samples: 14325760 | consumed tokens: 29339156480 | elapsed time per iteration (s): 0.43 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 2.293964E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.675 | TFLOPs: 30.89 | 7: iteration 55970/ 115203 | consumed samples: 14328320 | consumed tokens: 29344399360 | elapsed time per iteration (s): 0.44 | learning rate: 1.155E-04 | global batch size: 256 | lm loss: 2.294036E+00 | grad norm: 0.218 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.529 | TFLOPs: 30.72 |
7: iteration 55980/ 115203 | consumed samples: 14330880 | consumed tokens: 29349642240 | elapsed time per iteration (s): 0.43 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 2.313665E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.409 | TFLOPs: 31.24 |
7: iteration 55990/ 115203 | consumed samples: 14333440 | consumed tokens: 29354885120 | elapsed time per iteration (s): 0.43 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 2.284921E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.238 | TFLOPs: 30.92 |
0: [2022-11-28 19:41:23,821] [INFO] [logging.py:68:log_dist] [Rank 0] step=56000, skipped=0, lr=[0.00011539606744822729, 0.00011539606744822729, 0.00011539606744822729], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 56000/ 115203 | consumed samples: 14336000 | consumed tokens: 29360128000 | elapsed time per iteration (s): 0.43 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 2.268624E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.509 | TFLOPs: 31.25 |
0: steps: 56000 loss: 2.2460 iter time (s): 0.433 samples/sec: 590.614
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 56000 | lm loss value: 2.217434E+00 | lm loss PPL: 9.183734E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 56000 to checkpoints_221m
0: [2022-11-28 19:41:24,056] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step56000 is begin to save!
0: [2022-11-28 19:41:24,063] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_01-model_00-model_states.pt... 0: [2022-11-28 19:41:24,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_01-model_00-model_states.pt. 0: [2022-11-28 19:41:24,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_03-model_00-model_states.pt... 0: [2022-11-28 19:41:24,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_03-model_00-model_states.pt. 0: [2022-11-28 19:41:24,197] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_04-model_00-model_states.pt... 0: [2022-11-28 19:41:24,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_04-model_00-model_states.pt. 0: [2022-11-28 19:41:24,221] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_05-model_00-model_states.pt... 0: [2022-11-28 19:41:24,244] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_05-model_00-model_states.pt. 0: [2022-11-28 19:41:24,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_06-model_00-model_states.pt... 0: [2022-11-28 19:41:24,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_06-model_00-model_states.pt. 0: [2022-11-28 19:41:24,268] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_07-model_00-model_states.pt... 0: [2022-11-28 19:41:24,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 19:41:24,292] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_08-model_00-model_states.pt... 0: [2022-11-28 19:41:24,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_08-model_00-model_states.pt. 0: [2022-11-28 19:41:24,316] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_09-model_00-model_states.pt... 0: [2022-11-28 19:41:24,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_09-model_00-model_states.pt. 0: [2022-11-28 19:41:24,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_10-model_00-model_states.pt... 0: [2022-11-28 19:41:24,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_10-model_00-model_states.pt. 0: [2022-11-28 19:41:24,364] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_11-model_00-model_states.pt... 0: [2022-11-28 19:41:24,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_11-model_00-model_states.pt. 0: [2022-11-28 19:41:24,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_12-model_00-model_states.pt... 0: [2022-11-28 19:41:24,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_12-model_00-model_states.pt. 0: [2022-11-28 19:41:24,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_13-model_00-model_states.pt... 0: [2022-11-28 19:41:24,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 19:41:24,436] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_14-model_00-model_states.pt... 0: [2022-11-28 19:41:24,460] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_14-model_00-model_states.pt. 0: [2022-11-28 19:41:24,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_15-model_00-model_states.pt... 0: [2022-11-28 19:41:24,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_15-model_00-model_states.pt. 0: [2022-11-28 19:41:24,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_16-model_00-model_states.pt... 0: [2022-11-28 19:41:24,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_16-model_00-model_states.pt. 0: [2022-11-28 19:41:24,509] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_17-model_00-model_states.pt... 0: [2022-11-28 19:41:24,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_17-model_00-model_states.pt. 0: [2022-11-28 19:41:24,532] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_18-model_00-model_states.pt... 0: [2022-11-28 19:41:24,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_18-model_00-model_states.pt. 0: [2022-11-28 19:41:24,556] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_19-model_00-model_states.pt... 0: [2022-11-28 19:41:24,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 19:41:24,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_20-model_00-model_states.pt... 0: [2022-11-28 19:41:24,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_20-model_00-model_states.pt. 0: [2022-11-28 19:41:24,604] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/layer_22-model_00-model_states.pt... 0: [2022-11-28 19:41:24,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/layer_22-model_00-model_states.pt. 0: [2022-11-28 19:41:24,609] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step56000/mp_rank_00_model_states.pt 0: [2022-11-28 19:41:24,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/mp_rank_00_model_states.pt... 0: [2022-11-28 19:41:24,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/mp_rank_00_model_states.pt. 0: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
4: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
6: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:41:24,630] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step56000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:41:24,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:41:24,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 19:41:24,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2022-11-28 19:41:24,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:41:24,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 19:41:24,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2022-11-28 19:41:24,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2022-11-28 19:41:24,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 19:41:24,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2022-11-28 19:41:24,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:41:24,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 19:41:24,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2022-11-28 19:41:24,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:41:24,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 19:41:24,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2022-11-28 19:41:24,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:41:24,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:41:24,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 19:41:24,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 19:41:24,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 19:41:24,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2022-11-28 19:41:24,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 19:41:24,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2022-11-28 19:41:24,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2022-11-28 19:41:24,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:41:24,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:41:24,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:41:24,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-28 19:41:24,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 19:41:24,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 19:41:24,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 19:41:24,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2022-11-28 19:41:24,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2022-11-28 19:41:24,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2022-11-28 19:41:24,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:41:24,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:41:24,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 19:41:24,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 19:41:24,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2022-11-28 19:41:24,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 
4: [2022-11-28 19:41:24,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:41:24,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:41:24,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:41:24,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 19:41:24,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 19:41:24,699] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 19:41:24,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2022-11-28 19:41:24,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 4: [2022-11-28 19:41:24,699] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2022-11-28 19:41:24,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:41:24,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
2: [2022-11-28 19:41:24,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 5: [2022-11-28 19:41:24,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 2: [2022-11-28 19:41:24,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 5: [2022-11-28 19:41:24,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:41:24,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 5: [2022-11-28 19:41:24,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 19:41:24,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 5: [2022-11-28 19:41:24,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:41:24,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 19:41:24,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 5: [2022-11-28 19:41:24,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:41:24,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 19:41:24,701] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:41:24,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 5: [2022-11-28 19:41:24,701] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 19:41:24,701] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2022-11-28 19:41:24,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:41:24,704] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 19:41:24,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2022-11-28 19:41:24,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:41:24,704] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 19:41:24,704] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2022-11-28 19:41:24,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 19:41:24,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 19:41:24,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2022-11-28 19:41:24,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:41:24,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 19:41:24,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2022-11-28 19:41:24,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:41:24,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 19:41:24,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2022-11-28 19:41:24,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:41:24,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 19:41:24,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2022-11-28 19:41:24,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 19:41:24,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:41:24,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 19:41:24,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 19:41:24,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2022-11-28 19:41:24,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:41:24,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2022-11-28 19:41:24,708] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 19:41:24,708] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2022-11-28 19:41:24,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:41:24,709] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 19:41:24,709] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2022-11-28 19:41:24,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 19:41:24,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 19:41:24,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2022-11-28 19:41:24,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:41:24,711] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 19:41:24,711] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2022-11-28 19:41:24,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:41:24,711] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 19:41:24,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:41:24,711] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2022-11-28 19:41:24,711] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 19:41:24,711] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2022-11-28 19:41:24,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2022-11-28 19:41:24,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:41:24,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:41:24,712] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 19:41:24,712] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 19:41:24,712] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 19:41:24,712] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2022-11-28 19:41:24,712] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 2: [2022-11-28 19:41:24,712] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 7: [2022-11-28 19:41:24,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:41:24,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 19:41:24,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2022-11-28 19:41:24,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 19:41:24,724] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 19:41:24,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2022-11-28 19:41:24,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:41:24,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 19:41:24,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2022-11-28 19:41:24,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:41:24,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 19:41:24,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 1: [2022-11-28 19:41:24,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:41:24,725] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 19:41:24,725] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 5: [2022-11-28 19:41:24,715] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
3: [2022-11-28 19:41:24,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:41:24,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:41:24,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 19:41:24,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 19:41:24,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 19:41:24,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:41:24,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2022-11-28 19:41:24,718] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 19:41:24,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:41:24,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2022-11-28 19:41:24,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 
5: [2022-11-28 19:41:24,716] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 19:41:24,718] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 5: [2022-11-28 19:41:24,716] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2022-11-28 19:41:24,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:41:24,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:41:24,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:41:24,717] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 19:41:24,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 19:41:24,717] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2022-11-28 19:41:24,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 19:41:24,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:41:24,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 
3: [2022-11-28 19:41:24,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:41:24,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 19:41:24,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2022-11-28 19:41:24,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2022-11-28 19:41:24,720] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 19:41:24,720] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 3: [2022-11-28 19:41:24,727] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:41:24,727] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 19:41:24,727] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 0: [2022-11-28 19:41:24,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 19:41:24,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2022-11-28 19:41:24,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-28 19:41:24,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 19:41:24,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2022-11-28 19:41:24,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:41:24,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 19:41:24,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2022-11-28 19:41:24,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:41:24,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:41:24,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:41:24,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 19:41:24,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:41:24,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 19:41:24,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 
6: [2022-11-28 19:41:24,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 19:41:24,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2022-11-28 19:41:24,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 19:41:24,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2022-11-28 19:41:24,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2022-11-28 19:41:24,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:41:24,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 19:41:24,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 6: [2022-11-28 19:41:24,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:41:24,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step56000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 19:41:24,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step56000 is ready now! 
0: successfully saved checkpoint at iteration 56000 to checkpoints_221m 7: time (ms) | save-checkpoint: 803.73 7: iteration 56010/ 115203 | consumed samples: 14338560 | consumed tokens: 29365370880 | elapsed time per iteration (s): 0.53 | learning rate: 1.154E-04 | global batch size: 256 | lm loss: 2.318626E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 482.091 | TFLOPs: 25.29 | 7: iteration 56020/ 115203 | consumed samples: 14341120 | consumed tokens: 29370613760 | elapsed time per iteration (s): 0.43 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 2.268322E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.386 | TFLOPs: 31.29 | 7: iteration 56030/ 115203 | consumed samples: 14343680 | consumed tokens: 29375856640 | elapsed time per iteration (s): 0.43 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 2.334739E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.369 | TFLOPs: 31.03 | 7: iteration 56040/ 115203 | consumed samples: 14346240 | consumed tokens: 29381099520 | elapsed time per iteration (s): 0.45 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 2.324014E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.230 | TFLOPs: 30.08 | 7: iteration 56050/ 115203 | consumed samples: 14348800 | consumed tokens: 29386342400 | elapsed time per iteration (s): 0.49 | learning rate: 1.153E-04 | global batch size: 256 | lm loss: 2.244485E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 519.382 | TFLOPs: 27.25 | 7: iteration 56060/ 115203 | consumed samples: 14351360 | consumed tokens: 29391585280 | elapsed time per iteration (s): 0.44 | learning 
rate: 1.152E-04 | global batch size: 256 | lm loss: 2.251260E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.394 | TFLOPs: 30.82 | 7: iteration 56070/ 115203 | consumed samples: 14353920 | consumed tokens: 29396828160 | elapsed time per iteration (s): 0.54 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 2.288341E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 478.479 | TFLOPs: 25.11 | 7: iteration 56080/ 115203 | consumed samples: 14356480 | consumed tokens: 29402071040 | elapsed time per iteration (s): 0.43 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 2.306986E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.355 | TFLOPs: 31.08 | 7: iteration 56090/ 115203 | consumed samples: 14359040 | consumed tokens: 29407313920 | elapsed time per iteration (s): 0.44 | learning rate: 1.152E-04 | global batch size: 256 | lm loss: 2.276962E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.143 | TFLOPs: 30.33 | 7: iteration 56100/ 115203 | consumed samples: 14361600 | consumed tokens: 29412556800 | elapsed time per iteration (s): 0.43 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 2.310714E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.045 | TFLOPs: 31.17 | 7: iteration 56110/ 115203 | consumed samples: 14364160 | consumed tokens: 29417799680 | elapsed time per iteration (s): 0.43 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 2.295246E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.002 | TFLOPs: 31.17 | 7: iteration 56120/ 115203 | 
consumed samples: 14366720 | consumed tokens: 29423042560 | elapsed time per iteration (s): 0.43 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 2.291935E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.948 | TFLOPs: 30.95 | 7: iteration 56130/ 115203 | consumed samples: 14369280 | consumed tokens: 29428285440 | elapsed time per iteration (s): 0.43 | learning rate: 1.151E-04 | global batch size: 256 | lm loss: 2.291624E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.094 | TFLOPs: 31.43 | 7: iteration 56140/ 115203 | consumed samples: 14371840 | consumed tokens: 29433528320 | elapsed time per iteration (s): 0.45 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.301058E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.175 | TFLOPs: 29.65 | 7: iteration 56150/ 115203 | consumed samples: 14374400 | consumed tokens: 29438771200 | elapsed time per iteration (s): 0.43 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.301047E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.472 | TFLOPs: 31.03 | 7: iteration 56160/ 115203 | consumed samples: 14376960 | consumed tokens: 29444014080 | elapsed time per iteration (s): 0.43 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.303206E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.292 | TFLOPs: 31.02 | 7: iteration 56170/ 115203 | consumed samples: 14379520 | consumed tokens: 29449256960 | elapsed time per iteration (s): 0.43 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.305503E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 589.505 | TFLOPs: 30.93 | 7: iteration 56180/ 115203 | consumed samples: 14382080 | consumed tokens: 29454499840 | elapsed time per iteration (s): 0.44 | learning rate: 1.150E-04 | global batch size: 256 | lm loss: 2.272603E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.335 | TFLOPs: 30.61 | 7: iteration 56190/ 115203 | consumed samples: 14384640 | consumed tokens: 29459742720 | elapsed time per iteration (s): 0.43 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 2.252393E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.337 | TFLOPs: 31.03 | 7: iteration 56200/ 115203 | consumed samples: 14387200 | consumed tokens: 29464985600 | elapsed time per iteration (s): 0.43 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 2.277757E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.971 | TFLOPs: 30.90 | 7: iteration 56210/ 115203 | consumed samples: 14389760 | consumed tokens: 29470228480 | elapsed time per iteration (s): 0.44 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 2.271766E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.173 | TFLOPs: 30.81 | 7: iteration 56220/ 115203 | consumed samples: 14392320 | consumed tokens: 29475471360 | elapsed time per iteration (s): 0.43 | learning rate: 1.149E-04 | global batch size: 256 | lm loss: 2.275432E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.559 | TFLOPs: 31.35 | 7: iteration 56230/ 115203 | consumed samples: 14394880 | consumed tokens: 29480714240 | elapsed time per iteration (s): 0.44 | learning rate: 1.148E-04 | global batch size: 
256 | lm loss: 2.315492E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.470 | TFLOPs: 30.72 | 7: iteration 56240/ 115203 | consumed samples: 14397440 | consumed tokens: 29485957120 | elapsed time per iteration (s): 0.43 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 2.276688E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.887 | TFLOPs: 31.27 | 7: iteration 56250/ 115203 | consumed samples: 14400000 | consumed tokens: 29491200000 | elapsed time per iteration (s): 0.43 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 2.290439E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.914 | TFLOPs: 31.42 | 7: iteration 56260/ 115203 | consumed samples: 14402560 | consumed tokens: 29496442880 | elapsed time per iteration (s): 0.43 | learning rate: 1.148E-04 | global batch size: 256 | lm loss: 2.285609E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.774 | TFLOPs: 31.15 | 7: iteration 56270/ 115203 | consumed samples: 14405120 | consumed tokens: 29501685760 | elapsed time per iteration (s): 0.45 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 2.271062E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.489 | TFLOPs: 29.88 | 7: iteration 56280/ 115203 | consumed samples: 14407680 | consumed tokens: 29506928640 | elapsed time per iteration (s): 0.43 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 2.306308E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.006 | TFLOPs: 31.22 | 7: iteration 56290/ 115203 | consumed samples: 14410240 | consumed 
tokens: 29512171520 | elapsed time per iteration (s): 0.44 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 2.311500E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.090 | TFLOPs: 30.75 | 7: iteration 56300/ 115203 | consumed samples: 14412800 | consumed tokens: 29517414400 | elapsed time per iteration (s): 0.43 | learning rate: 1.147E-04 | global batch size: 256 | lm loss: 2.283093E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.432 | TFLOPs: 31.08 | 7: iteration 56310/ 115203 | consumed samples: 14415360 | consumed tokens: 29522657280 | elapsed time per iteration (s): 0.43 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 2.297825E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.898 | TFLOPs: 31.00 | 7: iteration 56320/ 115203 | consumed samples: 14417920 | consumed tokens: 29527900160 | elapsed time per iteration (s): 0.44 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 2.289412E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.244 | TFLOPs: 30.81 | 7: iteration 56330/ 115203 | consumed samples: 14420480 | consumed tokens: 29533143040 | elapsed time per iteration (s): 0.44 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 2.251397E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.291 | TFLOPs: 30.87 | 7: iteration 56340/ 115203 | consumed samples: 14423040 | consumed tokens: 29538385920 | elapsed time per iteration (s): 0.44 | learning rate: 1.146E-04 | global batch size: 256 | lm loss: 2.304365E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 582.188 | TFLOPs: 30.55 | 7: iteration 56350/ 115203 | consumed samples: 14425600 | consumed tokens: 29543628800 | elapsed time per iteration (s): 0.43 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 2.301204E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.939 | TFLOPs: 31.01 | 7: iteration 56360/ 115203 | consumed samples: 14428160 | consumed tokens: 29548871680 | elapsed time per iteration (s): 0.43 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 2.300619E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.684 | TFLOPs: 31.46 | 7: iteration 56370/ 115203 | consumed samples: 14430720 | consumed tokens: 29554114560 | elapsed time per iteration (s): 0.44 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 2.306632E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.408 | TFLOPs: 30.77 | 7: iteration 56380/ 115203 | consumed samples: 14433280 | consumed tokens: 29559357440 | elapsed time per iteration (s): 0.44 | learning rate: 1.145E-04 | global batch size: 256 | lm loss: 2.265337E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.028 | TFLOPs: 30.85 | 7: iteration 56390/ 115203 | consumed samples: 14435840 | consumed tokens: 29564600320 | elapsed time per iteration (s): 0.45 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 2.297848E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.720 | TFLOPs: 29.94 | 7: iteration 56400/ 115203 | consumed samples: 14438400 | consumed tokens: 29569843200 | elapsed time per iteration (s): 0.44 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 2.278834E+00 | grad norm: 
0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.991 | TFLOPs: 30.69 | 7: iteration 56410/ 115203 | consumed samples: 14440960 | consumed tokens: 29575086080 | elapsed time per iteration (s): 0.43 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 2.298571E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.153 | TFLOPs: 31.12 | 7: iteration 56420/ 115203 | consumed samples: 14443520 | consumed tokens: 29580328960 | elapsed time per iteration (s): 0.44 | learning rate: 1.144E-04 | global batch size: 256 | lm loss: 2.298217E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.043 | TFLOPs: 30.75 | 7: iteration 56430/ 115203 | consumed samples: 14446080 | consumed tokens: 29585571840 | elapsed time per iteration (s): 0.43 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 2.285903E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.647 | TFLOPs: 31.31 | 7: iteration 56440/ 115203 | consumed samples: 14448640 | consumed tokens: 29590814720 | elapsed time per iteration (s): 0.43 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 2.274773E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.107 | TFLOPs: 31.12 | 7: iteration 56450/ 115203 | consumed samples: 14451200 | consumed tokens: 29596057600 | elapsed time per iteration (s): 0.43 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 2.260544E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.421 | TFLOPs: 31.56 | 7: iteration 56460/ 115203 | consumed samples: 14453760 | consumed tokens: 29601300480 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.143E-04 | global batch size: 256 | lm loss: 2.270505E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.884 | TFLOPs: 31.42 | 7: iteration 56470/ 115203 | consumed samples: 14456320 | consumed tokens: 29606543360 | elapsed time per iteration (s): 0.43 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 2.292200E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.590 | TFLOPs: 31.56 | 7: iteration 56480/ 115203 | consumed samples: 14458880 | consumed tokens: 29611786240 | elapsed time per iteration (s): 0.43 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 2.272606E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.193 | TFLOPs: 31.23 | 7: iteration 56490/ 115203 | consumed samples: 14461440 | consumed tokens: 29617029120 | elapsed time per iteration (s): 0.43 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 2.279094E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.761 | TFLOPs: 31.47 | 7: iteration 56500/ 115203 | consumed samples: 14464000 | consumed tokens: 29622272000 | elapsed time per iteration (s): 0.42 | learning rate: 1.142E-04 | global batch size: 256 | lm loss: 2.291436E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.278 | TFLOPs: 31.86 | 7: iteration 56510/ 115203 | consumed samples: 14466560 | consumed tokens: 29627514880 | elapsed time per iteration (s): 0.44 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 2.290945E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.400 | TFLOPs: 30.51 | 7: 
iteration 56520/ 115203 | consumed samples: 14469120 | consumed tokens: 29632757760 | elapsed time per iteration (s): 0.44 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 2.292769E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.121 | TFLOPs: 30.44 | 7: iteration 56530/ 115203 | consumed samples: 14471680 | consumed tokens: 29638000640 | elapsed time per iteration (s): 0.43 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 2.274817E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.229 | TFLOPs: 31.18 | 7: iteration 56540/ 115203 | consumed samples: 14474240 | consumed tokens: 29643243520 | elapsed time per iteration (s): 0.43 | learning rate: 1.141E-04 | global batch size: 256 | lm loss: 2.301580E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.644 | TFLOPs: 31.30 | 7: iteration 56550/ 115203 | consumed samples: 14476800 | consumed tokens: 29648486400 | elapsed time per iteration (s): 0.45 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 2.288466E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.484 | TFLOPs: 30.14 | 7: iteration 56560/ 115203 | consumed samples: 14479360 | consumed tokens: 29653729280 | elapsed time per iteration (s): 0.43 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 2.291070E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.789 | TFLOPs: 31.10 | 7: iteration 56570/ 115203 | consumed samples: 14481920 | consumed tokens: 29658972160 | elapsed time per iteration (s): 0.44 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 2.301323E+00 | grad norm: 0.203 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.677 | TFLOPs: 30.20 | 7: iteration 56580/ 115203 | consumed samples: 14484480 | consumed tokens: 29664215040 | elapsed time per iteration (s): 0.43 | learning rate: 1.140E-04 | global batch size: 256 | lm loss: 2.295104E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.649 | TFLOPs: 31.25 | 7: iteration 56590/ 115203 | consumed samples: 14487040 | consumed tokens: 29669457920 | elapsed time per iteration (s): 0.43 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 2.319468E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.036 | TFLOPs: 30.91 | 7: iteration 56600/ 115203 | consumed samples: 14489600 | consumed tokens: 29674700800 | elapsed time per iteration (s): 0.44 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 2.277987E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.046 | TFLOPs: 30.54 | 7: iteration 56610/ 115203 | consumed samples: 14492160 | consumed tokens: 29679943680 | elapsed time per iteration (s): 0.44 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 2.265798E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.700 | TFLOPs: 30.36 | 7: iteration 56620/ 115203 | consumed samples: 14494720 | consumed tokens: 29685186560 | elapsed time per iteration (s): 0.44 | learning rate: 1.139E-04 | global batch size: 256 | lm loss: 2.301169E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.312 | TFLOPs: 30.40 | 7: iteration 56630/ 115203 | consumed samples: 14497280 | consumed tokens: 29690429440 | elapsed time per iteration (s): 0.44 | learning rate: 
1.138E-04 | global batch size: 256 | lm loss: 2.286489E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.028 | TFLOPs: 30.75 | 7: iteration 56640/ 115203 | consumed samples: 14499840 | consumed tokens: 29695672320 | elapsed time per iteration (s): 0.43 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 2.278216E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.626 | TFLOPs: 30.88 | 7: iteration 56650/ 115203 | consumed samples: 14502400 | consumed tokens: 29700915200 | elapsed time per iteration (s): 0.44 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 2.283975E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.594 | TFLOPs: 30.52 | 7: iteration 56660/ 115203 | consumed samples: 14504960 | consumed tokens: 29706158080 | elapsed time per iteration (s): 0.42 | learning rate: 1.138E-04 | global batch size: 256 | lm loss: 2.317719E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.376 | TFLOPs: 31.71 | 7: iteration 56670/ 115203 | consumed samples: 14507520 | consumed tokens: 29711400960 | elapsed time per iteration (s): 0.43 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 2.295620E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.555 | TFLOPs: 31.35 | 7: iteration 56680/ 115203 | consumed samples: 14510080 | consumed tokens: 29716643840 | elapsed time per iteration (s): 0.43 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 2.279180E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.739 | TFLOPs: 31.26 | 7: iteration 56690/ 115203 | consumed 
samples: 14512640 | consumed tokens: 29721886720 | elapsed time per iteration (s): 0.45 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 2.289766E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.152 | TFLOPs: 29.81 | 7: iteration 56700/ 115203 | consumed samples: 14515200 | consumed tokens: 29727129600 | elapsed time per iteration (s): 0.43 | learning rate: 1.137E-04 | global batch size: 256 | lm loss: 2.295896E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.152 | TFLOPs: 31.12 | 7: iteration 56710/ 115203 | consumed samples: 14517760 | consumed tokens: 29732372480 | elapsed time per iteration (s): 0.43 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 2.268581E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.785 | TFLOPs: 31.05 | 7: iteration 56720/ 115203 | consumed samples: 14520320 | consumed tokens: 29737615360 | elapsed time per iteration (s): 0.44 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 2.299226E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.439 | TFLOPs: 30.77 | 7: iteration 56730/ 115203 | consumed samples: 14522880 | consumed tokens: 29742858240 | elapsed time per iteration (s): 0.45 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 2.282442E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.475 | TFLOPs: 29.93 | 7: iteration 56740/ 115203 | consumed samples: 14525440 | consumed tokens: 29748101120 | elapsed time per iteration (s): 0.43 | learning rate: 1.136E-04 | global batch size: 256 | lm loss: 2.290723E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 599.263 | TFLOPs: 31.44 | 7: iteration 56750/ 115203 | consumed samples: 14528000 | consumed tokens: 29753344000 | elapsed time per iteration (s): 0.45 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 2.249140E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.817 | TFLOPs: 29.90 | 7: iteration 56760/ 115203 | consumed samples: 14530560 | consumed tokens: 29758586880 | elapsed time per iteration (s): 0.43 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 2.274571E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.170 | TFLOPs: 31.02 | 7: iteration 56770/ 115203 | consumed samples: 14533120 | consumed tokens: 29763829760 | elapsed time per iteration (s): 0.43 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 2.288973E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.470 | TFLOPs: 31.09 | 7: iteration 56780/ 115203 | consumed samples: 14535680 | consumed tokens: 29769072640 | elapsed time per iteration (s): 0.43 | learning rate: 1.135E-04 | global batch size: 256 | lm loss: 2.287118E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.164 | TFLOPs: 31.33 | 7: iteration 56790/ 115203 | consumed samples: 14538240 | consumed tokens: 29774315520 | elapsed time per iteration (s): 0.44 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 2.275857E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.462 | TFLOPs: 30.88 | 7: iteration 56800/ 115203 | consumed samples: 14540800 | consumed tokens: 29779558400 | elapsed time per iteration (s): 0.43 | learning rate: 1.134E-04 | global batch size: 256 | lm 
loss: 2.315751E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.429 | TFLOPs: 31.50 | 7: iteration 56810/ 115203 | consumed samples: 14543360 | consumed tokens: 29784801280 | elapsed time per iteration (s): 0.43 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 2.268842E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.300 | TFLOPs: 31.08 | 7: iteration 56820/ 115203 | consumed samples: 14545920 | consumed tokens: 29790044160 | elapsed time per iteration (s): 0.43 | learning rate: 1.134E-04 | global batch size: 256 | lm loss: 2.243587E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.425 | TFLOPs: 31.35 | 7: iteration 56830/ 115203 | consumed samples: 14548480 | consumed tokens: 29795287040 | elapsed time per iteration (s): 0.43 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 2.279340E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.100 | TFLOPs: 31.38 | 7: iteration 56840/ 115203 | consumed samples: 14551040 | consumed tokens: 29800529920 | elapsed time per iteration (s): 0.44 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 2.282474E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.071 | TFLOPs: 30.59 | 7: iteration 56850/ 115203 | consumed samples: 14553600 | consumed tokens: 29805772800 | elapsed time per iteration (s): 0.43 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 2.253069E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.670 | TFLOPs: 31.04 | 7: iteration 56860/ 115203 | consumed samples: 14556160 | consumed tokens: 
29811015680 | elapsed time per iteration (s): 0.42 | learning rate: 1.133E-04 | global batch size: 256 | lm loss: 2.321020E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.658 | TFLOPs: 31.73 | 7: iteration 56870/ 115203 | consumed samples: 14558720 | consumed tokens: 29816258560 | elapsed time per iteration (s): 0.45 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 2.280332E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.069 | TFLOPs: 29.54 | 7: iteration 56880/ 115203 | consumed samples: 14561280 | consumed tokens: 29821501440 | elapsed time per iteration (s): 0.43 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 2.259741E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.970 | TFLOPs: 31.27 | 7: iteration 56890/ 115203 | consumed samples: 14563840 | consumed tokens: 29826744320 | elapsed time per iteration (s): 0.43 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 2.273706E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.937 | TFLOPs: 31.11 | 7: iteration 56900/ 115203 | consumed samples: 14566400 | consumed tokens: 29831987200 | elapsed time per iteration (s): 0.43 | learning rate: 1.132E-04 | global batch size: 256 | lm loss: 2.260983E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.481 | TFLOPs: 31.45 | 7: iteration 56910/ 115203 | consumed samples: 14568960 | consumed tokens: 29837230080 | elapsed time per iteration (s): 0.45 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 2.310115E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
569.859 | TFLOPs: 29.90 | 7: iteration 56920/ 115203 | consumed samples: 14571520 | consumed tokens: 29842472960 | elapsed time per iteration (s): 0.43 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 2.295570E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.149 | TFLOPs: 31.12 | 7: iteration 56930/ 115203 | consumed samples: 14574080 | consumed tokens: 29847715840 | elapsed time per iteration (s): 0.43 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 2.276844E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.535 | TFLOPs: 31.04 | 7: iteration 56940/ 115203 | consumed samples: 14576640 | consumed tokens: 29852958720 | elapsed time per iteration (s): 0.44 | learning rate: 1.131E-04 | global batch size: 256 | lm loss: 2.300732E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.480 | TFLOPs: 30.56 | 7: iteration 56950/ 115203 | consumed samples: 14579200 | consumed tokens: 29858201600 | elapsed time per iteration (s): 0.44 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 2.280652E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.222 | TFLOPs: 30.86 | 7: iteration 56960/ 115203 | consumed samples: 14581760 | consumed tokens: 29863444480 | elapsed time per iteration (s): 0.43 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 2.260068E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.800 | TFLOPs: 31.10 | 7: iteration 56970/ 115203 | consumed samples: 14584320 | consumed tokens: 29868687360 | elapsed time per iteration (s): 0.43 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 2.289743E+00 | grad norm: 0.227 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.765 | TFLOPs: 30.89 |
7: iteration 56980/ 115203 | consumed samples: 14586880 | consumed tokens: 29873930240 | elapsed time per iteration (s): 0.43 | learning rate: 1.130E-04 | global batch size: 256 | lm loss: 2.285157E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.569 | TFLOPs: 31.25 |
7: iteration 56990/ 115203 | consumed samples: 14589440 | consumed tokens: 29879173120 | elapsed time per iteration (s): 0.43 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 2.270307E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.846 | TFLOPs: 31.53 |
7: iteration 57000/ 115203 | consumed samples: 14592000 | consumed tokens: 29884416000 | elapsed time per iteration (s): 0.43 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 2.287568E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.767 | TFLOPs: 31.15 |
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 57000 | lm loss value: 2.269639E+00 | lm loss PPL: 9.675903E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 57000 to checkpoints_221m
0: [2022-11-28 19:48:40,948] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step57000 is begin to save!
0: [2022-11-28 19:48:40,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_01-model_00-model_states.pt...
0: [2022-11-28 19:48:41,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_01-model_00-model_states.pt.
0: [2022-11-28 19:48:41,141] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_03-model_00-model_states.pt... 0: [2022-11-28 19:48:41,171] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_03-model_00-model_states.pt. 0: [2022-11-28 19:48:41,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_04-model_00-model_states.pt... 0: [2022-11-28 19:48:41,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_04-model_00-model_states.pt. 0: [2022-11-28 19:48:41,203] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_05-model_00-model_states.pt... 0: [2022-11-28 19:48:41,234] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_05-model_00-model_states.pt. 0: [2022-11-28 19:48:41,234] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_06-model_00-model_states.pt... 0: [2022-11-28 19:48:41,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_06-model_00-model_states.pt. 0: [2022-11-28 19:48:41,265] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_07-model_00-model_states.pt... 0: [2022-11-28 19:48:41,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_07-model_00-model_states.pt. 0: [2022-11-28 19:48:41,296] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_08-model_00-model_states.pt... 0: [2022-11-28 19:48:41,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 19:48:41,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_09-model_00-model_states.pt... 0: [2022-11-28 19:48:41,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_09-model_00-model_states.pt. 0: [2022-11-28 19:48:41,359] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_10-model_00-model_states.pt... 0: [2022-11-28 19:48:41,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_10-model_00-model_states.pt. 0: [2022-11-28 19:48:41,389] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_11-model_00-model_states.pt... 0: [2022-11-28 19:48:41,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_11-model_00-model_states.pt. 0: [2022-11-28 19:48:41,420] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_12-model_00-model_states.pt... 0: [2022-11-28 19:48:41,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_12-model_00-model_states.pt. 0: [2022-11-28 19:48:41,451] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_13-model_00-model_states.pt... 0: [2022-11-28 19:48:41,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_13-model_00-model_states.pt. 0: [2022-11-28 19:48:41,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_14-model_00-model_states.pt... 0: [2022-11-28 19:48:41,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 19:48:41,513] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_15-model_00-model_states.pt... 0: [2022-11-28 19:48:41,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_15-model_00-model_states.pt. 0: [2022-11-28 19:48:41,545] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_16-model_00-model_states.pt... 0: [2022-11-28 19:48:41,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_16-model_00-model_states.pt. 0: [2022-11-28 19:48:41,576] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_17-model_00-model_states.pt... 0: [2022-11-28 19:48:41,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_17-model_00-model_states.pt. 0: [2022-11-28 19:48:41,607] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_18-model_00-model_states.pt... 0: [2022-11-28 19:48:41,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_18-model_00-model_states.pt. 0: [2022-11-28 19:48:41,639] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_19-model_00-model_states.pt... 0: [2022-11-28 19:48:41,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_19-model_00-model_states.pt. 0: [2022-11-28 19:48:41,670] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_20-model_00-model_states.pt... 0: [2022-11-28 19:48:41,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 19:48:41,704] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/layer_22-model_00-model_states.pt... 0: [2022-11-28 19:48:41,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/layer_22-model_00-model_states.pt. 0: [2022-11-28 19:48:41,709] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step57000/mp_rank_00_model_states.pt 0: [2022-11-28 19:48:41,709] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/mp_rank_00_model_states.pt... 0: [2022-11-28 19:48:41,713] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/mp_rank_00_model_states.pt. 0: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
0: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
2: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
0: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:48:41,732] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step57000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:48:41,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:48:41,784] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 19:48:41,784] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2022-11-28 19:48:41,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:48:41,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 19:48:41,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2022-11-28 19:48:41,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:48:41,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 19:48:41,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2022-11-28 19:48:41,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 19:48:41,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 19:48:41,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2022-11-28 19:48:41,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:48:41,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 19:48:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2022-11-28 19:48:41,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:48:41,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 19:48:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2022-11-28 19:48:41,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:48:41,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
7: [2022-11-28 19:48:41,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2022-11-28 19:48:41,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2022-11-28 19:48:41,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:48:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2022-11-28 19:48:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2022-11-28 19:48:41,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 19:48:41,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:48:41,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 19:48:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2022-11-28 19:48:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2022-11-28 19:48:41,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:48:41,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 19:48:41,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:48:41,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 19:48:41,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 19:48:41,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 19:48:41,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2022-11-28 19:48:41,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2022-11-28 19:48:41,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2022-11-28 19:48:41,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:48:41,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 19:48:41,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2022-11-28 19:48:41,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2022-11-28 19:48:41,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 19:48:41,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2022-11-28 19:48:41,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:48:41,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 19:48:41,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2022-11-28 19:48:41,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:48:41,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:48:41,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 3: [2022-11-28 19:48:41,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 2: [2022-11-28 19:48:41,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2022-11-28 19:48:41,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2022-11-28 19:48:41,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 19:48:41,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 19:48:41,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2022-11-28 19:48:41,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:48:41,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:48:41,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2022-11-28 19:48:41,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:48:41,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 19:48:41,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2022-11-28 19:48:41,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 0: [2022-11-28 19:48:41,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2022-11-28 19:48:41,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 2: [2022-11-28 19:48:41,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 19:48:41,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 19:48:41,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2022-11-28 19:48:41,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:48:41,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:48:41,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:48:41,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:48:41,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-28 19:48:41,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 19:48:41,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 19:48:41,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 19:48:41,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 19:48:41,791] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 19:48:41,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2022-11-28 19:48:41,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2022-11-28 19:48:41,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2022-11-28 19:48:41,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2022-11-28 19:48:41,791] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2022-11-28 19:48:41,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 19:48:41,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 19:48:41,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2022-11-28 19:48:41,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:48:41,795] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:48:41,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 19:48:41,795] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 19:48:41,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 4: [2022-11-28 19:48:41,795] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2022-11-28 19:48:41,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:48:41,796] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 19:48:41,796] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2022-11-28 19:48:41,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 19:48:41,798] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 19:48:41,798] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 6: [2022-11-28 19:48:41,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:48:41,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 19:48:41,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2022-11-28 19:48:41,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:48:41,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:48:41,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:48:41,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 19:48:41,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 19:48:41,801] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 19:48:41,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 
7: [2022-11-28 19:48:41,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2022-11-28 19:48:41,801] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2022-11-28 19:48:41,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:48:41,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 19:48:41,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2022-11-28 19:48:41,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:48:41,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:48:41,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 19:48:41,800] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 19:48:41,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 3: [2022-11-28 19:48:41,800] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 7: [2022-11-28 19:48:41,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-28 19:48:41,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:48:41,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 19:48:41,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 0: [2022-11-28 19:48:41,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:48:41,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:48:41,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:48:41,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2022-11-28 19:48:41,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 19:48:41,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 7: [2022-11-28 19:48:41,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2022-11-28 19:48:41,802] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 19:48:41,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 
0: [2022-11-28 19:48:41,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2022-11-28 19:48:41,802] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2022-11-28 19:48:41,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:48:41,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:48:41,805] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:48:41,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 19:48:41,805] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 19:48:41,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2022-11-28 19:48:41,805] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2022-11-28 19:48:41,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:48:41,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 19:48:41,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 
5: [2022-11-28 19:48:41,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:48:41,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 19:48:41,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2022-11-28 19:48:41,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:48:41,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 19:48:41,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2022-11-28 19:48:41,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:48:41,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 19:48:41,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2022-11-28 19:48:41,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:48:41,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:48:41,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:48:41,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 19:48:41,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:48:41,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 19:48:41,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 19:48:41,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2022-11-28 19:48:41,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2022-11-28 19:48:41,833] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 19:48:41,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 5: [2022-11-28 19:48:41,833] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: [2022-11-28 19:48:41,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 19:48:41,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2022-11-28 19:48:41,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 19:48:41,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:48:41,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:48:41,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:48:41,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:48:41,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:48:41,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:48:41,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 19:48:41,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 19:48:41,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 19:48:41,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 19:48:41,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 19:48:41,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 19:48:41,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 19:48:41,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 19:48:41,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step57000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 19:48:41,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2022-11-28 19:48:41,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2022-11-28 19:48:41,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2022-11-28 19:48:41,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 
1: [2022-11-28 19:48:41,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2022-11-28 19:48:41,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2022-11-28 19:48:41,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 1: [2022-11-28 19:48:41,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step57000 is ready now! 0: successfully saved checkpoint at iteration 57000 to checkpoints_221m 7: time (ms) | save-checkpoint: 1040.06 7: iteration 57010/ 115203 | consumed samples: 14594560 | consumed tokens: 29889658880 | elapsed time per iteration (s): 0.56 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 2.276433E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 457.480 | TFLOPs: 24.00 | 7: iteration 57020/ 115203 | consumed samples: 14597120 | consumed tokens: 29894901760 | elapsed time per iteration (s): 0.43 | learning rate: 1.129E-04 | global batch size: 256 | lm loss: 2.288572E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.370 | TFLOPs: 31.50 | 7: iteration 57030/ 115203 | consumed samples: 14599680 | consumed tokens: 29900144640 | elapsed time per iteration (s): 0.43 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 2.306626E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.858 | TFLOPs: 31.21 | 7: iteration 57040/ 115203 | consumed samples: 14602240 | consumed tokens: 29905387520 | elapsed time per iteration (s): 0.44 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 2.272441E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.639 | 
TFLOPs: 30.62 | 7: iteration 57050/ 115203 | consumed samples: 14604800 | consumed tokens: 29910630400 | elapsed time per iteration (s): 0.43 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 2.261988E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.543 | TFLOPs: 31.14 | 7: iteration 57060/ 115203 | consumed samples: 14607360 | consumed tokens: 29915873280 | elapsed time per iteration (s): 0.43 | learning rate: 1.128E-04 | global batch size: 256 | lm loss: 2.253013E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.625 | TFLOPs: 31.15 | 7: iteration 57070/ 115203 | consumed samples: 14609920 | consumed tokens: 29921116160 | elapsed time per iteration (s): 0.43 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 2.305357E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.351 | TFLOPs: 31.50 | 7: iteration 57080/ 115203 | consumed samples: 14612480 | consumed tokens: 29926359040 | elapsed time per iteration (s): 0.43 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 2.268031E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.205 | TFLOPs: 30.97 | 7: iteration 57090/ 115203 | consumed samples: 14615040 | consumed tokens: 29931601920 | elapsed time per iteration (s): 0.43 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 2.303273E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.121 | TFLOPs: 31.28 | 7: iteration 57100/ 115203 | consumed samples: 14617600 | consumed tokens: 29936844800 | elapsed time per iteration (s): 0.43 | learning rate: 1.127E-04 | global batch size: 256 | lm loss: 2.279767E+00 | grad norm: 0.211 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.022 | TFLOPs: 31.27 | 7: iteration 57110/ 115203 | consumed samples: 14620160 | consumed tokens: 29942087680 | elapsed time per iteration (s): 0.43 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 2.272227E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.387 | TFLOPs: 30.92 | 7: iteration 57120/ 115203 | consumed samples: 14622720 | consumed tokens: 29947330560 | elapsed time per iteration (s): 0.44 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 2.295381E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.672 | TFLOPs: 30.83 | 7: iteration 57130/ 115203 | consumed samples: 14625280 | consumed tokens: 29952573440 | elapsed time per iteration (s): 0.44 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 2.272783E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.549 | TFLOPs: 30.62 | 7: iteration 57140/ 115203 | consumed samples: 14627840 | consumed tokens: 29957816320 | elapsed time per iteration (s): 0.44 | learning rate: 1.126E-04 | global batch size: 256 | lm loss: 2.276233E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.151 | TFLOPs: 30.86 | 7: iteration 57150/ 115203 | consumed samples: 14630400 | consumed tokens: 29963059200 | elapsed time per iteration (s): 0.43 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 2.259883E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.670 | TFLOPs: 31.31 | 7: iteration 57160/ 115203 | consumed samples: 14632960 | consumed tokens: 29968302080 | elapsed time per iteration (s): 0.43 | 
learning rate: 1.125E-04 | global batch size: 256 | lm loss: 2.302780E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.914 | TFLOPs: 31.11 | 7: iteration 57170/ 115203 | consumed samples: 14635520 | consumed tokens: 29973544960 | elapsed time per iteration (s): 0.43 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 2.290127E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.019 | TFLOPs: 31.01 | 7: iteration 57180/ 115203 | consumed samples: 14638080 | consumed tokens: 29978787840 | elapsed time per iteration (s): 0.44 | learning rate: 1.125E-04 | global batch size: 256 | lm loss: 2.301105E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.900 | TFLOPs: 30.85 | 7: iteration 57190/ 115203 | consumed samples: 14640640 | consumed tokens: 29984030720 | elapsed time per iteration (s): 0.43 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 2.271769E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.343 | TFLOPs: 30.97 | 7: iteration 57200/ 115203 | consumed samples: 14643200 | consumed tokens: 29989273600 | elapsed time per iteration (s): 0.44 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 2.283613E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.097 | TFLOPs: 30.86 | 7: iteration 57210/ 115203 | consumed samples: 14645760 | consumed tokens: 29994516480 | elapsed time per iteration (s): 0.44 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 2.304281E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.391 | TFLOPs: 30.61 | 7: iteration 57220/ 115203 
| consumed samples: 14648320 | consumed tokens: 29999759360 | elapsed time per iteration (s): 0.45 | learning rate: 1.124E-04 | global batch size: 256 | lm loss: 2.248837E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.341 | TFLOPs: 29.82 | 7: iteration 57230/ 115203 | consumed samples: 14650880 | consumed tokens: 30005002240 | elapsed time per iteration (s): 0.44 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.295458E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.224 | TFLOPs: 30.39 | 7: iteration 57240/ 115203 | consumed samples: 14653440 | consumed tokens: 30010245120 | elapsed time per iteration (s): 0.44 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.334761E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.026 | TFLOPs: 30.80 | 7: iteration 57250/ 115203 | consumed samples: 14656000 | consumed tokens: 30015488000 | elapsed time per iteration (s): 0.43 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.308031E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.289 | TFLOPs: 30.97 | 7: iteration 57260/ 115203 | consumed samples: 14658560 | consumed tokens: 30020730880 | elapsed time per iteration (s): 0.44 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.315684E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.312 | TFLOPs: 30.24 | 7: iteration 57270/ 115203 | consumed samples: 14661120 | consumed tokens: 30025973760 | elapsed time per iteration (s): 0.43 | learning rate: 1.123E-04 | global batch size: 256 | lm loss: 2.249293E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 594.807 | TFLOPs: 31.21 | 7: iteration 57280/ 115203 | consumed samples: 14663680 | consumed tokens: 30031216640 | elapsed time per iteration (s): 0.43 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 2.232551E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.491 | TFLOPs: 31.45 | 7: iteration 57290/ 115203 | consumed samples: 14666240 | consumed tokens: 30036459520 | elapsed time per iteration (s): 0.43 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 2.290878E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.098 | TFLOPs: 31.07 | 7: iteration 57300/ 115203 | consumed samples: 14668800 | consumed tokens: 30041702400 | elapsed time per iteration (s): 0.44 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 2.278691E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.138 | TFLOPs: 30.65 | 7: iteration 57310/ 115203 | consumed samples: 14671360 | consumed tokens: 30046945280 | elapsed time per iteration (s): 0.43 | learning rate: 1.122E-04 | global batch size: 256 | lm loss: 2.290251E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.869 | TFLOPs: 31.58 | 7: iteration 57320/ 115203 | consumed samples: 14673920 | consumed tokens: 30052188160 | elapsed time per iteration (s): 0.44 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 2.304326E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.186 | TFLOPs: 30.65 | 7: iteration 57330/ 115203 | consumed samples: 14676480 | consumed tokens: 30057431040 | elapsed time per iteration (s): 0.43 | learning rate: 1.121E-04 | global batch size: 
256 | lm loss: 2.249238E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.117 | TFLOPs: 31.43 | 7: iteration 57340/ 115203 | consumed samples: 14679040 | consumed tokens: 30062673920 | elapsed time per iteration (s): 0.43 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 2.283744E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.573 | TFLOPs: 31.25 | 7: iteration 57350/ 115203 | consumed samples: 14681600 | consumed tokens: 30067916800 | elapsed time per iteration (s): 0.43 | learning rate: 1.121E-04 | global batch size: 256 | lm loss: 2.263785E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.611 | TFLOPs: 31.09 | 7: iteration 57360/ 115203 | consumed samples: 14684160 | consumed tokens: 30073159680 | elapsed time per iteration (s): 0.44 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 2.271418E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.517 | TFLOPs: 30.46 | 7: iteration 57370/ 115203 | consumed samples: 14686720 | consumed tokens: 30078402560 | elapsed time per iteration (s): 0.44 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 2.271465E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.333 | TFLOPs: 30.82 | 7: iteration 57380/ 115203 | consumed samples: 14689280 | consumed tokens: 30083645440 | elapsed time per iteration (s): 0.43 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 2.322499E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.003 | TFLOPs: 31.06 | 7: iteration 57390/ 115203 | consumed samples: 14691840 | consumed 
tokens: 30088888320 | elapsed time per iteration (s): 0.43 | learning rate: 1.120E-04 | global batch size: 256 | lm loss: 2.273349E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.282 | TFLOPs: 31.39 | 7: iteration 57400/ 115203 | consumed samples: 14694400 | consumed tokens: 30094131200 | elapsed time per iteration (s): 0.43 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 2.302629E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.249 | TFLOPs: 31.02 | 7: iteration 57410/ 115203 | consumed samples: 14696960 | consumed tokens: 30099374080 | elapsed time per iteration (s): 0.44 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 2.313197E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.795 | TFLOPs: 30.26 | 7: iteration 57420/ 115203 | consumed samples: 14699520 | consumed tokens: 30104616960 | elapsed time per iteration (s): 0.45 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 2.296468E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.796 | TFLOPs: 30.05 | 7: iteration 57430/ 115203 | consumed samples: 14702080 | consumed tokens: 30109859840 | elapsed time per iteration (s): 0.44 | learning rate: 1.119E-04 | global batch size: 256 | lm loss: 2.285206E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.349 | TFLOPs: 30.35 | 7: iteration 57440/ 115203 | consumed samples: 14704640 | consumed tokens: 30115102720 | elapsed time per iteration (s): 0.43 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 2.248277E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 595.279 | TFLOPs: 31.23 | 7: iteration 57450/ 115203 | consumed samples: 14707200 | consumed tokens: 30120345600 | elapsed time per iteration (s): 0.42 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 2.283799E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.715 | TFLOPs: 31.73 | 7: iteration 57460/ 115203 | consumed samples: 14709760 | consumed tokens: 30125588480 | elapsed time per iteration (s): 0.43 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 2.289533E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.466 | TFLOPs: 31.45 | 7: iteration 57470/ 115203 | consumed samples: 14712320 | consumed tokens: 30130831360 | elapsed time per iteration (s): 0.44 | learning rate: 1.118E-04 | global batch size: 256 | lm loss: 2.272931E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.578 | TFLOPs: 30.72 | 7: iteration 57480/ 115203 | consumed samples: 14714880 | consumed tokens: 30136074240 | elapsed time per iteration (s): 0.43 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 2.289946E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.710 | TFLOPs: 31.15 | 7: iteration 57490/ 115203 | consumed samples: 14717440 | consumed tokens: 30141317120 | elapsed time per iteration (s): 0.44 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 2.292114E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.652 | TFLOPs: 30.78 | 7: iteration 57500/ 115203 | consumed samples: 14720000 | consumed tokens: 30146560000 | elapsed time per iteration (s): 0.43 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 2.318208E+00 | grad norm: 
0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.900 | TFLOPs: 31.42 | 7: iteration 57510/ 115203 | consumed samples: 14722560 | consumed tokens: 30151802880 | elapsed time per iteration (s): 0.44 | learning rate: 1.117E-04 | global batch size: 256 | lm loss: 2.282629E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.468 | TFLOPs: 30.77 | 7: iteration 57520/ 115203 | consumed samples: 14725120 | consumed tokens: 30157045760 | elapsed time per iteration (s): 0.43 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 2.292579E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.921 | TFLOPs: 31.16 | 7: iteration 57530/ 115203 | consumed samples: 14727680 | consumed tokens: 30162288640 | elapsed time per iteration (s): 0.43 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 2.309712E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.430 | TFLOPs: 31.45 | 7: iteration 57540/ 115203 | consumed samples: 14730240 | consumed tokens: 30167531520 | elapsed time per iteration (s): 0.43 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 2.288893E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.366 | TFLOPs: 30.92 | 7: iteration 57550/ 115203 | consumed samples: 14732800 | consumed tokens: 30172774400 | elapsed time per iteration (s): 0.42 | learning rate: 1.116E-04 | global batch size: 256 | lm loss: 2.305996E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.625 | TFLOPs: 31.62 | 7: iteration 57560/ 115203 | consumed samples: 14735360 | consumed tokens: 30178017280 | elapsed time per 
iteration (s): 0.44 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 2.251031E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.182 | TFLOPs: 30.28 | 7: iteration 57570/ 115203 | consumed samples: 14737920 | consumed tokens: 30183260160 | elapsed time per iteration (s): 0.43 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 2.294610E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.304 | TFLOPs: 31.34 | 7: iteration 57580/ 115203 | consumed samples: 14740480 | consumed tokens: 30188503040 | elapsed time per iteration (s): 0.42 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 2.289820E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.721 | TFLOPs: 31.62 | 7: iteration 57590/ 115203 | consumed samples: 14743040 | consumed tokens: 30193745920 | elapsed time per iteration (s): 0.42 | learning rate: 1.115E-04 | global batch size: 256 | lm loss: 2.281709E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.370 | TFLOPs: 31.82 | 7: iteration 57600/ 115203 | consumed samples: 14745600 | consumed tokens: 30198988800 | elapsed time per iteration (s): 0.43 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 2.279910E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.381 | TFLOPs: 30.98 | 7: iteration 57610/ 115203 | consumed samples: 14748160 | consumed tokens: 30204231680 | elapsed time per iteration (s): 0.43 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 2.264407E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.459 | TFLOPs: 31.24 | 7: 
iteration 57620/ 115203 | consumed samples: 14750720 | consumed tokens: 30209474560 | elapsed time per iteration (s): 0.43 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 2.289566E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.164 | TFLOPs: 31.49 | 7: iteration 57630/ 115203 | consumed samples: 14753280 | consumed tokens: 30214717440 | elapsed time per iteration (s): 0.43 | learning rate: 1.114E-04 | global batch size: 256 | lm loss: 2.261087E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.778 | TFLOPs: 31.10 | 7: iteration 57640/ 115203 | consumed samples: 14755840 | consumed tokens: 30219960320 | elapsed time per iteration (s): 0.44 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 2.296367E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.377 | TFLOPs: 30.82 | 7: iteration 57650/ 115203 | consumed samples: 14758400 | consumed tokens: 30225203200 | elapsed time per iteration (s): 0.44 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 2.260298E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.012 | TFLOPs: 30.69 | 7: iteration 57660/ 115203 | consumed samples: 14760960 | consumed tokens: 30230446080 | elapsed time per iteration (s): 0.43 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 2.294411E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.125 | TFLOPs: 30.91 | 7: iteration 57670/ 115203 | consumed samples: 14763520 | consumed tokens: 30235688960 | elapsed time per iteration (s): 0.43 | learning rate: 1.113E-04 | global batch size: 256 | lm loss: 2.271573E+00 | grad norm: 0.212 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.768 | TFLOPs: 31.36 | 7: iteration 57680/ 115203 | consumed samples: 14766080 | consumed tokens: 30240931840 | elapsed time per iteration (s): 0.42 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 2.278908E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.826 | TFLOPs: 31.68 | 7: iteration 57690/ 115203 | consumed samples: 14768640 | consumed tokens: 30246174720 | elapsed time per iteration (s): 0.42 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 2.279325E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.373 | TFLOPs: 31.61 | 7: iteration 57700/ 115203 | consumed samples: 14771200 | consumed tokens: 30251417600 | elapsed time per iteration (s): 0.43 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 2.339418E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.523 | TFLOPs: 31.09 | 7: iteration 57710/ 115203 | consumed samples: 14773760 | consumed tokens: 30256660480 | elapsed time per iteration (s): 0.44 | learning rate: 1.112E-04 | global batch size: 256 | lm loss: 2.287646E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.290 | TFLOPs: 30.55 | 7: iteration 57720/ 115203 | consumed samples: 14776320 | consumed tokens: 30261903360 | elapsed time per iteration (s): 0.42 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 2.279868E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.976 | TFLOPs: 31.85 | 7: iteration 57730/ 115203 | consumed samples: 14778880 | consumed tokens: 30267146240 | elapsed time per iteration (s): 0.43 | learning rate: 
1.111E-04 | global batch size: 256 | lm loss: 2.259352E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.806 | TFLOPs: 31.21 | 7: iteration 57740/ 115203 | consumed samples: 14781440 | consumed tokens: 30272389120 | elapsed time per iteration (s): 0.43 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 2.291101E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.578 | TFLOPs: 31.09 | 7: iteration 57750/ 115203 | consumed samples: 14784000 | consumed tokens: 30277632000 | elapsed time per iteration (s): 0.43 | learning rate: 1.111E-04 | global batch size: 256 | lm loss: 2.294534E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.776 | TFLOPs: 31.21 | 7: iteration 57760/ 115203 | consumed samples: 14786560 | consumed tokens: 30282874880 | elapsed time per iteration (s): 0.43 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.295742E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.176 | TFLOPs: 31.39 | 7: iteration 57770/ 115203 | consumed samples: 14789120 | consumed tokens: 30288117760 | elapsed time per iteration (s): 0.43 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.273935E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.599 | TFLOPs: 31.09 | 7: iteration 57780/ 115203 | consumed samples: 14791680 | consumed tokens: 30293360640 | elapsed time per iteration (s): 0.43 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.271293E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.981 | TFLOPs: 31.38 | 7: iteration 57790/ 115203 | consumed 
samples: 14794240 | consumed tokens: 30298603520 | elapsed time per iteration (s): 0.42 | learning rate: 1.110E-04 | global batch size: 256 | lm loss: 2.284623E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.003 | TFLOPs: 31.85 | 7: iteration 57800/ 115203 | consumed samples: 14796800 | consumed tokens: 30303846400 | elapsed time per iteration (s): 0.44 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 2.298568E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.778 | TFLOPs: 30.84 | 7: iteration 57810/ 115203 | consumed samples: 14799360 | consumed tokens: 30309089280 | elapsed time per iteration (s): 0.44 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 2.269817E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.971 | TFLOPs: 30.64 | 7: iteration 57820/ 115203 | consumed samples: 14801920 | consumed tokens: 30314332160 | elapsed time per iteration (s): 0.43 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 2.288298E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.586 | TFLOPs: 31.30 | 7: iteration 57830/ 115203 | consumed samples: 14804480 | consumed tokens: 30319575040 | elapsed time per iteration (s): 0.43 | learning rate: 1.109E-04 | global batch size: 256 | lm loss: 2.280617E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.820 | TFLOPs: 31.47 | 7: iteration 57840/ 115203 | consumed samples: 14807040 | consumed tokens: 30324817920 | elapsed time per iteration (s): 0.43 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 2.242299E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 595.749 | TFLOPs: 31.26 | 7: iteration 57850/ 115203 | consumed samples: 14809600 | consumed tokens: 30330060800 | elapsed time per iteration (s): 0.43 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 2.270977E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.279 | TFLOPs: 31.39 | 7: iteration 57860/ 115203 | consumed samples: 14812160 | consumed tokens: 30335303680 | elapsed time per iteration (s): 0.43 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 2.295352E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.172 | TFLOPs: 31.28 | 7: iteration 57870/ 115203 | consumed samples: 14814720 | consumed tokens: 30340546560 | elapsed time per iteration (s): 0.43 | learning rate: 1.108E-04 | global batch size: 256 | lm loss: 2.293237E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.195 | TFLOPs: 31.33 | 7: iteration 57880/ 115203 | consumed samples: 14817280 | consumed tokens: 30345789440 | elapsed time per iteration (s): 0.44 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 2.268998E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.827 | TFLOPs: 30.84 | 7: iteration 57890/ 115203 | consumed samples: 14819840 | consumed tokens: 30351032320 | elapsed time per iteration (s): 0.43 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 2.271839E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.039 | TFLOPs: 31.38 | 7: iteration 57900/ 115203 | consumed samples: 14822400 | consumed tokens: 30356275200 | elapsed time per iteration (s): 0.44 | learning rate: 1.107E-04 | global batch size: 256 | lm 
loss: 2.273645E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.920 | TFLOPs: 30.43 | 7: iteration 57910/ 115203 | consumed samples: 14824960 | consumed tokens: 30361518080 | elapsed time per iteration (s): 0.44 | learning rate: 1.107E-04 | global batch size: 256 | lm loss: 2.285333E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.965 | TFLOPs: 30.85 | 7: iteration 57920/ 115203 | consumed samples: 14827520 | consumed tokens: 30366760960 | elapsed time per iteration (s): 0.43 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 2.282877E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.775 | TFLOPs: 31.52 | 7: iteration 57930/ 115203 | consumed samples: 14830080 | consumed tokens: 30372003840 | elapsed time per iteration (s): 0.43 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 2.298611E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.657 | TFLOPs: 31.15 | 7: iteration 57940/ 115203 | consumed samples: 14832640 | consumed tokens: 30377246720 | elapsed time per iteration (s): 0.43 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 2.284869E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.895 | TFLOPs: 31.00 | 7: iteration 57950/ 115203 | consumed samples: 14835200 | consumed tokens: 30382489600 | elapsed time per iteration (s): 0.43 | learning rate: 1.106E-04 | global batch size: 256 | lm loss: 2.274619E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.317 | TFLOPs: 31.29 | 7: iteration 57960/ 115203 | consumed samples: 14837760 | consumed tokens: 
30387732480 | elapsed time per iteration (s): 0.43 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 2.284001E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.446 | TFLOPs: 31.50 | 7: iteration 57970/ 115203 | consumed samples: 14840320 | consumed tokens: 30392975360 | elapsed time per iteration (s): 0.43 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 2.286703E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.088 | TFLOPs: 31.01 | 7: iteration 57980/ 115203 | consumed samples: 14842880 | consumed tokens: 30398218240 | elapsed time per iteration (s): 0.43 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 2.251328E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.121 | TFLOPs: 31.12 | 7: iteration 57990/ 115203 | consumed samples: 14845440 | consumed tokens: 30403461120 | elapsed time per iteration (s): 0.43 | learning rate: 1.105E-04 | global batch size: 256 | lm loss: 2.303079E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.243 | TFLOPs: 31.07 | 0: [2022-11-28 19:55:54,375] [INFO] [logging.py:68:log_dist] [Rank 0] step=58000, skipped=0, lr=[0.00011044114819593482, 0.00011044114819593482, 0.00011044114819593482], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 58000/ 115203 | consumed samples: 14848000 | consumed tokens: 30408704000 | elapsed time per iteration (s): 0.43 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 2.311122E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.803 | TFLOPs: 31.00 | 0: steps: 58000 loss: 2.2662 iter time (s): 0.432 samples/sec: 592.004 7: 
------------------------------------------------------------------------------------------- 7: valid loss at iteration 58000 | lm loss value: 2.196148E+00 | lm loss PPL: 8.990320E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 58000 to checkpoints_221m 0: [2022-11-28 19:55:54,561] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step58000 is begin to save! 0: [2022-11-28 19:55:54,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_01-model_00-model_states.pt... 0: [2022-11-28 19:55:54,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_01-model_00-model_states.pt. 0: [2022-11-28 19:55:54,719] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_03-model_00-model_states.pt... 0: [2022-11-28 19:55:54,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_03-model_00-model_states.pt. 0: [2022-11-28 19:55:54,749] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_04-model_00-model_states.pt... 0: [2022-11-28 19:55:54,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_04-model_00-model_states.pt. 0: [2022-11-28 19:55:54,781] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_05-model_00-model_states.pt... 0: [2022-11-28 19:55:54,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_05-model_00-model_states.pt. 0: [2022-11-28 19:55:54,812] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_06-model_00-model_states.pt... 
0: [2022-11-28 19:55:54,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_06-model_00-model_states.pt. 0: [2022-11-28 19:55:54,845] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_07-model_00-model_states.pt... 0: [2022-11-28 19:55:54,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_07-model_00-model_states.pt. 0: [2022-11-28 19:55:54,880] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_08-model_00-model_states.pt... 0: [2022-11-28 19:55:54,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_08-model_00-model_states.pt. 0: [2022-11-28 19:55:54,909] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_09-model_00-model_states.pt... 0: [2022-11-28 19:55:54,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_09-model_00-model_states.pt. 0: [2022-11-28 19:55:54,941] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_10-model_00-model_states.pt... 0: [2022-11-28 19:55:54,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_10-model_00-model_states.pt. 0: [2022-11-28 19:55:54,973] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_11-model_00-model_states.pt... 0: [2022-11-28 19:55:55,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_11-model_00-model_states.pt. 0: [2022-11-28 19:55:55,005] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_12-model_00-model_states.pt... 
0: [2022-11-28 19:55:55,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_12-model_00-model_states.pt. 0: [2022-11-28 19:55:55,036] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_13-model_00-model_states.pt... 0: [2022-11-28 19:55:55,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_13-model_00-model_states.pt. 0: [2022-11-28 19:55:55,068] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_14-model_00-model_states.pt... 0: [2022-11-28 19:55:55,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_14-model_00-model_states.pt. 0: [2022-11-28 19:55:55,100] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_15-model_00-model_states.pt... 0: [2022-11-28 19:55:55,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_15-model_00-model_states.pt. 0: [2022-11-28 19:55:55,131] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_16-model_00-model_states.pt... 0: [2022-11-28 19:55:55,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_16-model_00-model_states.pt. 0: [2022-11-28 19:55:55,162] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_17-model_00-model_states.pt... 0: [2022-11-28 19:55:55,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_17-model_00-model_states.pt. 0: [2022-11-28 19:55:55,195] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_18-model_00-model_states.pt... 
0: [2022-11-28 19:55:55,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_18-model_00-model_states.pt. 0: [2022-11-28 19:55:55,228] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_19-model_00-model_states.pt... 0: [2022-11-28 19:55:55,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_19-model_00-model_states.pt. 0: [2022-11-28 19:55:55,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_20-model_00-model_states.pt... 0: [2022-11-28 19:55:55,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_20-model_00-model_states.pt. 0: [2022-11-28 19:55:55,292] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/layer_22-model_00-model_states.pt... 0: [2022-11-28 19:55:55,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/layer_22-model_00-model_states.pt. 0: [2022-11-28 19:55:55,298] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step58000/mp_rank_00_model_states.pt 0: [2022-11-28 19:55:55,298] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/mp_rank_00_model_states.pt... 0: [2022-11-28 19:55:55,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/mp_rank_00_model_states.pt. 0: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
4: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
7: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
3: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 7: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 2: [2022-11-28 19:55:55,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step58000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 4: [2022-11-28 19:55:55,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:55:55,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 19:55:55,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2022-11-28 19:55:55,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 19:55:55,367] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 19:55:55,367] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2022-11-28 19:55:55,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:55:55,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 19:55:55,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2022-11-28 19:55:55,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:55:55,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 19:55:55,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2022-11-28 19:55:55,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:55:55,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 19:55:55,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2022-11-28 19:55:55,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-28 19:55:55,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 19:55:55,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2022-11-28 19:55:55,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:55:55,369] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:55:55,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 19:55:55,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2022-11-28 19:55:55,369] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 19:55:55,369] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2022-11-28 19:55:55,370] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:55:55,370] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 19:55:55,370] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-28 19:55:55,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 0: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 0: [2022-11-28 19:55:55,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 4: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2022-11-28 19:55:55,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:55:55,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2022-11-28 19:55:55,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2022-11-28 19:55:55,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:55:55,372] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 19:55:55,372] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2022-11-28 19:55:55,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:55:55,373] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 19:55:55,373] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2022-11-28 19:55:55,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:55:55,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 19:55:55,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2022-11-28 19:55:55,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 19:55:55,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 19:55:55,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2022-11-28 19:55:55,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:55:55,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:55:55,370] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:55:55,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:55:55,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 19:55:55,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 19:55:55,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 19:55:55,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2022-11-28 19:55:55,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 19:55:55,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
3: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2022-11-28 19:55:55,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 3: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:55:55,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 19:55:55,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 3: [2022-11-28 19:55:55,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:55:55,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 2: [2022-11-28 19:55:55,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:55:55,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2022-11-28 19:55:55,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2022-11-28 19:55:55,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 3: [2022-11-28 19:55:55,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
0: [2022-11-28 19:55:55,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2022-11-28 19:55:55,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2022-11-28 19:55:55,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:55:55,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:55:55,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 19:55:55,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:55:55,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2022-11-28 19:55:55,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 19:55:55,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2022-11-28 19:55:55,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:55:55,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 19:55:55,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
1: [2022-11-28 19:55:55,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:55:55,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 19:55:55,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2022-11-28 19:55:55,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:55:55,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 19:55:55,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2022-11-28 19:55:55,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:55:55,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 19:55:55,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2022-11-28 19:55:55,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:55:55,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 19:55:55,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
6: [2022-11-28 19:55:55,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:55:55,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 19:55:55,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 1: [2022-11-28 19:55:55,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:55:55,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 19:55:55,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 4: [2022-11-28 19:55:55,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 19:55:55,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 19:55:55,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2022-11-28 19:55:55,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:55:55,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 19:55:55,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
7: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:55:55,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 7: [2022-11-28 19:55:55,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:55:55,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-28 19:55:55,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 1: [2022-11-28 19:55:55,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 6: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2022-11-28 19:55:55,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 19:55:55,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 1: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2022-11-28 19:55:55,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 2: [2022-11-28 19:55:55,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-28 19:55:55,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 19:55:55,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2022-11-28 19:55:55,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:55:55,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:55:55,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 19:55:55,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 19:55:55,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2022-11-28 19:55:55,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 6: [2022-11-28 19:55:55,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 19:55:55,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 19:55:55,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 0: [2022-11-28 19:55:55,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
5: [2022-11-28 19:55:55,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 19:55:55,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:55:55,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:55:55,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 3: [2022-11-28 19:55:55,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 19:55:55,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:55:55,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2022-11-28 19:55:55,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 19:55:55,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:55:55,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 3: [2022-11-28 19:55:55,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:55:55,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
3: [2022-11-28 19:55:55,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:55:55,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 5: [2022-11-28 19:55:55,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 19:55:55,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 19:55:55,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 3: [2022-11-28 19:55:55,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 19:55:55,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:55:55,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 19:55:55,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 19:55:55,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 0: [2022-11-28 19:55:55,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2022-11-28 19:55:55,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
3: [2022-11-28 19:55:55,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 19:55:55,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 19:55:55,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2022-11-28 19:55:55,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 19:55:55,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 5: [2022-11-28 19:55:55,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 3: [2022-11-28 19:55:55,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 19:55:55,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 0: [2022-11-28 19:55:55,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:55:55,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 19:55:55,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 0: [2022-11-28 19:55:55,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 19:55:55,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 19:55:55,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 0: [2022-11-28 19:55:55,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:55:55,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 19:55:55,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 19:55:55,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 0: [2022-11-28 19:55:55,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 19:55:55,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 0: [2022-11-28 19:55:55,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step58000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 19:55:55,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step58000 is ready now! 
0: successfully saved checkpoint at iteration 58000 to checkpoints_221m 7: time (ms) | save-checkpoint: 894.86 7: iteration 58010/ 115203 | consumed samples: 14850560 | consumed tokens: 30413946880 | elapsed time per iteration (s): 0.53 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 2.255607E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 483.597 | TFLOPs: 25.37 | 7: iteration 58020/ 115203 | consumed samples: 14853120 | consumed tokens: 30419189760 | elapsed time per iteration (s): 0.43 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 2.238777E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.098 | TFLOPs: 31.28 | 7: iteration 58030/ 115203 | consumed samples: 14855680 | consumed tokens: 30424432640 | elapsed time per iteration (s): 0.43 | learning rate: 1.104E-04 | global batch size: 256 | lm loss: 2.264911E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.642 | TFLOPs: 31.04 | 7: iteration 58040/ 115203 | consumed samples: 14858240 | consumed tokens: 30429675520 | elapsed time per iteration (s): 0.43 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 2.291977E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.737 | TFLOPs: 31.57 | 7: iteration 58050/ 115203 | consumed samples: 14860800 | consumed tokens: 30434918400 | elapsed time per iteration (s): 0.42 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 2.305727E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.652 | TFLOPs: 31.93 | 7: iteration 58060/ 115203 | consumed samples: 14863360 | consumed tokens: 30440161280 | elapsed time per iteration (s): 0.43 | learning 
rate: 1.103E-04 | global batch size: 256 | lm loss: 2.285230E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.085 | TFLOPs: 31.17 | 7: iteration 58070/ 115203 | consumed samples: 14865920 | consumed tokens: 30445404160 | elapsed time per iteration (s): 0.45 | learning rate: 1.103E-04 | global batch size: 256 | lm loss: 2.254832E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.627 | TFLOPs: 29.89 | 7: iteration 58080/ 115203 | consumed samples: 14868480 | consumed tokens: 30450647040 | elapsed time per iteration (s): 0.43 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 2.291801E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.694 | TFLOPs: 31.15 | 7: iteration 58090/ 115203 | consumed samples: 14871040 | consumed tokens: 30455889920 | elapsed time per iteration (s): 0.43 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 2.305043E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.894 | TFLOPs: 31.58 | 7: iteration 58100/ 115203 | consumed samples: 14873600 | consumed tokens: 30461132800 | elapsed time per iteration (s): 0.43 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 2.301803E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.566 | TFLOPs: 31.25 | 7: iteration 58110/ 115203 | consumed samples: 14876160 | consumed tokens: 30466375680 | elapsed time per iteration (s): 0.44 | learning rate: 1.102E-04 | global batch size: 256 | lm loss: 2.255290E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.741 | TFLOPs: 30.42 | 7: iteration 58120/ 115203 | 
consumed samples: 14878720 | consumed tokens: 30471618560 | elapsed time per iteration (s): 0.43 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 2.278760E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.207 | TFLOPs: 31.54 | 7: iteration 58130/ 115203 | consumed samples: 14881280 | consumed tokens: 30476861440 | elapsed time per iteration (s): 0.42 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 2.259245E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.633 | TFLOPs: 31.83 | 7: iteration 58140/ 115203 | consumed samples: 14883840 | consumed tokens: 30482104320 | elapsed time per iteration (s): 0.43 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 2.303651E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.145 | TFLOPs: 31.59 | 7: iteration 58150/ 115203 | consumed samples: 14886400 | consumed tokens: 30487347200 | elapsed time per iteration (s): 0.43 | learning rate: 1.101E-04 | global batch size: 256 | lm loss: 2.249298E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.552 | TFLOPs: 31.56 | 7: iteration 58160/ 115203 | consumed samples: 14888960 | consumed tokens: 30492590080 | elapsed time per iteration (s): 0.43 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 2.297336E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.903 | TFLOPs: 31.37 | 7: iteration 58170/ 115203 | consumed samples: 14891520 | consumed tokens: 30497832960 | elapsed time per iteration (s): 0.42 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 2.265007E+00 | grad norm: 0.200 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 603.627 | TFLOPs: 31.67 | 7: iteration 58180/ 115203 | consumed samples: 14894080 | consumed tokens: 30503075840 | elapsed time per iteration (s): 0.44 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 2.245670E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.597 | TFLOPs: 30.36 | 7: iteration 58190/ 115203 | consumed samples: 14896640 | consumed tokens: 30508318720 | elapsed time per iteration (s): 0.43 | learning rate: 1.100E-04 | global batch size: 256 | lm loss: 2.323879E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.545 | TFLOPs: 31.40 | 7: iteration 58200/ 115203 | consumed samples: 14899200 | consumed tokens: 30513561600 | elapsed time per iteration (s): 0.43 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 2.294334E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.981 | TFLOPs: 31.53 | 7: iteration 58210/ 115203 | consumed samples: 14901760 | consumed tokens: 30518804480 | elapsed time per iteration (s): 0.43 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 2.269873E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.630 | TFLOPs: 31.25 | 7: iteration 58220/ 115203 | consumed samples: 14904320 | consumed tokens: 30524047360 | elapsed time per iteration (s): 0.43 | learning rate: 1.099E-04 | global batch size: 256 | lm loss: 2.274338E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.416 | TFLOPs: 31.45 | 7: iteration 58230/ 115203 | consumed samples: 14906880 | consumed tokens: 30529290240 | elapsed time per iteration (s): 0.43 | learning rate: 1.099E-04 | global batch size: 
256 | lm loss: 2.307516E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.728 | TFLOPs: 31.47 | 7: iteration 58240/ 115203 | consumed samples: 14909440 | consumed tokens: 30534533120 | elapsed time per iteration (s): 0.43 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 2.283514E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.165 | TFLOPs: 31.33 | 7: iteration 58250/ 115203 | consumed samples: 14912000 | consumed tokens: 30539776000 | elapsed time per iteration (s): 0.42 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 2.302155E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.565 | TFLOPs: 31.72 | 7: iteration 58260/ 115203 | consumed samples: 14914560 | consumed tokens: 30545018880 | elapsed time per iteration (s): 0.43 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 2.285171E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.153 | TFLOPs: 31.12 | 7: iteration 58270/ 115203 | consumed samples: 14917120 | consumed tokens: 30550261760 | elapsed time per iteration (s): 0.43 | learning rate: 1.098E-04 | global batch size: 256 | lm loss: 2.228489E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.850 | TFLOPs: 31.00 | 7: iteration 58280/ 115203 | consumed samples: 14919680 | consumed tokens: 30555504640 | elapsed time per iteration (s): 0.43 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 2.236630E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.304 | TFLOPs: 31.29 | 7: iteration 58290/ 115203 | consumed samples: 14922240 | consumed 
tokens: 30560747520 | elapsed time per iteration (s): 0.43 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 2.311482E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.653 | TFLOPs: 31.31 | 7: iteration 58300/ 115203 | consumed samples: 14924800 | consumed tokens: 30565990400 | elapsed time per iteration (s): 0.43 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 2.256702E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.823 | TFLOPs: 31.47 | 7: iteration 58310/ 115203 | consumed samples: 14927360 | consumed tokens: 30571233280 | elapsed time per iteration (s): 0.43 | learning rate: 1.097E-04 | global batch size: 256 | lm loss: 2.282874E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.482 | TFLOPs: 31.24 | 7: iteration 58320/ 115203 | consumed samples: 14929920 | consumed tokens: 30576476160 | elapsed time per iteration (s): 0.44 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 2.291768E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.724 | TFLOPs: 30.63 | 7: iteration 58330/ 115203 | consumed samples: 14932480 | consumed tokens: 30581719040 | elapsed time per iteration (s): 0.43 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 2.259519E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.501 | TFLOPs: 30.98 | 7: iteration 58340/ 115203 | consumed samples: 14935040 | consumed tokens: 30586961920 | elapsed time per iteration (s): 0.43 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 2.244956E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 600.614 | TFLOPs: 31.51 | 7: iteration 58350/ 115203 | consumed samples: 14937600 | consumed tokens: 30592204800 | elapsed time per iteration (s): 0.42 | learning rate: 1.096E-04 | global batch size: 256 | lm loss: 2.290270E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.570 | TFLOPs: 31.67 | 7: iteration 58360/ 115203 | consumed samples: 14940160 | consumed tokens: 30597447680 | elapsed time per iteration (s): 0.43 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 2.299739E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.086 | TFLOPs: 31.22 | 7: iteration 58370/ 115203 | consumed samples: 14942720 | consumed tokens: 30602690560 | elapsed time per iteration (s): 0.43 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 2.310802E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.187 | TFLOPs: 31.18 | 7: iteration 58380/ 115203 | consumed samples: 14945280 | consumed tokens: 30607933440 | elapsed time per iteration (s): 0.43 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 2.253046E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.691 | TFLOPs: 31.25 | 7: iteration 58390/ 115203 | consumed samples: 14947840 | consumed tokens: 30613176320 | elapsed time per iteration (s): 0.42 | learning rate: 1.095E-04 | global batch size: 256 | lm loss: 2.281273E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.466 | TFLOPs: 31.77 | 7: iteration 58400/ 115203 | consumed samples: 14950400 | consumed tokens: 30618419200 | elapsed time per iteration (s): 0.44 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 2.259888E+00 | grad norm: 
0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.059 | TFLOPs: 30.49 | 7: iteration 58410/ 115203 | consumed samples: 14952960 | consumed tokens: 30623662080 | elapsed time per iteration (s): 0.43 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 2.293462E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.460 | TFLOPs: 30.93 | 7: iteration 58420/ 115203 | consumed samples: 14955520 | consumed tokens: 30628904960 | elapsed time per iteration (s): 0.44 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 2.309150E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.764 | TFLOPs: 30.73 | 7: iteration 58430/ 115203 | consumed samples: 14958080 | consumed tokens: 30634147840 | elapsed time per iteration (s): 0.43 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 2.258264E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.612 | TFLOPs: 31.20 | 7: iteration 58440/ 115203 | consumed samples: 14960640 | consumed tokens: 30639390720 | elapsed time per iteration (s): 0.44 | learning rate: 1.094E-04 | global batch size: 256 | lm loss: 2.264596E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.852 | TFLOPs: 30.21 | 7: iteration 58450/ 115203 | consumed samples: 14963200 | consumed tokens: 30644633600 | elapsed time per iteration (s): 0.43 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 2.284142E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.315 | TFLOPs: 31.60 | 7: iteration 58460/ 115203 | consumed samples: 14965760 | consumed tokens: 30649876480 | elapsed time per 
iteration (s): 0.45 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 2.284218E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.082 | TFLOPs: 30.07 | 7: iteration 58470/ 115203 | consumed samples: 14968320 | consumed tokens: 30655119360 | elapsed time per iteration (s): 0.44 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 2.290973E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.634 | TFLOPs: 30.52 | 7: iteration 58480/ 115203 | consumed samples: 14970880 | consumed tokens: 30660362240 | elapsed time per iteration (s): 0.43 | learning rate: 1.093E-04 | global batch size: 256 | lm loss: 2.263698E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.684 | TFLOPs: 31.15 | 7: iteration 58490/ 115203 | consumed samples: 14973440 | consumed tokens: 30665605120 | elapsed time per iteration (s): 0.44 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 2.278664E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.399 | TFLOPs: 30.40 | 7: iteration 58500/ 115203 | consumed samples: 14976000 | consumed tokens: 30670848000 | elapsed time per iteration (s): 0.43 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 2.275581E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.143 | TFLOPs: 31.07 | 7: iteration 58510/ 115203 | consumed samples: 14978560 | consumed tokens: 30676090880 | elapsed time per iteration (s): 0.43 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 2.276195E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.797 | TFLOPs: 31.47 | 7: 
iteration 58520/ 115203 | consumed samples: 14981120 | consumed tokens: 30681333760 | elapsed time per iteration (s): 0.43 | learning rate: 1.092E-04 | global batch size: 256 | lm loss: 2.296505E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.977 | TFLOPs: 31.01 | 7: iteration 58530/ 115203 | consumed samples: 14983680 | consumed tokens: 30686576640 | elapsed time per iteration (s): 0.43 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 2.294022E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.761 | TFLOPs: 31.36 | 7: iteration 58540/ 115203 | consumed samples: 14986240 | consumed tokens: 30691819520 | elapsed time per iteration (s): 0.43 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 2.284865E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.519 | TFLOPs: 31.30 | 7: iteration 58550/ 115203 | consumed samples: 14988800 | consumed tokens: 30697062400 | elapsed time per iteration (s): 0.42 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 2.289763E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.711 | TFLOPs: 31.73 | 7: iteration 58560/ 115203 | consumed samples: 14991360 | consumed tokens: 30702305280 | elapsed time per iteration (s): 0.42 | learning rate: 1.091E-04 | global batch size: 256 | lm loss: 2.286431E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.473 | TFLOPs: 31.61 | 7: iteration 58570/ 115203 | consumed samples: 14993920 | consumed tokens: 30707548160 | elapsed time per iteration (s): 0.43 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 2.262602E+00 | grad norm: 0.206 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.667 | TFLOPs: 31.20 | 7: iteration 58580/ 115203 | consumed samples: 14996480 | consumed tokens: 30712791040 | elapsed time per iteration (s): 0.43 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 2.274181E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.924 | TFLOPs: 31.37 | 7: iteration 58590/ 115203 | consumed samples: 14999040 | consumed tokens: 30718033920 | elapsed time per iteration (s): 0.44 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 2.269001E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.373 | TFLOPs: 30.87 | 7: iteration 58600/ 115203 | consumed samples: 15001600 | consumed tokens: 30723276800 | elapsed time per iteration (s): 0.42 | learning rate: 1.090E-04 | global batch size: 256 | lm loss: 2.286289E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.277 | TFLOPs: 31.76 | 7: iteration 58610/ 115203 | consumed samples: 15004160 | consumed tokens: 30728519680 | elapsed time per iteration (s): 0.43 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 2.276586E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.828 | TFLOPs: 31.47 | 7: iteration 58620/ 115203 | consumed samples: 15006720 | consumed tokens: 30733762560 | elapsed time per iteration (s): 0.43 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 2.296833E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.326 | TFLOPs: 30.97 | 7: iteration 58630/ 115203 | consumed samples: 15009280 | consumed tokens: 30739005440 | elapsed time per iteration (s): 0.43 | learning rate: 
1.089E-04 | global batch size: 256 | lm loss: 2.290647E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.190 | TFLOPs: 31.54 | 7: iteration 58640/ 115203 | consumed samples: 15011840 | consumed tokens: 30744248320 | elapsed time per iteration (s): 0.42 | learning rate: 1.089E-04 | global batch size: 256 | lm loss: 2.282910E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.487 | TFLOPs: 31.66 | 7: iteration 58650/ 115203 | consumed samples: 15014400 | consumed tokens: 30749491200 | elapsed time per iteration (s): 0.43 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 2.281033E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.063 | TFLOPs: 30.96 | 7: iteration 58660/ 115203 | consumed samples: 15016960 | consumed tokens: 30754734080 | elapsed time per iteration (s): 0.43 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 2.287349E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.564 | TFLOPs: 31.41 | 7: iteration 58670/ 115203 | consumed samples: 15019520 | consumed tokens: 30759976960 | elapsed time per iteration (s): 0.43 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 2.270835E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.405 | TFLOPs: 31.55 | 7: iteration 58680/ 115203 | consumed samples: 15022080 | consumed tokens: 30765219840 | elapsed time per iteration (s): 0.44 | learning rate: 1.088E-04 | global batch size: 256 | lm loss: 2.291364E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.867 | TFLOPs: 30.63 | 7: iteration 58690/ 115203 | consumed 
samples: 15024640 | consumed tokens: 30770462720 | elapsed time per iteration (s): 0.43 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 2.261276E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.921 | TFLOPs: 31.21 | 7: iteration 58700/ 115203 | consumed samples: 15027200 | consumed tokens: 30775705600 | elapsed time per iteration (s): 0.44 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 2.293655E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.774 | TFLOPs: 30.26 | 7: iteration 58710/ 115203 | consumed samples: 15029760 | consumed tokens: 30780948480 | elapsed time per iteration (s): 0.43 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 2.285298E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.488 | TFLOPs: 31.35 | 7: iteration 58720/ 115203 | consumed samples: 15032320 | consumed tokens: 30786191360 | elapsed time per iteration (s): 0.42 | learning rate: 1.087E-04 | global batch size: 256 | lm loss: 2.282382E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.463 | TFLOPs: 32.03 | 7: iteration 58730/ 115203 | consumed samples: 15034880 | consumed tokens: 30791434240 | elapsed time per iteration (s): 0.42 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 2.285537E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.496 | TFLOPs: 31.77 | 7: iteration 58740/ 115203 | consumed samples: 15037440 | consumed tokens: 30796677120 | elapsed time per iteration (s): 0.44 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 2.321003E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 587.675 | TFLOPs: 30.83 | 7: iteration 58750/ 115203 | consumed samples: 15040000 | consumed tokens: 30801920000 | elapsed time per iteration (s): 0.44 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 2.286318E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.541 | TFLOPs: 30.83 | 7: iteration 58760/ 115203 | consumed samples: 15042560 | consumed tokens: 30807162880 | elapsed time per iteration (s): 0.43 | learning rate: 1.086E-04 | global batch size: 256 | lm loss: 2.282233E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.122 | TFLOPs: 31.54 | 7: iteration 58770/ 115203 | consumed samples: 15045120 | consumed tokens: 30812405760 | elapsed time per iteration (s): 0.42 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 2.284594E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.931 | TFLOPs: 32.00 | 7: iteration 58780/ 115203 | consumed samples: 15047680 | consumed tokens: 30817648640 | elapsed time per iteration (s): 0.43 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 2.313588E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.915 | TFLOPs: 31.53 | 7: iteration 58790/ 115203 | consumed samples: 15050240 | consumed tokens: 30822891520 | elapsed time per iteration (s): 0.43 | learning rate: 1.085E-04 | global batch size: 256 | lm loss: 2.318019E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.228 | TFLOPs: 31.49 | 7: iteration 58800/ 115203 | consumed samples: 15052800 | consumed tokens: 30828134400 | elapsed time per iteration (s): 0.44 | learning rate: 1.085E-04 | global batch size: 256 | lm 
loss: 2.286621E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.599 | TFLOPs: 30.52 | 7: iteration 58810/ 115203 | consumed samples: 15055360 | consumed tokens: 30833377280 | elapsed time per iteration (s): 0.43 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.287746E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.982 | TFLOPs: 31.01 | 7: iteration 58820/ 115203 | consumed samples: 15057920 | consumed tokens: 30838620160 | elapsed time per iteration (s): 0.44 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.298429E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.924 | TFLOPs: 30.22 | 7: iteration 58830/ 115203 | consumed samples: 15060480 | consumed tokens: 30843863040 | elapsed time per iteration (s): 0.43 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.269763E+00 | grad norm: 0.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.588 | TFLOPs: 31.30 | 7: iteration 58840/ 115203 | consumed samples: 15063040 | consumed tokens: 30849105920 | elapsed time per iteration (s): 0.43 | learning rate: 1.084E-04 | global batch size: 256 | lm loss: 2.290002E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.592 | TFLOPs: 31.41 | 7: iteration 58850/ 115203 | consumed samples: 15065600 | consumed tokens: 30854348800 | elapsed time per iteration (s): 0.44 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 2.278618E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.871 | TFLOPs: 30.53 | 7: iteration 58860/ 115203 | consumed samples: 15068160 | consumed tokens: 
30859591680 | elapsed time per iteration (s): 0.42 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 2.279748E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.328 | TFLOPs: 32.02 | 7: iteration 58870/ 115203 | consumed samples: 15070720 | consumed tokens: 30864834560 | elapsed time per iteration (s): 0.43 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 2.296843E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.008 | TFLOPs: 31.53 | 7: iteration 58880/ 115203 | consumed samples: 15073280 | consumed tokens: 30870077440 | elapsed time per iteration (s): 0.43 | learning rate: 1.083E-04 | global batch size: 256 | lm loss: 2.233482E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.044 | TFLOPs: 31.27 | 7: iteration 58890/ 115203 | consumed samples: 15075840 | consumed tokens: 30875320320 | elapsed time per iteration (s): 0.44 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 2.257674E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.893 | TFLOPs: 30.79 | 7: iteration 58900/ 115203 | consumed samples: 15078400 | consumed tokens: 30880563200 | elapsed time per iteration (s): 0.44 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 2.321687E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.130 | TFLOPs: 30.60 | 7: iteration 58910/ 115203 | consumed samples: 15080960 | consumed tokens: 30885806080 | elapsed time per iteration (s): 0.43 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 2.318422E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
600.112 | TFLOPs: 31.49 | 7: iteration 58920/ 115203 | consumed samples: 15083520 | consumed tokens: 30891048960 | elapsed time per iteration (s): 0.44 | learning rate: 1.082E-04 | global batch size: 256 | lm loss: 2.288239E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.494 | TFLOPs: 30.67 | 7: iteration 58930/ 115203 | consumed samples: 15086080 | consumed tokens: 30896291840 | elapsed time per iteration (s): 0.43 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 2.271032E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.532 | TFLOPs: 30.98 | 7: iteration 58940/ 115203 | consumed samples: 15088640 | consumed tokens: 30901534720 | elapsed time per iteration (s): 0.43 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 2.291728E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.956 | TFLOPs: 30.95 | 7: iteration 58950/ 115203 | consumed samples: 15091200 | consumed tokens: 30906777600 | elapsed time per iteration (s): 0.42 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 2.278753E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.191 | TFLOPs: 31.96 | 7: iteration 58960/ 115203 | consumed samples: 15093760 | consumed tokens: 30912020480 | elapsed time per iteration (s): 0.43 | learning rate: 1.081E-04 | global batch size: 256 | lm loss: 2.285714E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.671 | TFLOPs: 31.20 | 7: iteration 58970/ 115203 | consumed samples: 15096320 | consumed tokens: 30917263360 | elapsed time per iteration (s): 0.44 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 2.315528E+00 | grad norm: 0.231 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.167 | TFLOPs: 30.76 | 7: iteration 58980/ 115203 | consumed samples: 15098880 | consumed tokens: 30922506240 | elapsed time per iteration (s): 0.44 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 2.283555E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.743 | TFLOPs: 30.68 | 7: iteration 58990/ 115203 | consumed samples: 15101440 | consumed tokens: 30927749120 | elapsed time per iteration (s): 0.43 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 2.274079E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.142 | TFLOPs: 31.17 | 7: iteration 59000/ 115203 | consumed samples: 15104000 | consumed tokens: 30932992000 | elapsed time per iteration (s): 0.43 | learning rate: 1.080E-04 | global batch size: 256 | lm loss: 2.281401E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.116 | TFLOPs: 31.33 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 59000 | lm loss value: 2.209676E+00 | lm loss PPL: 9.112759E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 59000 to checkpoints_221m 0: [2022-11-28 20:03:06,164] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step59000 is begin to save! 0: [2022-11-28 20:03:06,168] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_01-model_00-model_states.pt... 0: [2022-11-28 20:03:06,272] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 20:03:06,273] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_03-model_00-model_states.pt... 0: [2022-11-28 20:03:06,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_03-model_00-model_states.pt. 0: [2022-11-28 20:03:06,294] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_04-model_00-model_states.pt... 0: [2022-11-28 20:03:06,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_04-model_00-model_states.pt. 0: [2022-11-28 20:03:06,317] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_05-model_00-model_states.pt... 0: [2022-11-28 20:03:06,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_05-model_00-model_states.pt. 0: [2022-11-28 20:03:06,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_06-model_00-model_states.pt... 0: [2022-11-28 20:03:06,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_06-model_00-model_states.pt. 0: [2022-11-28 20:03:06,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_07-model_00-model_states.pt... 0: [2022-11-28 20:03:06,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_07-model_00-model_states.pt. 0: [2022-11-28 20:03:06,389] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_08-model_00-model_states.pt... 0: [2022-11-28 20:03:06,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 20:03:06,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_09-model_00-model_states.pt... 0: [2022-11-28 20:03:06,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_09-model_00-model_states.pt. 0: [2022-11-28 20:03:06,435] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_10-model_00-model_states.pt... 0: [2022-11-28 20:03:06,458] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_10-model_00-model_states.pt. 0: [2022-11-28 20:03:06,459] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_11-model_00-model_states.pt... 0: [2022-11-28 20:03:06,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_11-model_00-model_states.pt. 0: [2022-11-28 20:03:06,481] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_12-model_00-model_states.pt... 0: [2022-11-28 20:03:06,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_12-model_00-model_states.pt. 0: [2022-11-28 20:03:06,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_13-model_00-model_states.pt... 0: [2022-11-28 20:03:06,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_13-model_00-model_states.pt. 0: [2022-11-28 20:03:06,527] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_14-model_00-model_states.pt... 0: [2022-11-28 20:03:06,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 20:03:06,549] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_15-model_00-model_states.pt... 0: [2022-11-28 20:03:06,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_15-model_00-model_states.pt. 0: [2022-11-28 20:03:06,572] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_16-model_00-model_states.pt... 0: [2022-11-28 20:03:06,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_16-model_00-model_states.pt. 0: [2022-11-28 20:03:06,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_17-model_00-model_states.pt... 0: [2022-11-28 20:03:06,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_17-model_00-model_states.pt. 0: [2022-11-28 20:03:06,619] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_18-model_00-model_states.pt... 0: [2022-11-28 20:03:06,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_18-model_00-model_states.pt. 0: [2022-11-28 20:03:06,642] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_19-model_00-model_states.pt... 0: [2022-11-28 20:03:06,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_19-model_00-model_states.pt. 0: [2022-11-28 20:03:06,666] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_20-model_00-model_states.pt... 0: [2022-11-28 20:03:06,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 20:03:06,690] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/layer_22-model_00-model_states.pt... 0: [2022-11-28 20:03:06,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/layer_22-model_00-model_states.pt. 0: [2022-11-28 20:03:06,696] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step59000/mp_rank_00_model_states.pt 0: [2022-11-28 20:03:06,696] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/mp_rank_00_model_states.pt... 0: [2022-11-28 20:03:06,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/mp_rank_00_model_states.pt. 0: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
7: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
0: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
3: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
7: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
2: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:03:06,716] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step59000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:03:06,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:03:06,763] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 20:03:06,763] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2022-11-28 20:03:06,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:03:06,764] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 20:03:06,764] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2022-11-28 20:03:06,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:03:06,764] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 20:03:06,764] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2022-11-28 20:03:06,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2022-11-28 20:03:06,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 20:03:06,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2022-11-28 20:03:06,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:03:06,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 20:03:06,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2022-11-28 20:03:06,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:03:06,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 20:03:06,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 20:03:06,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2022-11-28 20:03:06,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:03:06,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 20:03:06,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2022-11-28 20:03:06,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:03:06,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 20:03:06,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2022-11-28 20:03:06,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:03:06,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:03:06,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:03:06,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 20:03:06,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:03:06,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:03:06,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2022-11-28 20:03:06,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2022-11-28 20:03:06,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 20:03:06,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2022-11-28 20:03:06,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:03:06,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 20:03:06,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2022-11-28 20:03:06,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:03:06,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 20:03:06,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2022-11-28 20:03:06,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:03:06,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 20:03:06,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2022-11-28 20:03:06,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 20:03:06,772] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 20:03:06,772] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2022-11-28 20:03:06,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:03:06,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 20:03:06,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2022-11-28 20:03:06,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:03:06,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 20:03:06,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:03:06,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:03:06,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2022-11-28 20:03:06,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 20:03:06,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 
7: [2022-11-28 20:03:06,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 20:03:06,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2022-11-28 20:03:06,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:03:06,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 20:03:06,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 2: [2022-11-28 20:03:06,776] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:03:06,776] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 20:03:06,776] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2022-11-28 20:03:06,764] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:03:06,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:03:06,764] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 20:03:06,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 20:03:06,764] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2022-11-28 20:03:06,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:03:06,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 20:03:06,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 20:03:06,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2022-11-28 20:03:06,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:03:06,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 20:03:06,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 20:03:06,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2022-11-28 20:03:06,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2022-11-28 20:03:06,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:03:06,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 20:03:06,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2022-11-28 20:03:06,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:03:06,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:03:06,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
1: [2022-11-28 20:03:06,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 4: [2022-11-28 20:03:06,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 20:03:06,778] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2022-11-28 20:03:06,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2022-11-28 20:03:06,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2022-11-28 20:03:06,778] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:03:06,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 4: [2022-11-28 20:03:06,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 1: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 
5: [2022-11-28 20:03:06,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:03:06,778] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:03:06,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 20:03:06,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2022-11-28 20:03:06,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:03:06,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 20:03:06,780] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 20:03:06,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 20:03:06,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2022-11-28 20:03:06,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 20:03:06,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2022-11-28 20:03:06,780] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 5: [2022-11-28 20:03:06,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 3: [2022-11-28 20:03:06,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 
5: [2022-11-28 20:03:06,780] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:03:06,780] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 20:03:06,780] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 4: [2022-11-28 20:03:06,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:03:06,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 20:03:06,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2022-11-28 20:03:06,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:03:06,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:03:06,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:03:06,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 20:03:06,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 20:03:06,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 20:03:06,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 20:03:06,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 20:03:06,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2022-11-28 20:03:06,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2022-11-28 20:03:06,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 7: [2022-11-28 20:03:06,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 6: [2022-11-28 20:03:06,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:03:06,789] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 20:03:06,789] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2022-11-28 20:03:06,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2022-11-28 20:03:06,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:03:06,792] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 20:03:06,792] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2022-11-28 20:03:06,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:03:06,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 20:03:06,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2022-11-28 20:03:06,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:03:06,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:03:06,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 0: [2022-11-28 20:03:06,793] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 1: [2022-11-28 20:03:06,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2022-11-28 20:03:06,793] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 
1: [2022-11-28 20:03:06,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:03:06,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 20:03:06,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 1: [2022-11-28 20:03:06,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:03:06,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 20:03:06,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2022-11-28 20:03:06,799] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:03:06,799] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 20:03:06,799] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 0: [2022-11-28 20:03:06,828] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step59000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 20:03:06,828] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step59000 is ready now! 
0: successfully saved checkpoint at iteration 59000 to checkpoints_221m
7: time (ms) | save-checkpoint: 669.59
7: iteration 59010/ 115203 | consumed samples: 15106560 | consumed tokens: 30938234880 | elapsed time per iteration (s): 0.51 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 2.269097E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 498.793 | TFLOPs: 26.17 |
7: iteration 59020/ 115203 | consumed samples: 15109120 | consumed tokens: 30943477760 | elapsed time per iteration (s): 0.43 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 2.270139E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.029 | TFLOPs: 31.54 |
7: iteration 59030/ 115203 | consumed samples: 15111680 | consumed tokens: 30948720640 | elapsed time per iteration (s): 0.43 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 2.276530E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.017 | TFLOPs: 31.22 |
7: iteration 59040/ 115203 | consumed samples: 15114240 | consumed tokens: 30953963520 | elapsed time per iteration (s): 0.47 | learning rate: 1.079E-04 | global batch size: 256 | lm loss: 2.292649E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.071 | TFLOPs: 28.39 |
7: iteration 59050/ 115203 | consumed samples: 15116800 | consumed tokens: 30959206400 | elapsed time per iteration (s): 0.43 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 2.268482E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.814 | TFLOPs: 31.37 |
7: iteration 59060/ 115203 | consumed samples: 15119360 | consumed tokens: 30964449280 | elapsed time per iteration (s): 0.43 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 2.280899E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.238 | TFLOPs: 31.34 |
7: iteration 59070/ 115203 | consumed samples: 15121920 | consumed tokens: 30969692160 | elapsed time per iteration (s): 0.43 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 2.275696E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.197 | TFLOPs: 31.49 |
7: iteration 59080/ 115203 | consumed samples: 15124480 | consumed tokens: 30974935040 | elapsed time per iteration (s): 0.42 | learning rate: 1.078E-04 | global batch size: 256 | lm loss: 2.317338E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.019 | TFLOPs: 31.74 |
7: iteration 59090/ 115203 | consumed samples: 15127040 | consumed tokens: 30980177920 | elapsed time per iteration (s): 0.43 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 2.261385E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.722 | TFLOPs: 31.15 |
7: iteration 59100/ 115203 | consumed samples: 15129600 | consumed tokens: 30985420800 | elapsed time per iteration (s): 0.43 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 2.291640E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.314 | TFLOPs: 31.29 |
7: iteration 59110/ 115203 | consumed samples: 15132160 | consumed tokens: 30990663680 | elapsed time per iteration (s): 0.42 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 2.299969E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.678 | TFLOPs: 31.73 |
7: iteration 59120/ 115203 | consumed samples: 15134720 | consumed tokens: 30995906560 | elapsed time per iteration (s): 0.43 | learning rate: 1.077E-04 | global batch size: 256 | lm loss: 2.286850E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.603 | TFLOPs: 31.25 |
7: iteration 59130/ 115203 | consumed samples: 15137280 | consumed tokens: 31001149440 | elapsed time per iteration (s): 0.43 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 2.281599E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.521 | TFLOPs: 31.04 |
7: iteration 59140/ 115203 | consumed samples: 15139840 | consumed tokens: 31006392320 | elapsed time per iteration (s): 0.43 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 2.260329E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.573 | TFLOPs: 31.41 |
7: iteration 59150/ 115203 | consumed samples: 15142400 | consumed tokens: 31011635200 | elapsed time per iteration (s): 0.44 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 2.284765E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.081 | TFLOPs: 30.86 |
7: iteration 59160/ 115203 | consumed samples: 15144960 | consumed tokens: 31016878080 | elapsed time per iteration (s): 0.43 | learning rate: 1.076E-04 | global batch size: 256 | lm loss: 2.277490E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.728 | TFLOPs: 31.41 |
7: iteration 59170/ 115203 | consumed samples: 15147520 | consumed tokens: 31022120960 | elapsed time per iteration (s): 0.44 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 2.298020E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.061 | TFLOPs: 30.59 |
7: iteration 59180/ 115203 | consumed samples: 15150080 | consumed tokens: 31027363840 | elapsed time per iteration (s): 0.42 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 2.286516E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.826 | TFLOPs: 32.05 |
7: iteration 59190/ 115203 | consumed samples: 15152640 | consumed tokens: 31032606720 | elapsed time per iteration (s): 0.44 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 2.277564E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.721 | TFLOPs: 30.36 |
7: iteration 59200/ 115203 | consumed samples: 15155200 | consumed tokens: 31037849600 | elapsed time per iteration (s): 0.42 | learning rate: 1.075E-04 | global batch size: 256 | lm loss: 2.288877E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.249 | TFLOPs: 31.76 |
7: iteration 59210/ 115203 | consumed samples: 15157760 | consumed tokens: 31043092480 | elapsed time per iteration (s): 0.43 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 2.300277E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.352 | TFLOPs: 31.03 |
7: iteration 59220/ 115203 | consumed samples: 15160320 | consumed tokens: 31048335360 | elapsed time per iteration (s): 0.43 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 2.249506E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.917 | TFLOPs: 31.11 |
7: iteration 59230/ 115203 | consumed samples: 15162880 | consumed tokens: 31053578240 | elapsed time per iteration (s): 0.43 | learning rate: 1.074E-04 | global batch size: 
256 | lm loss: 2.297590E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.579 | TFLOPs: 31.35 | 7: iteration 59240/ 115203 | consumed samples: 15165440 | consumed tokens: 31058821120 | elapsed time per iteration (s): 0.43 | learning rate: 1.074E-04 | global batch size: 256 | lm loss: 2.255934E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.836 | TFLOPs: 30.95 | 7: iteration 59250/ 115203 | consumed samples: 15168000 | consumed tokens: 31064064000 | elapsed time per iteration (s): 0.43 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 2.288968E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.076 | TFLOPs: 31.43 | 7: iteration 59260/ 115203 | consumed samples: 15170560 | consumed tokens: 31069306880 | elapsed time per iteration (s): 0.43 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 2.271285E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.691 | TFLOPs: 31.41 | 7: iteration 59270/ 115203 | consumed samples: 15173120 | consumed tokens: 31074549760 | elapsed time per iteration (s): 0.43 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 2.297003E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.972 | TFLOPs: 31.27 | 7: iteration 59280/ 115203 | consumed samples: 15175680 | consumed tokens: 31079792640 | elapsed time per iteration (s): 0.43 | learning rate: 1.073E-04 | global batch size: 256 | lm loss: 2.290744E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.524 | TFLOPs: 31.19 | 7: iteration 59290/ 115203 | consumed samples: 15178240 | consumed 
tokens: 31085035520 | elapsed time per iteration (s): 0.43 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 2.304020E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.183 | TFLOPs: 31.07 | 7: iteration 59300/ 115203 | consumed samples: 15180800 | consumed tokens: 31090278400 | elapsed time per iteration (s): 0.43 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 2.312085E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.570 | TFLOPs: 31.14 | 7: iteration 59310/ 115203 | consumed samples: 15183360 | consumed tokens: 31095521280 | elapsed time per iteration (s): 0.42 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 2.273534E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.421 | TFLOPs: 31.71 | 7: iteration 59320/ 115203 | consumed samples: 15185920 | consumed tokens: 31100764160 | elapsed time per iteration (s): 0.43 | learning rate: 1.072E-04 | global batch size: 256 | lm loss: 2.258253E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.533 | TFLOPs: 31.46 | 7: iteration 59330/ 115203 | consumed samples: 15188480 | consumed tokens: 31106007040 | elapsed time per iteration (s): 0.43 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 2.278770E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.234 | TFLOPs: 31.34 | 7: iteration 59340/ 115203 | consumed samples: 15191040 | consumed tokens: 31111249920 | elapsed time per iteration (s): 0.43 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 2.298024E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 596.635 | TFLOPs: 31.30 | 7: iteration 59350/ 115203 | consumed samples: 15193600 | consumed tokens: 31116492800 | elapsed time per iteration (s): 0.44 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 2.274174E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.546 | TFLOPs: 30.36 | 7: iteration 59360/ 115203 | consumed samples: 15196160 | consumed tokens: 31121735680 | elapsed time per iteration (s): 0.42 | learning rate: 1.071E-04 | global batch size: 256 | lm loss: 2.296053E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.914 | TFLOPs: 31.69 | 7: iteration 59370/ 115203 | consumed samples: 15198720 | consumed tokens: 31126978560 | elapsed time per iteration (s): 0.44 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 2.281478E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.892 | TFLOPs: 30.48 | 7: iteration 59380/ 115203 | consumed samples: 15201280 | consumed tokens: 31132221440 | elapsed time per iteration (s): 0.43 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 2.279497E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.160 | TFLOPs: 31.33 | 7: iteration 59390/ 115203 | consumed samples: 15203840 | consumed tokens: 31137464320 | elapsed time per iteration (s): 0.43 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 2.288719E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.795 | TFLOPs: 31.47 | 7: iteration 59400/ 115203 | consumed samples: 15206400 | consumed tokens: 31142707200 | elapsed time per iteration (s): 0.43 | learning rate: 1.070E-04 | global batch size: 256 | lm loss: 2.285015E+00 | grad norm: 
0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.161 | TFLOPs: 31.54 | 7: iteration 59410/ 115203 | consumed samples: 15208960 | consumed tokens: 31147950080 | elapsed time per iteration (s): 0.42 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 2.285118E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.502 | TFLOPs: 31.72 | 7: iteration 59420/ 115203 | consumed samples: 15211520 | consumed tokens: 31153192960 | elapsed time per iteration (s): 0.43 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 2.308351E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.047 | TFLOPs: 31.06 | 7: iteration 59430/ 115203 | consumed samples: 15214080 | consumed tokens: 31158435840 | elapsed time per iteration (s): 0.43 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 2.298611E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.101 | TFLOPs: 30.91 | 7: iteration 59440/ 115203 | consumed samples: 15216640 | consumed tokens: 31163678720 | elapsed time per iteration (s): 0.43 | learning rate: 1.069E-04 | global batch size: 256 | lm loss: 2.260994E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.998 | TFLOPs: 31.11 | 7: iteration 59450/ 115203 | consumed samples: 15219200 | consumed tokens: 31168921600 | elapsed time per iteration (s): 0.43 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 2.282967E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.847 | TFLOPs: 30.90 | 7: iteration 59460/ 115203 | consumed samples: 15221760 | consumed tokens: 31174164480 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 2.238988E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.339 | TFLOPs: 32.13 | 7: iteration 59470/ 115203 | consumed samples: 15224320 | consumed tokens: 31179407360 | elapsed time per iteration (s): 0.44 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 2.289181E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.893 | TFLOPs: 30.85 | 7: iteration 59480/ 115203 | consumed samples: 15226880 | consumed tokens: 31184650240 | elapsed time per iteration (s): 0.43 | learning rate: 1.068E-04 | global batch size: 256 | lm loss: 2.291637E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.463 | TFLOPs: 31.30 | 7: iteration 59490/ 115203 | consumed samples: 15229440 | consumed tokens: 31189893120 | elapsed time per iteration (s): 0.43 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 2.243405E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.434 | TFLOPs: 31.24 | 7: iteration 59500/ 115203 | consumed samples: 15232000 | consumed tokens: 31195136000 | elapsed time per iteration (s): 0.42 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 2.305378E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.063 | TFLOPs: 31.69 | 7: iteration 59510/ 115203 | consumed samples: 15234560 | consumed tokens: 31200378880 | elapsed time per iteration (s): 0.43 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 2.258251E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.340 | TFLOPs: 31.39 | 7: 
iteration 59520/ 115203 | consumed samples: 15237120 | consumed tokens: 31205621760 | elapsed time per iteration (s): 0.44 | learning rate: 1.067E-04 | global batch size: 256 | lm loss: 2.281842E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.052 | TFLOPs: 30.59 | 7: iteration 59530/ 115203 | consumed samples: 15239680 | consumed tokens: 31210864640 | elapsed time per iteration (s): 0.43 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 2.284774E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.403 | TFLOPs: 31.40 | 7: iteration 59540/ 115203 | consumed samples: 15242240 | consumed tokens: 31216107520 | elapsed time per iteration (s): 0.42 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 2.285104E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.437 | TFLOPs: 31.87 | 7: iteration 59550/ 115203 | consumed samples: 15244800 | consumed tokens: 31221350400 | elapsed time per iteration (s): 0.43 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 2.274240E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.132 | TFLOPs: 31.02 | 7: iteration 59560/ 115203 | consumed samples: 15247360 | consumed tokens: 31226593280 | elapsed time per iteration (s): 0.43 | learning rate: 1.066E-04 | global batch size: 256 | lm loss: 2.292440E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.785 | TFLOPs: 31.47 | 7: iteration 59570/ 115203 | consumed samples: 15249920 | consumed tokens: 31231836160 | elapsed time per iteration (s): 0.43 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 2.257190E+00 | grad norm: 0.210 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.259 | TFLOPs: 31.44 | 7: iteration 59580/ 115203 | consumed samples: 15252480 | consumed tokens: 31237079040 | elapsed time per iteration (s): 0.43 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 2.237155E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.805 | TFLOPs: 31.42 | 7: iteration 59590/ 115203 | consumed samples: 15255040 | consumed tokens: 31242321920 | elapsed time per iteration (s): 0.44 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 2.283891E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.871 | TFLOPs: 30.63 | 7: iteration 59600/ 115203 | consumed samples: 15257600 | consumed tokens: 31247564800 | elapsed time per iteration (s): 0.44 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 2.303608E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.087 | TFLOPs: 30.49 | 7: iteration 59610/ 115203 | consumed samples: 15260160 | consumed tokens: 31252807680 | elapsed time per iteration (s): 0.43 | learning rate: 1.065E-04 | global batch size: 256 | lm loss: 2.281890E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.602 | TFLOPs: 31.30 | 7: iteration 59620/ 115203 | consumed samples: 15262720 | consumed tokens: 31258050560 | elapsed time per iteration (s): 0.43 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 2.264404E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.717 | TFLOPs: 31.52 | 7: iteration 59630/ 115203 | consumed samples: 15265280 | consumed tokens: 31263293440 | elapsed time per iteration (s): 0.43 | learning rate: 
1.064E-04 | global batch size: 256 | lm loss: 2.275507E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.519 | TFLOPs: 31.51 | 7: iteration 59640/ 115203 | consumed samples: 15267840 | consumed tokens: 31268536320 | elapsed time per iteration (s): 0.43 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 2.303549E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.801 | TFLOPs: 31.26 | 7: iteration 59650/ 115203 | consumed samples: 15270400 | consumed tokens: 31273779200 | elapsed time per iteration (s): 0.43 | learning rate: 1.064E-04 | global batch size: 256 | lm loss: 2.300795E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.742 | TFLOPs: 31.36 | 7: iteration 59660/ 115203 | consumed samples: 15272960 | consumed tokens: 31279022080 | elapsed time per iteration (s): 0.44 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 2.277155E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.493 | TFLOPs: 30.88 | 7: iteration 59670/ 115203 | consumed samples: 15275520 | consumed tokens: 31284264960 | elapsed time per iteration (s): 0.43 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 2.264780E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.981 | TFLOPs: 31.22 | 7: iteration 59680/ 115203 | consumed samples: 15278080 | consumed tokens: 31289507840 | elapsed time per iteration (s): 0.43 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 2.293259E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.606 | TFLOPs: 31.36 | 7: iteration 59690/ 115203 | consumed 
samples: 15280640 | consumed tokens: 31294750720 | elapsed time per iteration (s): 0.42 | learning rate: 1.063E-04 | global batch size: 256 | lm loss: 2.311081E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.700 | TFLOPs: 31.83 | 7: iteration 59700/ 115203 | consumed samples: 15283200 | consumed tokens: 31299993600 | elapsed time per iteration (s): 0.43 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 2.280480E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.942 | TFLOPs: 31.06 | 7: iteration 59710/ 115203 | consumed samples: 15285760 | consumed tokens: 31305236480 | elapsed time per iteration (s): 0.42 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 2.246224E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.948 | TFLOPs: 31.74 | 7: iteration 59720/ 115203 | consumed samples: 15288320 | consumed tokens: 31310479360 | elapsed time per iteration (s): 0.43 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 2.284436E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.623 | TFLOPs: 31.46 | 7: iteration 59730/ 115203 | consumed samples: 15290880 | consumed tokens: 31315722240 | elapsed time per iteration (s): 0.43 | learning rate: 1.062E-04 | global batch size: 256 | lm loss: 2.273064E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.589 | TFLOPs: 31.25 | 7: iteration 59740/ 115203 | consumed samples: 15293440 | consumed tokens: 31320965120 | elapsed time per iteration (s): 0.42 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 2.284700E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 610.797 | TFLOPs: 32.05 | 7: iteration 59750/ 115203 | consumed samples: 15296000 | consumed tokens: 31326208000 | elapsed time per iteration (s): 0.42 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 2.242312E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.727 | TFLOPs: 31.62 | 7: iteration 59760/ 115203 | consumed samples: 15298560 | consumed tokens: 31331450880 | elapsed time per iteration (s): 0.42 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 2.283118E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.315 | TFLOPs: 31.76 | 7: iteration 59770/ 115203 | consumed samples: 15301120 | consumed tokens: 31336693760 | elapsed time per iteration (s): 0.43 | learning rate: 1.061E-04 | global batch size: 256 | lm loss: 2.284359E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.912 | TFLOPs: 31.32 | 7: iteration 59780/ 115203 | consumed samples: 15303680 | consumed tokens: 31341936640 | elapsed time per iteration (s): 0.43 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 2.293418E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.986 | TFLOPs: 30.96 | 7: iteration 59790/ 115203 | consumed samples: 15306240 | consumed tokens: 31347179520 | elapsed time per iteration (s): 0.42 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 2.271094E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.558 | TFLOPs: 31.67 | 7: iteration 59800/ 115203 | consumed samples: 15308800 | consumed tokens: 31352422400 | elapsed time per iteration (s): 0.43 | learning rate: 1.060E-04 | global batch size: 256 | lm 
loss: 2.279625E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.183 | TFLOPs: 31.39 | 7: iteration 59810/ 115203 | consumed samples: 15311360 | consumed tokens: 31357665280 | elapsed time per iteration (s): 0.43 | learning rate: 1.060E-04 | global batch size: 256 | lm loss: 2.295175E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.583 | TFLOPs: 31.04 | 7: iteration 59820/ 115203 | consumed samples: 15313920 | consumed tokens: 31362908160 | elapsed time per iteration (s): 0.43 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 2.280515E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.575 | TFLOPs: 31.20 | 7: iteration 59830/ 115203 | consumed samples: 15316480 | consumed tokens: 31368151040 | elapsed time per iteration (s): 0.44 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 2.291866E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.918 | TFLOPs: 30.53 | 7: iteration 59840/ 115203 | consumed samples: 15319040 | consumed tokens: 31373393920 | elapsed time per iteration (s): 0.43 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 2.275504E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.546 | TFLOPs: 31.19 | 7: iteration 59850/ 115203 | consumed samples: 15321600 | consumed tokens: 31378636800 | elapsed time per iteration (s): 0.43 | learning rate: 1.059E-04 | global batch size: 256 | lm loss: 2.289284E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.754 | TFLOPs: 31.52 | 7: iteration 59860/ 115203 | consumed samples: 15324160 | consumed tokens: 
31383879680 | elapsed time per iteration (s): 0.44 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.273831E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.162 | TFLOPs: 30.81 | 7: iteration 59870/ 115203 | consumed samples: 15326720 | consumed tokens: 31389122560 | elapsed time per iteration (s): 0.44 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.288806E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.266 | TFLOPs: 30.55 | 7: iteration 59880/ 115203 | consumed samples: 15329280 | consumed tokens: 31394365440 | elapsed time per iteration (s): 0.45 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.272692E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.903 | TFLOPs: 30.01 | 7: iteration 59890/ 115203 | consumed samples: 15331840 | consumed tokens: 31399608320 | elapsed time per iteration (s): 0.43 | learning rate: 1.058E-04 | global batch size: 256 | lm loss: 2.257338E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.263 | TFLOPs: 31.34 | 7: iteration 59900/ 115203 | consumed samples: 15334400 | consumed tokens: 31404851200 | elapsed time per iteration (s): 0.43 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 2.292568E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.220 | TFLOPs: 31.60 | 7: iteration 59910/ 115203 | consumed samples: 15336960 | consumed tokens: 31410094080 | elapsed time per iteration (s): 0.43 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 2.300187E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
591.135 | TFLOPs: 31.02 | 7: iteration 59920/ 115203 | consumed samples: 15339520 | consumed tokens: 31415336960 | elapsed time per iteration (s): 0.43 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 2.289895E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.954 | TFLOPs: 31.37 | 7: iteration 59930/ 115203 | consumed samples: 15342080 | consumed tokens: 31420579840 | elapsed time per iteration (s): 0.43 | learning rate: 1.057E-04 | global batch size: 256 | lm loss: 2.255435E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.620 | TFLOPs: 31.36 | 7: iteration 59940/ 115203 | consumed samples: 15344640 | consumed tokens: 31425822720 | elapsed time per iteration (s): 0.43 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 2.270676E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.877 | TFLOPs: 31.26 | 7: iteration 59950/ 115203 | consumed samples: 15347200 | consumed tokens: 31431065600 | elapsed time per iteration (s): 0.44 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 2.294521E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.127 | TFLOPs: 30.65 | 7: iteration 59960/ 115203 | consumed samples: 15349760 | consumed tokens: 31436308480 | elapsed time per iteration (s): 0.43 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 2.290596E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.700 | TFLOPs: 31.31 | 7: iteration 59970/ 115203 | consumed samples: 15352320 | consumed tokens: 31441551360 | elapsed time per iteration (s): 0.43 | learning rate: 1.056E-04 | global batch size: 256 | lm loss: 2.281322E+00 | grad norm: 0.214 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.239 | TFLOPs: 31.60 | 7: iteration 59980/ 115203 | consumed samples: 15354880 | consumed tokens: 31446794240 | elapsed time per iteration (s): 0.43 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 2.276005E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.907 | TFLOPs: 31.27 | 7: iteration 59990/ 115203 | consumed samples: 15357440 | consumed tokens: 31452037120 | elapsed time per iteration (s): 0.43 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 2.290935E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.884 | TFLOPs: 31.00 | 0: [2022-11-28 20:10:17,137] [INFO] [logging.py:68:log_dist] [Rank 0] step=60000, skipped=0, lr=[0.00010548489040793946, 0.00010548489040793946, 0.00010548489040793946], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 60000/ 115203 | consumed samples: 15360000 | consumed tokens: 31457280000 | elapsed time per iteration (s): 0.44 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 2.252735E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.987 | TFLOPs: 30.59 | 0: steps: 60000 loss: 2.3363 iter time (s): 0.429 samples/sec: 597.243 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 60000 | lm loss value: 2.207612E+00 | lm loss PPL: 9.093977E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 60000 to checkpoints_221m 0: [2022-11-28 20:10:17,369] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step60000 is begin to save! 
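Note on reading the records above: the counters in each "iteration" record can be cross-checked against each other and against the launch arguments. The sketch below is illustrative only and is not part of the training code. The helper name `parse_record` is hypothetical, `SEQ_LENGTH` is assumed from the `--seq-length 2048` flag in the launch command, and the sample record is copied from iteration 60000 with the rank prefix dropped. It checks that consumed samples equals iteration times global batch size, that consumed tokens equals consumed samples times sequence length, and that the reported validation PPL is exp(validation lm loss).

```python
import math
import re

# Sample record copied from the log at iteration 60000 (rank prefix "7:" dropped).
RECORD = (
    "iteration 60000/ 115203 | consumed samples: 15360000 | "
    "consumed tokens: 31457280000 | elapsed time per iteration (s): 0.44 | "
    "learning rate: 1.055E-04 | global batch size: 256 | "
    "lm loss: 2.252735E+00 | grad norm: 0.208 |"
)

SEQ_LENGTH = 2048  # assumption: taken from --seq-length in the launch command


def parse_record(text):
    """Hypothetical helper: turn one 'iteration ...' record into a dict of numbers."""
    fields = {}
    # The leading "iteration N/ TOTAL" part has no "key: value" form, so match it directly.
    m = re.search(r"iteration\s+(\d+)/\s*(\d+)", text)
    if m:
        fields["iteration"] = int(m.group(1))
        fields["total_iterations"] = int(m.group(2))
    # Every other part is "key: value", separated by "|".
    for part in text.split("|"):
        key, sep, value = part.partition(":")
        if not sep:
            continue
        try:
            fields[key.strip()] = float(value.strip())
        except ValueError:
            pass  # skip anything that is not a plain number
    return fields


rec = parse_record(RECORD)

# consumed samples = iteration * global batch size
samples_ok = rec["consumed samples"] == rec["iteration"] * rec["global batch size"]

# consumed tokens = consumed samples * sequence length
tokens_ok = rec["consumed tokens"] == rec["consumed samples"] * SEQ_LENGTH

# Validation perplexity is exp(validation lm loss); values copied from the
# "valid loss at iteration 60000" line.
ppl_ok = math.isclose(math.exp(2.207612), 9.093977, rel_tol=1e-5)

print(samples_ok, tokens_ok, ppl_ok)
```

The same helper could be applied to any other "iteration" record in this log, provided the rank prefix is stripped first; records that wrap across line boundaries would need to be rejoined before parsing.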
0: [2022-11-28 20:10:17,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_01-model_00-model_states.pt... 0: [2022-11-28 20:10:17,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_01-model_00-model_states.pt. 0: [2022-11-28 20:10:17,682] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_03-model_00-model_states.pt... 0: [2022-11-28 20:10:17,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_03-model_00-model_states.pt. 0: [2022-11-28 20:10:17,712] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_04-model_00-model_states.pt... 0: [2022-11-28 20:10:17,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_04-model_00-model_states.pt. 0: [2022-11-28 20:10:17,742] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_05-model_00-model_states.pt... 0: [2022-11-28 20:10:17,772] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_05-model_00-model_states.pt. 0: [2022-11-28 20:10:17,773] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_06-model_00-model_states.pt... 0: [2022-11-28 20:10:17,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_06-model_00-model_states.pt. 0: [2022-11-28 20:10:17,803] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_07-model_00-model_states.pt... 0: [2022-11-28 20:10:17,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 20:10:17,833] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_08-model_00-model_states.pt... 0: [2022-11-28 20:10:17,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_08-model_00-model_states.pt. 0: [2022-11-28 20:10:17,864] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_09-model_00-model_states.pt... 0: [2022-11-28 20:10:17,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_09-model_00-model_states.pt. 0: [2022-11-28 20:10:17,896] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_10-model_00-model_states.pt... 0: [2022-11-28 20:10:17,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_10-model_00-model_states.pt. 0: [2022-11-28 20:10:17,928] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_11-model_00-model_states.pt... 0: [2022-11-28 20:10:17,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_11-model_00-model_states.pt. 0: [2022-11-28 20:10:17,958] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_12-model_00-model_states.pt... 0: [2022-11-28 20:10:17,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_12-model_00-model_states.pt. 0: [2022-11-28 20:10:17,989] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_13-model_00-model_states.pt... 0: [2022-11-28 20:10:18,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 20:10:18,021] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_14-model_00-model_states.pt... 0: [2022-11-28 20:10:18,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_14-model_00-model_states.pt. 0: [2022-11-28 20:10:18,052] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_15-model_00-model_states.pt... 0: [2022-11-28 20:10:18,083] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_15-model_00-model_states.pt. 0: [2022-11-28 20:10:18,083] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_16-model_00-model_states.pt... 0: [2022-11-28 20:10:18,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_16-model_00-model_states.pt. 0: [2022-11-28 20:10:18,114] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_17-model_00-model_states.pt... 0: [2022-11-28 20:10:18,145] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_17-model_00-model_states.pt. 0: [2022-11-28 20:10:18,146] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_18-model_00-model_states.pt... 0: [2022-11-28 20:10:18,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_18-model_00-model_states.pt. 0: [2022-11-28 20:10:18,177] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_19-model_00-model_states.pt... 0: [2022-11-28 20:10:18,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 20:10:18,245] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_20-model_00-model_states.pt... 0: [2022-11-28 20:10:18,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_20-model_00-model_states.pt. 0: [2022-11-28 20:10:18,273] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/layer_22-model_00-model_states.pt... 0: [2022-11-28 20:10:18,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/layer_22-model_00-model_states.pt. 0: [2022-11-28 20:10:18,278] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step60000/mp_rank_00_model_states.pt 0: [2022-11-28 20:10:18,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/mp_rank_00_model_states.pt... 0: [2022-11-28 20:10:18,281] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/mp_rank_00_model_states.pt. 0: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
4: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:10:18,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:10:18,304] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:10:18,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:10:18,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 20:10:18,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2022-11-28 20:10:18,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:10:18,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 20:10:18,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2022-11-28 20:10:18,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-28 20:10:18,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 20:10:18,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2022-11-28 20:10:18,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:10:18,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 20:10:18,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2022-11-28 20:10:18,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:10:18,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 20:10:18,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2022-11-28 20:10:18,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:10:18,358] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2022-11-28 20:10:18,358] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:10:18,358] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
0: [2022-11-28 20:10:18,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 20:10:18,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2022-11-28 20:10:18,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:10:18,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 20:10:18,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2022-11-28 20:10:18,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:10:18,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 20:10:18,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2022-11-28 20:10:18,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:10:18,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:10:18,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 20:10:18,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
2: [2022-11-28 20:10:18,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 20:10:18,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2022-11-28 20:10:18,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:10:18,360] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 20:10:18,360] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2022-11-28 20:10:18,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:10:18,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 20:10:18,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2022-11-28 20:10:18,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:10:18,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 20:10:18,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 20:10:18,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 20:10:18,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:10:18,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2022-11-28 20:10:18,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2022-11-28 20:10:18,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 20:10:18,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2022-11-28 20:10:18,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:10:18,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 20:10:18,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 7: [2022-11-28 20:10:18,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 20:10:18,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 20:10:18,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2022-11-28 20:10:18,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:10:18,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 20:10:18,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2022-11-28 20:10:18,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:10:18,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:10:18,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 1: [2022-11-28 20:10:18,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:10:18,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2022-11-28 20:10:18,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 20:10:18,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
1: [2022-11-28 20:10:18,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:10:18,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 20:10:18,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2022-11-28 20:10:18,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:10:18,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:10:18,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:10:18,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2022-11-28 20:10:18,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 20:10:18,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 20:10:18,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 20:10:18,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 20:10:18,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2022-11-28 20:10:18,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2022-11-28 20:10:18,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 2: [2022-11-28 20:10:18,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2022-11-28 20:10:18,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:10:18,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 20:10:18,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 20:10:18,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 20:10:18,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:10:18,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:10:18,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2022-11-28 20:10:18,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2022-11-28 20:10:18,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 20:10:18,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 20:10:18,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2022-11-28 20:10:18,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2022-11-28 20:10:18,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-28 20:10:18,372] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 20:10:18,372] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2022-11-28 20:10:18,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:10:18,373] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 20:10:18,373] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2022-11-28 20:10:18,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:10:18,373] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 20:10:18,373] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2022-11-28 20:10:18,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:10:18,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 20:10:18,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 1: [2022-11-28 20:10:18,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-28 20:10:18,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 20:10:18,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2022-11-28 20:10:18,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:10:18,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 20:10:18,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2022-11-28 20:10:18,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:10:18,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 20:10:18,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2022-11-28 20:10:18,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:10:18,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:10:18,366] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:10:18,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 20:10:18,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 20:10:18,366] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 20:10:18,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2022-11-28 20:10:18,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2022-11-28 20:10:18,366] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2022-11-28 20:10:18,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:10:18,371] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 20:10:18,371] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2022-11-28 20:10:18,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:10:18,374] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 20:10:18,374] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
5: [2022-11-28 20:10:18,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:10:18,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 20:10:18,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2022-11-28 20:10:18,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:10:18,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 20:10:18,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 5: [2022-11-28 20:10:18,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:10:18,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 20:10:18,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 6: [2022-11-28 20:10:18,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:10:18,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
6: [2022-11-28 20:10:18,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 0: [2022-11-28 20:10:18,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2022-11-28 20:10:18,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2022-11-28 20:10:18,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 0: [2022-11-28 20:10:18,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 20:10:18,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:10:18,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:10:18,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 20:10:18,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 20:10:18,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 20:10:18,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 20:10:18,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 20:10:18,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
3: [2022-11-28 20:10:18,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 3: [2022-11-28 20:10:18,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:10:18,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 20:10:18,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 20:10:18,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 20:10:18,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 20:10:18,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 20:10:18,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
4: [2022-11-28 20:10:18,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 20:10:18,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step60000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 4: [2022-11-28 20:10:18,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step60000 is ready now! 
0: successfully saved checkpoint at iteration 60000 to checkpoints_221m 7: time (ms) | save-checkpoint: 1231.02 7: iteration 60010/ 115203 | consumed samples: 15362560 | consumed tokens: 31462522880 | elapsed time per iteration (s): 0.57 | learning rate: 1.055E-04 | global batch size: 256 | lm loss: 2.255488E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 448.677 | TFLOPs: 23.54 | 7: iteration 60020/ 115203 | consumed samples: 15365120 | consumed tokens: 31467765760 | elapsed time per iteration (s): 0.43 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 2.251530E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.201 | TFLOPs: 30.91 | 7: iteration 60030/ 115203 | consumed samples: 15367680 | consumed tokens: 31473008640 | elapsed time per iteration (s): 0.43 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 2.295485E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.725 | TFLOPs: 31.47 | 7: iteration 60040/ 115203 | consumed samples: 15370240 | consumed tokens: 31478251520 | elapsed time per iteration (s): 0.42 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 2.273803E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.598 | TFLOPs: 31.72 | 7: iteration 60050/ 115203 | consumed samples: 15372800 | consumed tokens: 31483494400 | elapsed time per iteration (s): 0.50 | learning rate: 1.054E-04 | global batch size: 256 | lm loss: 2.235479E+00 | grad norm: 0.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 510.395 | TFLOPs: 26.78 | 7: iteration 60060/ 115203 | consumed samples: 15375360 | consumed tokens: 31488737280 | elapsed time per iteration (s): 0.43 | learning 
rate: 1.053E-04 | global batch size: 256 | lm loss: 2.284171E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.389 | TFLOPs: 31.13 | 7: iteration 60070/ 115203 | consumed samples: 15377920 | consumed tokens: 31493980160 | elapsed time per iteration (s): 0.44 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 2.287439E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.342 | TFLOPs: 30.82 | 7: iteration 60080/ 115203 | consumed samples: 15380480 | consumed tokens: 31499223040 | elapsed time per iteration (s): 0.44 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 2.254522E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.777 | TFLOPs: 30.58 | 7: iteration 60090/ 115203 | consumed samples: 15383040 | consumed tokens: 31504465920 | elapsed time per iteration (s): 0.44 | learning rate: 1.053E-04 | global batch size: 256 | lm loss: 2.295425E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.087 | TFLOPs: 30.49 | 7: iteration 60100/ 115203 | consumed samples: 15385600 | consumed tokens: 31509708800 | elapsed time per iteration (s): 0.42 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 2.279415E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.257 | TFLOPs: 31.70 | 7: iteration 60110/ 115203 | consumed samples: 15388160 | consumed tokens: 31514951680 | elapsed time per iteration (s): 0.43 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 2.291847E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.016 | TFLOPs: 31.01 | 7: iteration 60120/ 115203 | 
consumed samples: 15390720 | consumed tokens: 31520194560 | elapsed time per iteration (s): 0.42 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 2.248225E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.534 | TFLOPs: 31.72 | 7: iteration 60130/ 115203 | consumed samples: 15393280 | consumed tokens: 31525437440 | elapsed time per iteration (s): 0.43 | learning rate: 1.052E-04 | global batch size: 256 | lm loss: 2.281960E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.130 | TFLOPs: 31.17 | 7: iteration 60140/ 115203 | consumed samples: 15395840 | consumed tokens: 31530680320 | elapsed time per iteration (s): 0.45 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 2.289440E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.936 | TFLOPs: 30.06 | 7: iteration 60150/ 115203 | consumed samples: 15398400 | consumed tokens: 31535923200 | elapsed time per iteration (s): 0.42 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 2.283056E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.004 | TFLOPs: 31.74 | 7: iteration 60160/ 115203 | consumed samples: 15400960 | consumed tokens: 31541166080 | elapsed time per iteration (s): 0.42 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 2.288317E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.012 | TFLOPs: 31.90 | 7: iteration 60170/ 115203 | consumed samples: 15403520 | consumed tokens: 31546408960 | elapsed time per iteration (s): 0.44 | learning rate: 1.051E-04 | global batch size: 256 | lm loss: 2.285485E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 579.952 | TFLOPs: 30.43 | 7: iteration 60180/ 115203 | consumed samples: 15406080 | consumed tokens: 31551651840 | elapsed time per iteration (s): 0.44 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 2.279114E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.257 | TFLOPs: 30.81 | 7: iteration 60190/ 115203 | consumed samples: 15408640 | consumed tokens: 31556894720 | elapsed time per iteration (s): 0.43 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 2.293091E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.682 | TFLOPs: 31.15 | 7: iteration 60200/ 115203 | consumed samples: 15411200 | consumed tokens: 31562137600 | elapsed time per iteration (s): 0.44 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 2.311287E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.824 | TFLOPs: 30.63 | 7: iteration 60210/ 115203 | consumed samples: 15413760 | consumed tokens: 31567380480 | elapsed time per iteration (s): 0.44 | learning rate: 1.050E-04 | global batch size: 256 | lm loss: 2.258752E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.523 | TFLOPs: 30.20 | 7: iteration 60220/ 115203 | consumed samples: 15416320 | consumed tokens: 31572623360 | elapsed time per iteration (s): 0.43 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 2.310997E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.498 | TFLOPs: 31.09 | 7: iteration 60230/ 115203 | consumed samples: 15418880 | consumed tokens: 31577866240 | elapsed time per iteration (s): 0.42 | learning rate: 1.049E-04 | global batch size: 
256 | lm loss: 2.271751E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.988 | TFLOPs: 31.69 | 7: iteration 60240/ 115203 | consumed samples: 15421440 | consumed tokens: 31583109120 | elapsed time per iteration (s): 0.43 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 2.262352E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.187 | TFLOPs: 31.33 | 7: iteration 60250/ 115203 | consumed samples: 15424000 | consumed tokens: 31588352000 | elapsed time per iteration (s): 0.43 | learning rate: 1.049E-04 | global batch size: 256 | lm loss: 2.265604E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.030 | TFLOPs: 30.96 | 7: iteration 60260/ 115203 | consumed samples: 15426560 | consumed tokens: 31593594880 | elapsed time per iteration (s): 0.43 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 2.270996E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.130 | TFLOPs: 31.44 | 7: iteration 60270/ 115203 | consumed samples: 15429120 | consumed tokens: 31598837760 | elapsed time per iteration (s): 0.43 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 2.312099E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.064 | TFLOPs: 31.17 | 7: iteration 60280/ 115203 | consumed samples: 15431680 | consumed tokens: 31604080640 | elapsed time per iteration (s): 0.44 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 2.252736E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.411 | TFLOPs: 30.87 | 7: iteration 60290/ 115203 | consumed samples: 15434240 | consumed 
tokens: 31609323520 | elapsed time per iteration (s): 0.43 | learning rate: 1.048E-04 | global batch size: 256 | lm loss: 2.260669E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.520 | TFLOPs: 31.51 | 7: iteration 60300/ 115203 | consumed samples: 15436800 | consumed tokens: 31614566400 | elapsed time per iteration (s): 0.43 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 2.290834E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.216 | TFLOPs: 31.07 | 7: iteration 60310/ 115203 | consumed samples: 15439360 | consumed tokens: 31619809280 | elapsed time per iteration (s): 0.43 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 2.252305E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.547 | TFLOPs: 31.40 | 7: iteration 60320/ 115203 | consumed samples: 15441920 | consumed tokens: 31625052160 | elapsed time per iteration (s): 0.42 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 2.272832E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.428 | TFLOPs: 31.66 | 7: iteration 60330/ 115203 | consumed samples: 15444480 | consumed tokens: 31630295040 | elapsed time per iteration (s): 0.43 | learning rate: 1.047E-04 | global batch size: 256 | lm loss: 2.300165E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.760 | TFLOPs: 31.31 | 7: iteration 60340/ 115203 | consumed samples: 15447040 | consumed tokens: 31635537920 | elapsed time per iteration (s): 0.43 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.275151E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 589.986 | TFLOPs: 30.96 | 7: iteration 60350/ 115203 | consumed samples: 15449600 | consumed tokens: 31640780800 | elapsed time per iteration (s): 0.42 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.278313E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.530 | TFLOPs: 31.88 | 7: iteration 60360/ 115203 | consumed samples: 15452160 | consumed tokens: 31646023680 | elapsed time per iteration (s): 0.44 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.273772E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.544 | TFLOPs: 30.83 | 7: iteration 60370/ 115203 | consumed samples: 15454720 | consumed tokens: 31651266560 | elapsed time per iteration (s): 0.42 | learning rate: 1.046E-04 | global batch size: 256 | lm loss: 2.283879E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.923 | TFLOPs: 31.90 | 7: iteration 60380/ 115203 | consumed samples: 15457280 | consumed tokens: 31656509440 | elapsed time per iteration (s): 0.43 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 2.256720E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.571 | TFLOPs: 31.20 | 7: iteration 60390/ 115203 | consumed samples: 15459840 | consumed tokens: 31661752320 | elapsed time per iteration (s): 0.43 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 2.319719E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.320 | TFLOPs: 31.50 | 7: iteration 60400/ 115203 | consumed samples: 15462400 | consumed tokens: 31666995200 | elapsed time per iteration (s): 0.42 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 2.278900E+00 | grad norm: 
0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.287 | TFLOPs: 31.76 | 7: iteration 60410/ 115203 | consumed samples: 15464960 | consumed tokens: 31672238080 | elapsed time per iteration (s): 0.43 | learning rate: 1.045E-04 | global batch size: 256 | lm loss: 2.291178E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.121 | TFLOPs: 31.43 | 7: iteration 60420/ 115203 | consumed samples: 15467520 | consumed tokens: 31677480960 | elapsed time per iteration (s): 0.45 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 2.270436E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.323 | TFLOPs: 29.87 | 7: iteration 60430/ 115203 | consumed samples: 15470080 | consumed tokens: 31682723840 | elapsed time per iteration (s): 0.44 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 2.253400E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.973 | TFLOPs: 30.27 | 7: iteration 60440/ 115203 | consumed samples: 15472640 | consumed tokens: 31687966720 | elapsed time per iteration (s): 0.44 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 2.277369E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.008 | TFLOPs: 30.85 | 7: iteration 60450/ 115203 | consumed samples: 15475200 | consumed tokens: 31693209600 | elapsed time per iteration (s): 0.42 | learning rate: 1.044E-04 | global batch size: 256 | lm loss: 2.279296E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.459 | TFLOPs: 32.08 | 7: iteration 60460/ 115203 | consumed samples: 15477760 | consumed tokens: 31698452480 | elapsed time per 
iteration (s): 0.42 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 2.300126E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.419 | TFLOPs: 31.66 | 7: iteration 60470/ 115203 | consumed samples: 15480320 | consumed tokens: 31703695360 | elapsed time per iteration (s): 0.43 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 2.296165E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.691 | TFLOPs: 31.36 | 7: iteration 60480/ 115203 | consumed samples: 15482880 | consumed tokens: 31708938240 | elapsed time per iteration (s): 0.44 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 2.286207E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.286 | TFLOPs: 30.66 | 7: iteration 60490/ 115203 | consumed samples: 15485440 | consumed tokens: 31714181120 | elapsed time per iteration (s): 0.43 | learning rate: 1.043E-04 | global batch size: 256 | lm loss: 2.280298E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.092 | TFLOPs: 31.49 | 7: iteration 60500/ 115203 | consumed samples: 15488000 | consumed tokens: 31719424000 | elapsed time per iteration (s): 0.43 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 2.276361E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.078 | TFLOPs: 30.91 | 7: iteration 60510/ 115203 | consumed samples: 15490560 | consumed tokens: 31724666880 | elapsed time per iteration (s): 0.43 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 2.264610E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.330 | TFLOPs: 31.13 | 7: 
iteration 60520/ 115203 | consumed samples: 15493120 | consumed tokens: 31729909760 | elapsed time per iteration (s): 0.43 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 2.233590E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.660 | TFLOPs: 31.20 | 7: iteration 60530/ 115203 | consumed samples: 15495680 | consumed tokens: 31735152640 | elapsed time per iteration (s): 0.42 | learning rate: 1.042E-04 | global batch size: 256 | lm loss: 2.259822E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.586 | TFLOPs: 31.72 | 7: iteration 60540/ 115203 | consumed samples: 15498240 | consumed tokens: 31740395520 | elapsed time per iteration (s): 0.43 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 2.269951E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.974 | TFLOPs: 31.16 | 7: iteration 60550/ 115203 | consumed samples: 15500800 | consumed tokens: 31745638400 | elapsed time per iteration (s): 0.43 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 2.315586E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.801 | TFLOPs: 31.58 | 7: iteration 60560/ 115203 | consumed samples: 15503360 | consumed tokens: 31750881280 | elapsed time per iteration (s): 0.44 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 2.261960E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.118 | TFLOPs: 30.33 | 7: iteration 60570/ 115203 | consumed samples: 15505920 | consumed tokens: 31756124160 | elapsed time per iteration (s): 0.43 | learning rate: 1.041E-04 | global batch size: 256 | lm loss: 2.292248E+00 | grad norm: 0.212 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.191 | TFLOPs: 31.28 | 7: iteration 60580/ 115203 | consumed samples: 15508480 | consumed tokens: 31761367040 | elapsed time per iteration (s): 0.43 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.269897E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.777 | TFLOPs: 31.57 | 7: iteration 60590/ 115203 | consumed samples: 15511040 | consumed tokens: 31766609920 | elapsed time per iteration (s): 0.43 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.269593E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.593 | TFLOPs: 31.41 | 7: iteration 60600/ 115203 | consumed samples: 15513600 | consumed tokens: 31771852800 | elapsed time per iteration (s): 0.42 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.270736E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.916 | TFLOPs: 31.74 | 7: iteration 60610/ 115203 | consumed samples: 15516160 | consumed tokens: 31777095680 | elapsed time per iteration (s): 0.43 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.310579E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.747 | TFLOPs: 31.26 | 7: iteration 60620/ 115203 | consumed samples: 15518720 | consumed tokens: 31782338560 | elapsed time per iteration (s): 0.42 | learning rate: 1.040E-04 | global batch size: 256 | lm loss: 2.270793E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.373 | TFLOPs: 31.66 | 7: iteration 60630/ 115203 | consumed samples: 15521280 | consumed tokens: 31787581440 | elapsed time per iteration (s): 0.42 | learning rate: 
1.039E-04 | global batch size: 256 | lm loss: 2.289056E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.345 | TFLOPs: 31.66 | 7: iteration 60640/ 115203 | consumed samples: 15523840 | consumed tokens: 31792824320 | elapsed time per iteration (s): 0.44 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 2.267719E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.872 | TFLOPs: 30.84 | 7: iteration 60650/ 115203 | consumed samples: 15526400 | consumed tokens: 31798067200 | elapsed time per iteration (s): 0.42 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 2.277713E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.971 | TFLOPs: 31.74 | 7: iteration 60660/ 115203 | consumed samples: 15528960 | consumed tokens: 31803310080 | elapsed time per iteration (s): 0.43 | learning rate: 1.039E-04 | global batch size: 256 | lm loss: 2.264845E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.311 | TFLOPs: 31.18 | 7: iteration 60670/ 115203 | consumed samples: 15531520 | consumed tokens: 31808552960 | elapsed time per iteration (s): 0.43 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 2.262192E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.876 | TFLOPs: 31.26 | 7: iteration 60680/ 115203 | consumed samples: 15534080 | consumed tokens: 31813795840 | elapsed time per iteration (s): 0.43 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 2.279981E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.621 | TFLOPs: 31.15 | 7: iteration 60690/ 115203 | consumed 
samples: 15536640 | consumed tokens: 31819038720 | elapsed time per iteration (s): 0.42 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 2.290720E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.018 | TFLOPs: 31.80 | 7: iteration 60700/ 115203 | consumed samples: 15539200 | consumed tokens: 31824281600 | elapsed time per iteration (s): 0.45 | learning rate: 1.038E-04 | global batch size: 256 | lm loss: 2.292732E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.530 | TFLOPs: 30.09 | 7: iteration 60710/ 115203 | consumed samples: 15541760 | consumed tokens: 31829524480 | elapsed time per iteration (s): 0.42 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 2.279613E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.227 | TFLOPs: 31.70 | 7: iteration 60720/ 115203 | consumed samples: 15544320 | consumed tokens: 31834767360 | elapsed time per iteration (s): 0.42 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 2.296916E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.958 | TFLOPs: 31.95 | 7: iteration 60730/ 115203 | consumed samples: 15546880 | consumed tokens: 31840010240 | elapsed time per iteration (s): 0.43 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 2.283944E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.730 | TFLOPs: 31.20 | 7: iteration 60740/ 115203 | consumed samples: 15549440 | consumed tokens: 31845253120 | elapsed time per iteration (s): 0.43 | learning rate: 1.037E-04 | global batch size: 256 | lm loss: 2.281367E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 598.526 | TFLOPs: 31.40 | 7: iteration 60750/ 115203 | consumed samples: 15552000 | consumed tokens: 31850496000 | elapsed time per iteration (s): 0.43 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 2.265796E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.140 | TFLOPs: 31.28 | 7: iteration 60760/ 115203 | consumed samples: 15554560 | consumed tokens: 31855738880 | elapsed time per iteration (s): 0.44 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 2.305765E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.115 | TFLOPs: 30.80 | 7: iteration 60770/ 115203 | consumed samples: 15557120 | consumed tokens: 31860981760 | elapsed time per iteration (s): 0.43 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 2.288099E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.582 | TFLOPs: 31.56 | 7: iteration 60780/ 115203 | consumed samples: 15559680 | consumed tokens: 31866224640 | elapsed time per iteration (s): 0.43 | learning rate: 1.036E-04 | global batch size: 256 | lm loss: 2.265708E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.136 | TFLOPs: 31.33 | 7: iteration 60790/ 115203 | consumed samples: 15562240 | consumed tokens: 31871467520 | elapsed time per iteration (s): 0.42 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 2.265767E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.966 | TFLOPs: 31.69 | 7: iteration 60800/ 115203 | consumed samples: 15564800 | consumed tokens: 31876710400 | elapsed time per iteration (s): 0.42 | learning rate: 1.035E-04 | global batch size: 256 | lm 
loss: 2.260953E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.191 | TFLOPs: 32.12 | 7: iteration 60810/ 115203 | consumed samples: 15567360 | consumed tokens: 31881953280 | elapsed time per iteration (s): 0.43 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 2.277893E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.888 | TFLOPs: 31.37 | 7: iteration 60820/ 115203 | consumed samples: 15569920 | consumed tokens: 31887196160 | elapsed time per iteration (s): 0.43 | learning rate: 1.035E-04 | global batch size: 256 | lm loss: 2.275618E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.965 | TFLOPs: 31.16 | 7: iteration 60830/ 115203 | consumed samples: 15572480 | consumed tokens: 31892439040 | elapsed time per iteration (s): 0.42 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 2.270202E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.813 | TFLOPs: 31.73 | 7: iteration 60840/ 115203 | consumed samples: 15575040 | consumed tokens: 31897681920 | elapsed time per iteration (s): 0.43 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 2.262841E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.667 | TFLOPs: 31.52 | 7: iteration 60850/ 115203 | consumed samples: 15577600 | consumed tokens: 31902924800 | elapsed time per iteration (s): 0.43 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 2.281196E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.629 | TFLOPs: 31.57 | 7: iteration 60860/ 115203 | consumed samples: 15580160 | consumed tokens: 
31908167680 | elapsed time per iteration (s): 0.43 | learning rate: 1.034E-04 | global batch size: 256 | lm loss: 2.246351E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.470 | TFLOPs: 31.24 | 7: iteration 60870/ 115203 | consumed samples: 15582720 | consumed tokens: 31913410560 | elapsed time per iteration (s): 0.43 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 2.300986E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.527 | TFLOPs: 31.40 | 7: iteration 60880/ 115203 | consumed samples: 15585280 | consumed tokens: 31918653440 | elapsed time per iteration (s): 0.44 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 2.262638E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.879 | TFLOPs: 30.85 | 7: iteration 60890/ 115203 | consumed samples: 15587840 | consumed tokens: 31923896320 | elapsed time per iteration (s): 0.42 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 2.301714E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.726 | TFLOPs: 32.04 | 7: iteration 60900/ 115203 | consumed samples: 15590400 | consumed tokens: 31929139200 | elapsed time per iteration (s): 0.43 | learning rate: 1.033E-04 | global batch size: 256 | lm loss: 2.295249E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.646 | TFLOPs: 31.41 | 7: iteration 60910/ 115203 | consumed samples: 15592960 | consumed tokens: 31934382080 | elapsed time per iteration (s): 0.44 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 2.293067E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
584.017 | TFLOPs: 30.64 | 7: iteration 60920/ 115203 | consumed samples: 15595520 | consumed tokens: 31939624960 | elapsed time per iteration (s): 0.43 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 2.286632E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.030 | TFLOPs: 31.43 | 7: iteration 60930/ 115203 | consumed samples: 15598080 | consumed tokens: 31944867840 | elapsed time per iteration (s): 0.44 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 2.266560E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.829 | TFLOPs: 30.53 | 7: iteration 60940/ 115203 | consumed samples: 15600640 | consumed tokens: 31950110720 | elapsed time per iteration (s): 0.42 | learning rate: 1.032E-04 | global batch size: 256 | lm loss: 2.277636E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.751 | TFLOPs: 31.94 | 7: iteration 60950/ 115203 | consumed samples: 15603200 | consumed tokens: 31955353600 | elapsed time per iteration (s): 0.42 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 2.259761E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.364 | TFLOPs: 31.61 | 7: iteration 60960/ 115203 | consumed samples: 15605760 | consumed tokens: 31960596480 | elapsed time per iteration (s): 0.43 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 2.250376E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.250 | TFLOPs: 31.44 | 7: iteration 60970/ 115203 | consumed samples: 15608320 | consumed tokens: 31965839360 | elapsed time per iteration (s): 0.42 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 2.278997E+00 | grad norm: 0.210 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.675 | TFLOPs: 32.25 | 7: iteration 60980/ 115203 | consumed samples: 15610880 | consumed tokens: 31971082240 | elapsed time per iteration (s): 0.42 | learning rate: 1.031E-04 | global batch size: 256 | lm loss: 2.295608E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.001 | TFLOPs: 31.69 | 7: iteration 60990/ 115203 | consumed samples: 15613440 | consumed tokens: 31976325120 | elapsed time per iteration (s): 0.42 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 2.289058E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.887 | TFLOPs: 32.31 | 7: iteration 61000/ 115203 | consumed samples: 15616000 | consumed tokens: 31981568000 | elapsed time per iteration (s): 0.43 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 2.251203E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.021 | TFLOPs: 31.38 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 61000 | lm loss value: 2.231267E+00 | lm loss PPL: 9.311658E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 61000 to checkpoints_221m 0: [2022-11-28 20:17:28,873] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step61000 is begin to save! 0: [2022-11-28 20:17:28,877] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_01-model_00-model_states.pt... 0: [2022-11-28 20:17:28,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 20:17:28,980] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_03-model_00-model_states.pt... 0: [2022-11-28 20:17:29,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_03-model_00-model_states.pt. 0: [2022-11-28 20:17:29,001] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_04-model_00-model_states.pt... 0: [2022-11-28 20:17:29,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_04-model_00-model_states.pt. 0: [2022-11-28 20:17:29,024] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_05-model_00-model_states.pt... 0: [2022-11-28 20:17:29,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_05-model_00-model_states.pt. 0: [2022-11-28 20:17:29,048] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_06-model_00-model_states.pt... 0: [2022-11-28 20:17:29,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_06-model_00-model_states.pt. 0: [2022-11-28 20:17:29,071] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_07-model_00-model_states.pt... 0: [2022-11-28 20:17:29,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_07-model_00-model_states.pt. 0: [2022-11-28 20:17:29,094] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_08-model_00-model_states.pt... 0: [2022-11-28 20:17:29,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 20:17:29,118] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_09-model_00-model_states.pt... 0: [2022-11-28 20:17:29,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_09-model_00-model_states.pt. 0: [2022-11-28 20:17:29,143] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_10-model_00-model_states.pt... 0: [2022-11-28 20:17:29,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_10-model_00-model_states.pt. 0: [2022-11-28 20:17:29,165] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_11-model_00-model_states.pt... 0: [2022-11-28 20:17:29,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_11-model_00-model_states.pt. 0: [2022-11-28 20:17:29,188] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_12-model_00-model_states.pt... 0: [2022-11-28 20:17:29,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_12-model_00-model_states.pt. 0: [2022-11-28 20:17:29,212] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_13-model_00-model_states.pt... 0: [2022-11-28 20:17:29,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_13-model_00-model_states.pt. 0: [2022-11-28 20:17:29,235] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_14-model_00-model_states.pt... 0: [2022-11-28 20:17:29,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 20:17:29,259] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_15-model_00-model_states.pt... 0: [2022-11-28 20:17:29,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_15-model_00-model_states.pt. 0: [2022-11-28 20:17:29,283] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_16-model_00-model_states.pt... 0: [2022-11-28 20:17:29,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_16-model_00-model_states.pt. 0: [2022-11-28 20:17:29,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_17-model_00-model_states.pt... 0: [2022-11-28 20:17:29,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_17-model_00-model_states.pt. 0: [2022-11-28 20:17:29,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_18-model_00-model_states.pt... 0: [2022-11-28 20:17:29,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_18-model_00-model_states.pt. 0: [2022-11-28 20:17:29,356] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_19-model_00-model_states.pt... 0: [2022-11-28 20:17:29,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_19-model_00-model_states.pt. 0: [2022-11-28 20:17:29,379] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_20-model_00-model_states.pt... 0: [2022-11-28 20:17:29,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 20:17:29,403] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/layer_22-model_00-model_states.pt... 0: [2022-11-28 20:17:29,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/layer_22-model_00-model_states.pt. 0: [2022-11-28 20:17:29,408] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step61000/mp_rank_00_model_states.pt 0: [2022-11-28 20:17:29,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/mp_rank_00_model_states.pt... 0: [2022-11-28 20:17:29,410] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/mp_rank_00_model_states.pt. 0: [2022-11-28 20:17:29,428] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:17:29,428] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:17:29,428] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:17:29,428] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
7: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
0: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:17:29,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step61000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:17:29,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:17:29,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 20:17:29,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2022-11-28 20:17:29,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:17:29,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 20:17:29,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2022-11-28 20:17:29,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:17:29,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 20:17:29,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2022-11-28 20:17:29,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-28 20:17:29,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 20:17:29,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2022-11-28 20:17:29,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:17:29,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 20:17:29,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2022-11-28 20:17:29,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:17:29,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 20:17:29,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2022-11-28 20:17:29,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:17:29,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 20:17:29,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2022-11-28 20:17:29,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 20:17:29,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 20:17:29,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2022-11-28 20:17:29,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:17:29,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 20:17:29,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2022-11-28 20:17:29,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:17:29,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 20:17:29,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 4: [2022-11-28 20:17:29,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:17:29,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:17:29,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 20:17:29,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
4: [2022-11-28 20:17:29,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 20:17:29,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2022-11-28 20:17:29,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:17:29,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 20:17:29,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2022-11-28 20:17:29,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:17:29,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 20:17:29,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2022-11-28 20:17:29,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:17:29,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 20:17:29,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:17:29,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 20:17:29,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2022-11-28 20:17:29,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 20:17:29,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 20:17:29,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2022-11-28 20:17:29,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:17:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 20:17:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
0: [2022-11-28 20:17:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 4: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:17:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:17:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 2: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:17:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 20:17:29,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
1: [2022-11-28 20:17:29,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:17:29,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:17:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 20:17:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 20:17:29,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:17:29,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2022-11-28 20:17:29,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2022-11-28 20:17:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 20:17:29,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 4: [2022-11-28 20:17:29,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:17:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 20:17:29,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
4: [2022-11-28 20:17:29,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:17:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 20:17:29,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:17:29,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 4: [2022-11-28 20:17:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 20:17:29,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 4: [2022-11-28 20:17:29,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:17:29,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 20:17:29,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2022-11-28 20:17:29,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:17:29,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 20:17:29,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
4: [2022-11-28 20:17:29,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:17:29,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 20:17:29,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:17:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2022-11-28 20:17:29,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:17:29,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 20:17:29,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2022-11-28 20:17:29,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:17:29,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 20:17:29,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
6: [2022-11-28 20:17:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:17:29,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:17:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 20:17:29,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 20:17:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2022-11-28 20:17:29,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 1: [2022-11-28 20:17:29,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:17:29,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 20:17:29,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2022-11-28 20:17:29,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:17:29,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 20:17:29,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
7: [2022-11-28 20:17:29,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:17:29,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 20:17:29,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2022-11-28 20:17:29,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:17:29,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 20:17:29,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2022-11-28 20:17:29,499] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:17:29,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 20:17:29,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2022-11-28 20:17:29,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:17:29,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 20:17:29,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
6: [2022-11-28 20:17:29,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:17:29,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 20:17:29,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:17:29,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2022-11-28 20:17:29,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 20:17:29,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2022-11-28 20:17:29,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:17:29,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:17:29,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 20:17:29,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 20:17:29,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 6: [2022-11-28 20:17:29,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
7: [2022-11-28 20:17:29,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:17:29,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 20:17:29,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2022-11-28 20:17:29,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:17:29,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:17:29,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 20:17:29,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 20:17:29,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 7: [2022-11-28 20:17:29,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2022-11-28 20:17:29,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:17:29,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2022-11-28 20:17:29,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 20:17:29,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 3: [2022-11-28 20:17:29,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 20:17:29,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2022-11-28 20:17:29,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:17:29,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:17:29,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 20:17:29,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 20:17:29,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2022-11-28 20:17:29,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2022-11-28 20:17:29,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2022-11-28 20:17:29,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 20:17:29,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2022-11-28 20:17:29,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:17:29,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:17:29,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:17:29,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:17:29,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 20:17:29,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 20:17:29,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 20:17:29,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 20:17:29,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2022-11-28 20:17:29,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 
5: [2022-11-28 20:17:29,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2022-11-28 20:17:29,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2022-11-28 20:17:29,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:17:29,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:17:29,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:17:29,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 20:17:29,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 20:17:29,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 20:17:29,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2022-11-28 20:17:29,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2022-11-28 20:17:29,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 5: [2022-11-28 20:17:29,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:17:29,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 20:17:29,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: [2022-11-28 20:17:29,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step61000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 20:17:29,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step61000 is ready now! 0: successfully saved checkpoint at iteration 61000 to checkpoints_221m 7: time (ms) | save-checkpoint: 672.15 7: iteration 61010/ 115203 | consumed samples: 15618560 | consumed tokens: 31986810880 | elapsed time per iteration (s): 0.51 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 2.252564E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 501.651 | TFLOPs: 26.32 | 7: iteration 61020/ 115203 | consumed samples: 15621120 | consumed tokens: 31992053760 | elapsed time per iteration (s): 0.43 | learning rate: 1.030E-04 | global batch size: 256 | lm loss: 2.273939E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.503 | TFLOPs: 30.98 | 7: iteration 61030/ 115203 | consumed samples: 15623680 | consumed tokens: 31997296640 | elapsed time per iteration (s): 0.44 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 2.301203E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.158 | TFLOPs: 30.44 | 7: iteration 61040/ 115203 | consumed samples: 15626240 | consumed tokens: 32002539520 | elapsed time per iteration (s): 0.45 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 2.265041E+00 | grad norm: 0.228 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.417 | TFLOPs: 29.88 | 7: iteration 61050/ 115203 | consumed samples: 15628800 | consumed tokens: 32007782400 | elapsed time per iteration (s): 0.43 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 2.275837E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.618 | TFLOPs: 31.30 | 7: iteration 61060/ 115203 | consumed samples: 15631360 | consumed tokens: 32013025280 | elapsed time per iteration (s): 0.43 | learning rate: 1.029E-04 | global batch size: 256 | lm loss: 2.273606E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.943 | TFLOPs: 31.48 | 7: iteration 61070/ 115203 | consumed samples: 15633920 | consumed tokens: 32018268160 | elapsed time per iteration (s): 0.45 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 2.262352E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.502 | TFLOPs: 29.88 | 7: iteration 61080/ 115203 | consumed samples: 15636480 | consumed tokens: 32023511040 | elapsed time per iteration (s): 0.43 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 2.299080E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.289 | TFLOPs: 30.97 | 7: iteration 61090/ 115203 | consumed samples: 15639040 | consumed tokens: 32028753920 | elapsed time per iteration (s): 0.42 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 2.282386E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.099 | TFLOPs: 32.01 | 7: iteration 61100/ 115203 | consumed samples: 15641600 | consumed tokens: 32033996800 | elapsed time per iteration (s): 
0.42 | learning rate: 1.028E-04 | global batch size: 256 | lm loss: 2.297670E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.565 | TFLOPs: 31.62 | 7: iteration 61110/ 115203 | consumed samples: 15644160 | consumed tokens: 32039239680 | elapsed time per iteration (s): 0.43 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 2.271008E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.424 | TFLOPs: 31.45 | 7: iteration 61120/ 115203 | consumed samples: 15646720 | consumed tokens: 32044482560 | elapsed time per iteration (s): 0.43 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 2.252214E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.978 | TFLOPs: 31.22 | 7: iteration 61130/ 115203 | consumed samples: 15649280 | consumed tokens: 32049725440 | elapsed time per iteration (s): 0.43 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 2.260170E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.908 | TFLOPs: 31.37 | 7: iteration 61140/ 115203 | consumed samples: 15651840 | consumed tokens: 32054968320 | elapsed time per iteration (s): 0.42 | learning rate: 1.027E-04 | global batch size: 256 | lm loss: 2.250611E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.238 | TFLOPs: 31.91 | 7: iteration 61150/ 115203 | consumed samples: 15654400 | consumed tokens: 32060211200 | elapsed time per iteration (s): 0.43 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 2.284392E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.428 | TFLOPs: 31.35 | 7: iteration 61160/ 
115203 | consumed samples: 15656960 | consumed tokens: 32065454080 | elapsed time per iteration (s): 0.43 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 2.270195E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.866 | TFLOPs: 31.47 | 7: iteration 61170/ 115203 | consumed samples: 15659520 | consumed tokens: 32070696960 | elapsed time per iteration (s): 0.42 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 2.290216E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.286 | TFLOPs: 32.07 | 7: iteration 61180/ 115203 | consumed samples: 15662080 | consumed tokens: 32075939840 | elapsed time per iteration (s): 0.43 | learning rate: 1.026E-04 | global batch size: 256 | lm loss: 2.290517E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.893 | TFLOPs: 30.90 | 7: iteration 61190/ 115203 | consumed samples: 15664640 | consumed tokens: 32081182720 | elapsed time per iteration (s): 0.43 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 2.285747E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.286 | TFLOPs: 31.44 | 7: iteration 61200/ 115203 | consumed samples: 15667200 | consumed tokens: 32086425600 | elapsed time per iteration (s): 0.42 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 2.285684E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.802 | TFLOPs: 31.68 | 7: iteration 61210/ 115203 | consumed samples: 15669760 | consumed tokens: 32091668480 | elapsed time per iteration (s): 0.43 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 2.293878E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 596.811 | TFLOPs: 31.31 | 7: iteration 61220/ 115203 | consumed samples: 15672320 | consumed tokens: 32096911360 | elapsed time per iteration (s): 0.44 | learning rate: 1.025E-04 | global batch size: 256 | lm loss: 2.285501E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.949 | TFLOPs: 30.53 | 7: iteration 61230/ 115203 | consumed samples: 15674880 | consumed tokens: 32102154240 | elapsed time per iteration (s): 0.43 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 2.274507E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.246 | TFLOPs: 31.23 | 7: iteration 61240/ 115203 | consumed samples: 15677440 | consumed tokens: 32107397120 | elapsed time per iteration (s): 0.42 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 2.272733E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.398 | TFLOPs: 31.61 | 7: iteration 61250/ 115203 | consumed samples: 15680000 | consumed tokens: 32112640000 | elapsed time per iteration (s): 0.43 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 2.272318E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.184 | TFLOPs: 31.54 | 7: iteration 61260/ 115203 | consumed samples: 15682560 | consumed tokens: 32117882880 | elapsed time per iteration (s): 0.43 | learning rate: 1.024E-04 | global batch size: 256 | lm loss: 2.278049E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.944 | TFLOPs: 31.27 | 7: iteration 61270/ 115203 | consumed samples: 15685120 | consumed tokens: 32123125760 | elapsed time per iteration (s): 0.42 | learning rate: 1.023E-04 | global batch 
size: 256 | lm loss: 2.242970E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.707 | TFLOPs: 31.78 | 7: iteration 61280/ 115203 | consumed samples: 15687680 | consumed tokens: 32128368640 | elapsed time per iteration (s): 0.42 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 2.284553E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.303 | TFLOPs: 31.71 | 7: iteration 61290/ 115203 | consumed samples: 15690240 | consumed tokens: 32133611520 | elapsed time per iteration (s): 0.43 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 2.268868E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.259 | TFLOPs: 31.02 | 7: iteration 61300/ 115203 | consumed samples: 15692800 | consumed tokens: 32138854400 | elapsed time per iteration (s): 0.42 | learning rate: 1.023E-04 | global batch size: 256 | lm loss: 2.261394E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.927 | TFLOPs: 31.63 | 7: iteration 61310/ 115203 | consumed samples: 15695360 | consumed tokens: 32144097280 | elapsed time per iteration (s): 0.43 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 2.282524E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.206 | TFLOPs: 31.33 | 7: iteration 61320/ 115203 | consumed samples: 15697920 | consumed tokens: 32149340160 | elapsed time per iteration (s): 0.42 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 2.286448E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.141 | TFLOPs: 31.86 | 7: iteration 61330/ 115203 | consumed samples: 15700480 | consumed 
tokens: 32154583040 | elapsed time per iteration (s): 0.42 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 2.299807E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.714 | TFLOPs: 31.62 | 7: iteration 61340/ 115203 | consumed samples: 15703040 | consumed tokens: 32159825920 | elapsed time per iteration (s): 0.43 | learning rate: 1.022E-04 | global batch size: 256 | lm loss: 2.258874E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.987 | TFLOPs: 31.22 | 7: iteration 61350/ 115203 | consumed samples: 15705600 | consumed tokens: 32165068800 | elapsed time per iteration (s): 0.42 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 2.275951E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.199 | TFLOPs: 31.70 | 7: iteration 61360/ 115203 | consumed samples: 15708160 | consumed tokens: 32170311680 | elapsed time per iteration (s): 0.43 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 2.264627E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.712 | TFLOPs: 31.41 | 7: iteration 61370/ 115203 | consumed samples: 15710720 | consumed tokens: 32175554560 | elapsed time per iteration (s): 0.42 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 2.297575E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.637 | TFLOPs: 31.62 | 7: iteration 61380/ 115203 | consumed samples: 15713280 | consumed tokens: 32180797440 | elapsed time per iteration (s): 0.42 | learning rate: 1.021E-04 | global batch size: 256 | lm loss: 2.272783E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 610.077 | TFLOPs: 32.01 | 7: iteration 61390/ 115203 | consumed samples: 15715840 | consumed tokens: 32186040320 | elapsed time per iteration (s): 0.43 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 2.300021E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.879 | TFLOPs: 31.11 | 7: iteration 61400/ 115203 | consumed samples: 15718400 | consumed tokens: 32191283200 | elapsed time per iteration (s): 0.43 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 2.271517E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.713 | TFLOPs: 31.31 | 7: iteration 61410/ 115203 | consumed samples: 15720960 | consumed tokens: 32196526080 | elapsed time per iteration (s): 0.43 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 2.273206E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.075 | TFLOPs: 31.43 | 7: iteration 61420/ 115203 | consumed samples: 15723520 | consumed tokens: 32201768960 | elapsed time per iteration (s): 0.43 | learning rate: 1.020E-04 | global batch size: 256 | lm loss: 2.285709E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.730 | TFLOPs: 31.41 | 7: iteration 61430/ 115203 | consumed samples: 15726080 | consumed tokens: 32207011840 | elapsed time per iteration (s): 0.43 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 2.285235E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.856 | TFLOPs: 31.05 | 7: iteration 61440/ 115203 | consumed samples: 15728640 | consumed tokens: 32212254720 | elapsed time per iteration (s): 0.42 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 2.264303E+00 | grad norm: 
0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.811 | TFLOPs: 31.94 | 7: iteration 61450/ 115203 | consumed samples: 15731200 | consumed tokens: 32217497600 | elapsed time per iteration (s): 0.44 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 2.233100E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.710 | TFLOPs: 30.52 | 7: iteration 61460/ 115203 | consumed samples: 15733760 | consumed tokens: 32222740480 | elapsed time per iteration (s): 0.44 | learning rate: 1.019E-04 | global batch size: 256 | lm loss: 2.304589E+00 | grad norm: 0.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.161 | TFLOPs: 30.65 | 7: iteration 61470/ 115203 | consumed samples: 15736320 | consumed tokens: 32227983360 | elapsed time per iteration (s): 0.43 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 2.282124E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.375 | TFLOPs: 31.50 | 7: iteration 61480/ 115203 | consumed samples: 15738880 | consumed tokens: 32233226240 | elapsed time per iteration (s): 0.42 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 2.264182E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.163 | TFLOPs: 31.70 | 7: iteration 61490/ 115203 | consumed samples: 15741440 | consumed tokens: 32238469120 | elapsed time per iteration (s): 0.43 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 2.247954E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.605 | TFLOPs: 31.36 | 7: iteration 61500/ 115203 | consumed samples: 15744000 | consumed tokens: 32243712000 | elapsed time per 
iteration (s): 0.43 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 2.256684E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.286 | TFLOPs: 31.39 | 7: iteration 61510/ 115203 | consumed samples: 15746560 | consumed tokens: 32248954880 | elapsed time per iteration (s): 0.44 | learning rate: 1.018E-04 | global batch size: 256 | lm loss: 2.272849E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.260 | TFLOPs: 30.66 | 7: iteration 61520/ 115203 | consumed samples: 15749120 | consumed tokens: 32254197760 | elapsed time per iteration (s): 0.44 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 2.259380E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.562 | TFLOPs: 30.67 | 7: iteration 61530/ 115203 | consumed samples: 15751680 | consumed tokens: 32259440640 | elapsed time per iteration (s): 0.42 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 2.297155E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.981 | TFLOPs: 31.69 | 7: iteration 61540/ 115203 | consumed samples: 15754240 | consumed tokens: 32264683520 | elapsed time per iteration (s): 0.43 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 2.289635E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.403 | TFLOPs: 31.24 | 7: iteration 61550/ 115203 | consumed samples: 15756800 | consumed tokens: 32269926400 | elapsed time per iteration (s): 0.44 | learning rate: 1.017E-04 | global batch size: 256 | lm loss: 2.250227E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.850 | TFLOPs: 30.79 | 7: 
iteration 61560/ 115203 | consumed samples: 15759360 | consumed tokens: 32275169280 | elapsed time per iteration (s): 0.43 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 2.272094E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.321 | TFLOPs: 30.92 | 7: iteration 61570/ 115203 | consumed samples: 15761920 | consumed tokens: 32280412160 | elapsed time per iteration (s): 0.43 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 2.287457E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.528 | TFLOPs: 31.09 | 7: iteration 61580/ 115203 | consumed samples: 15764480 | consumed tokens: 32285655040 | elapsed time per iteration (s): 0.43 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 2.264966E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.605 | TFLOPs: 31.41 | 7: iteration 61590/ 115203 | consumed samples: 15767040 | consumed tokens: 32290897920 | elapsed time per iteration (s): 0.43 | learning rate: 1.016E-04 | global batch size: 256 | lm loss: 2.304292E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.209 | TFLOPs: 31.23 | 7: iteration 61600/ 115203 | consumed samples: 15769600 | consumed tokens: 32296140800 | elapsed time per iteration (s): 0.43 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 2.292923E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.020 | TFLOPs: 31.59 | 7: iteration 61610/ 115203 | consumed samples: 15772160 | consumed tokens: 32301383680 | elapsed time per iteration (s): 0.43 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 2.250670E+00 | grad norm: 0.215 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.534 | TFLOPs: 31.56 | 7: iteration 61620/ 115203 | consumed samples: 15774720 | consumed tokens: 32306626560 | elapsed time per iteration (s): 0.43 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 2.258512E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.269 | TFLOPs: 30.97 | 7: iteration 61630/ 115203 | consumed samples: 15777280 | consumed tokens: 32311869440 | elapsed time per iteration (s): 0.43 | learning rate: 1.015E-04 | global batch size: 256 | lm loss: 2.296650E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.761 | TFLOPs: 31.42 | 7: iteration 61640/ 115203 | consumed samples: 15779840 | consumed tokens: 32317112320 | elapsed time per iteration (s): 0.43 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 2.268950E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.415 | TFLOPs: 31.45 | 7: iteration 61650/ 115203 | consumed samples: 15782400 | consumed tokens: 32322355200 | elapsed time per iteration (s): 0.44 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 2.315893E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.181 | TFLOPs: 30.60 | 7: iteration 61660/ 115203 | consumed samples: 15784960 | consumed tokens: 32327598080 | elapsed time per iteration (s): 0.43 | learning rate: 1.014E-04 | global batch size: 256 | lm loss: 2.244674E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.223 | TFLOPs: 31.60 | 7: iteration 61670/ 115203 | consumed samples: 15787520 | consumed tokens: 32332840960 | elapsed time per iteration (s): 0.43 | learning rate: 
1.014E-04 | global batch size: 256 | lm loss: 2.254201E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.125 | TFLOPs: 31.54 | 7: iteration 61680/ 115203 | consumed samples: 15790080 | consumed tokens: 32338083840 | elapsed time per iteration (s): 0.43 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 2.284282E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.108 | TFLOPs: 31.17 | 7: iteration 61690/ 115203 | consumed samples: 15792640 | consumed tokens: 32343326720 | elapsed time per iteration (s): 0.42 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 2.254777E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.097 | TFLOPs: 32.12 | 7: iteration 61700/ 115203 | consumed samples: 15795200 | consumed tokens: 32348569600 | elapsed time per iteration (s): 0.43 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 2.299112E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.404 | TFLOPs: 31.50 | 7: iteration 61710/ 115203 | consumed samples: 15797760 | consumed tokens: 32353812480 | elapsed time per iteration (s): 0.43 | learning rate: 1.013E-04 | global batch size: 256 | lm loss: 2.280248E+00 | grad norm: 0.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.188 | TFLOPs: 31.33 | 7: iteration 61720/ 115203 | consumed samples: 15800320 | consumed tokens: 32359055360 | elapsed time per iteration (s): 0.43 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 2.259623E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.948 | TFLOPs: 31.53 | 7: iteration 61730/ 115203 | consumed 
samples: 15802880 | consumed tokens: 32364298240 | elapsed time per iteration (s): 0.42 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 2.281754E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.972 | TFLOPs: 32.11 | 7: iteration 61740/ 115203 | consumed samples: 15805440 | consumed tokens: 32369541120 | elapsed time per iteration (s): 0.43 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 2.275818E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.203 | TFLOPs: 31.23 | 7: iteration 61750/ 115203 | consumed samples: 15808000 | consumed tokens: 32374784000 | elapsed time per iteration (s): 0.42 | learning rate: 1.012E-04 | global batch size: 256 | lm loss: 2.283770E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.475 | TFLOPs: 31.93 | 7: iteration 61760/ 115203 | consumed samples: 15810560 | consumed tokens: 32380026880 | elapsed time per iteration (s): 0.43 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 2.278511E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.318 | TFLOPs: 30.97 | 7: iteration 61770/ 115203 | consumed samples: 15813120 | consumed tokens: 32385269760 | elapsed time per iteration (s): 0.43 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 2.278264E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.298 | TFLOPs: 31.18 | 7: iteration 61780/ 115203 | consumed samples: 15815680 | consumed tokens: 32390512640 | elapsed time per iteration (s): 0.43 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 2.280719E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 593.221 | TFLOPs: 31.13 | 7: iteration 61790/ 115203 | consumed samples: 15818240 | consumed tokens: 32395755520 | elapsed time per iteration (s): 0.44 | learning rate: 1.011E-04 | global batch size: 256 | lm loss: 2.271568E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.613 | TFLOPs: 30.57 | 7: iteration 61800/ 115203 | consumed samples: 15820800 | consumed tokens: 32400998400 | elapsed time per iteration (s): 0.42 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 2.274051E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.559 | TFLOPs: 31.98 | 7: iteration 61810/ 115203 | consumed samples: 15823360 | consumed tokens: 32406241280 | elapsed time per iteration (s): 0.43 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 2.279453E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.613 | TFLOPs: 31.36 | 7: iteration 61820/ 115203 | consumed samples: 15825920 | consumed tokens: 32411484160 | elapsed time per iteration (s): 0.43 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 2.272318E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.565 | TFLOPs: 31.46 | 7: iteration 61830/ 115203 | consumed samples: 15828480 | consumed tokens: 32416727040 | elapsed time per iteration (s): 0.43 | learning rate: 1.010E-04 | global batch size: 256 | lm loss: 2.290015E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.803 | TFLOPs: 31.37 | 7: iteration 61840/ 115203 | consumed samples: 15831040 | consumed tokens: 32421969920 | elapsed time per iteration (s): 0.43 | learning rate: 1.009E-04 | global batch size: 256 | lm 
loss: 2.271947E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.624 | TFLOPs: 30.99 | 7: iteration 61850/ 115203 | consumed samples: 15833600 | consumed tokens: 32427212800 | elapsed time per iteration (s): 0.42 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 2.278465E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.736 | TFLOPs: 31.94 | 7: iteration 61860/ 115203 | consumed samples: 15836160 | consumed tokens: 32432455680 | elapsed time per iteration (s): 0.42 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 2.273531E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.712 | TFLOPs: 31.89 | 7: iteration 61870/ 115203 | consumed samples: 15838720 | consumed tokens: 32437698560 | elapsed time per iteration (s): 0.43 | learning rate: 1.009E-04 | global batch size: 256 | lm loss: 2.274471E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.258 | TFLOPs: 31.39 | 7: iteration 61880/ 115203 | consumed samples: 15841280 | consumed tokens: 32442941440 | elapsed time per iteration (s): 0.43 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 2.305640E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.057 | TFLOPs: 31.48 | 7: iteration 61890/ 115203 | consumed samples: 15843840 | consumed tokens: 32448184320 | elapsed time per iteration (s): 0.43 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 2.291422E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.353 | TFLOPs: 31.45 | 7: iteration 61900/ 115203 | consumed samples: 15846400 | consumed tokens: 
32453427200 | elapsed time per iteration (s): 0.42 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 2.296682E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.029 | TFLOPs: 32.11 | 7: iteration 61910/ 115203 | consumed samples: 15848960 | consumed tokens: 32458670080 | elapsed time per iteration (s): 0.42 | learning rate: 1.008E-04 | global batch size: 256 | lm loss: 2.277743E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.275 | TFLOPs: 31.76 | 7: iteration 61920/ 115203 | consumed samples: 15851520 | consumed tokens: 32463912960 | elapsed time per iteration (s): 0.42 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 2.251270E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.869 | TFLOPs: 31.68 | 7: iteration 61930/ 115203 | consumed samples: 15854080 | consumed tokens: 32469155840 | elapsed time per iteration (s): 0.43 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 2.277514E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.535 | TFLOPs: 31.25 | 7: iteration 61940/ 115203 | consumed samples: 15856640 | consumed tokens: 32474398720 | elapsed time per iteration (s): 0.42 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 2.277640E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.268 | TFLOPs: 31.97 | 7: iteration 61950/ 115203 | consumed samples: 15859200 | consumed tokens: 32479641600 | elapsed time per iteration (s): 0.42 | learning rate: 1.007E-04 | global batch size: 256 | lm loss: 2.289226E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
610.854 | TFLOPs: 32.05 | 7: iteration 61960/ 115203 | consumed samples: 15861760 | consumed tokens: 32484884480 | elapsed time per iteration (s): 0.43 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 2.271945E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.979 | TFLOPs: 31.43 | 7: iteration 61970/ 115203 | consumed samples: 15864320 | consumed tokens: 32490127360 | elapsed time per iteration (s): 0.42 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 2.292491E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.200 | TFLOPs: 32.02 | 7: iteration 61980/ 115203 | consumed samples: 15866880 | consumed tokens: 32495370240 | elapsed time per iteration (s): 0.43 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 2.277467E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.608 | TFLOPs: 31.15 | 7: iteration 61990/ 115203 | consumed samples: 15869440 | consumed tokens: 32500613120 | elapsed time per iteration (s): 0.42 | learning rate: 1.006E-04 | global batch size: 256 | lm loss: 2.269354E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.228 | TFLOPs: 31.86 | 0: [2022-11-28 20:24:37,642] [INFO] [logging.py:68:log_dist] [Rank 0] step=62000, skipped=0, lr=[0.0001005423324048397, 0.0001005423324048397, 0.0001005423324048397], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 62000/ 115203 | consumed samples: 15872000 | consumed tokens: 32505856000 | elapsed time per iteration (s): 0.42 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 2.264452E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.784 | TFLOPs: 31.89 | 0: steps: 
62000 loss: 2.3011 iter time (s): 0.427 samples/sec: 598.964 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 62000 | lm loss value: 2.251656E+00 | lm loss PPL: 9.503459E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 62000 to checkpoints_221m 0: [2022-11-28 20:24:37,818] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step62000 is begin to save! 0: [2022-11-28 20:24:37,823] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_01-model_00-model_states.pt... 0: [2022-11-28 20:24:37,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_01-model_00-model_states.pt. 0: [2022-11-28 20:24:37,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_03-model_00-model_states.pt... 0: [2022-11-28 20:24:37,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_03-model_00-model_states.pt. 0: [2022-11-28 20:24:37,958] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_04-model_00-model_states.pt... 0: [2022-11-28 20:24:37,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_04-model_00-model_states.pt. 0: [2022-11-28 20:24:37,982] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_05-model_00-model_states.pt... 0: [2022-11-28 20:24:38,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_05-model_00-model_states.pt. 0: [2022-11-28 20:24:38,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_06-model_00-model_states.pt... 
0: [2022-11-28 20:24:38,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_06-model_00-model_states.pt. 0: [2022-11-28 20:24:38,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_07-model_00-model_states.pt... 0: [2022-11-28 20:24:38,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_07-model_00-model_states.pt. 0: [2022-11-28 20:24:38,058] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_08-model_00-model_states.pt... 0: [2022-11-28 20:24:38,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_08-model_00-model_states.pt. 0: [2022-11-28 20:24:38,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_09-model_00-model_states.pt... 0: [2022-11-28 20:24:38,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_09-model_00-model_states.pt. 0: [2022-11-28 20:24:38,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_10-model_00-model_states.pt... 0: [2022-11-28 20:24:38,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_10-model_00-model_states.pt. 0: [2022-11-28 20:24:38,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_11-model_00-model_states.pt... 0: [2022-11-28 20:24:38,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_11-model_00-model_states.pt. 0: [2022-11-28 20:24:38,154] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_12-model_00-model_states.pt... 
0: [2022-11-28 20:24:38,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_12-model_00-model_states.pt. 0: [2022-11-28 20:24:38,179] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_13-model_00-model_states.pt... 0: [2022-11-28 20:24:38,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_13-model_00-model_states.pt. 0: [2022-11-28 20:24:38,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_14-model_00-model_states.pt... 0: [2022-11-28 20:24:38,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_14-model_00-model_states.pt. 0: [2022-11-28 20:24:38,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_15-model_00-model_states.pt... 0: [2022-11-28 20:24:38,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_15-model_00-model_states.pt. 0: [2022-11-28 20:24:38,253] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_16-model_00-model_states.pt... 0: [2022-11-28 20:24:38,277] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_16-model_00-model_states.pt. 0: [2022-11-28 20:24:38,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_17-model_00-model_states.pt... 0: [2022-11-28 20:24:38,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_17-model_00-model_states.pt. 0: [2022-11-28 20:24:38,302] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_18-model_00-model_states.pt... 
0: [2022-11-28 20:24:38,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_18-model_00-model_states.pt. 0: [2022-11-28 20:24:38,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_19-model_00-model_states.pt... 0: [2022-11-28 20:24:38,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_19-model_00-model_states.pt. 0: [2022-11-28 20:24:38,356] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_20-model_00-model_states.pt... 0: [2022-11-28 20:24:38,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_20-model_00-model_states.pt. 0: [2022-11-28 20:24:38,381] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/layer_22-model_00-model_states.pt... 0: [2022-11-28 20:24:38,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/layer_22-model_00-model_states.pt. 0: [2022-11-28 20:24:38,386] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step62000/mp_rank_00_model_states.pt 0: [2022-11-28 20:24:38,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/mp_rank_00_model_states.pt... 0: [2022-11-28 20:24:38,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/mp_rank_00_model_states.pt. 0: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
2: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:24:38,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step62000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:24:38,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:24:38,453] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 20:24:38,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2022-11-28 20:24:38,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 20:24:38,456] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2022-11-28 20:24:38,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:24:38,456] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2022-11-28 20:24:38,456] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 20:24:38,456] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2022-11-28 20:24:38,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:24:38,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:24:38,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 20:24:38,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 20:24:38,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2022-11-28 20:24:38,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2022-11-28 20:24:38,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 20:24:38,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 20:24:38,458] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2022-11-28 20:24:38,458] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:24:38,458] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 20:24:38,458] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2022-11-28 20:24:38,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:24:38,459] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 20:24:38,459] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2022-11-28 20:24:38,460] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:24:38,460] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 20:24:38,460] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2022-11-28 20:24:38,460] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
7: [2022-11-28 20:24:38,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:24:38,461] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 20:24:38,461] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2022-11-28 20:24:38,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:24:38,461] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 20:24:38,461] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2022-11-28 20:24:38,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:24:38,462] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 20:24:38,462] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 2: [2022-11-28 20:24:38,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:24:38,462] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 20:24:38,462] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
1: [2022-11-28 20:24:38,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:24:38,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 20:24:38,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 7: [2022-11-28 20:24:38,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:24:38,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 20:24:38,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2022-11-28 20:24:38,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:24:38,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:24:38,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 20:24:38,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2022-11-28 20:24:38,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 20:24:38,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
7: [2022-11-28 20:24:38,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:24:38,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 20:24:38,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2022-11-28 20:24:38,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:24:38,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 20:24:38,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2022-11-28 20:24:38,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:24:38,465] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 20:24:38,465] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2022-11-28 20:24:38,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:24:38,465] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 20:24:38,465] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
2: [2022-11-28 20:24:38,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:24:38,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 20:24:38,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2022-11-28 20:24:38,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:24:38,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 20:24:38,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2022-11-28 20:24:38,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:24:38,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 20:24:38,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2022-11-28 20:24:38,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:24:38,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 20:24:38,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
4: [2022-11-28 20:24:38,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:24:38,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 20:24:38,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2022-11-28 20:24:38,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:24:38,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 20:24:38,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2022-11-28 20:24:38,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:24:38,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 20:24:38,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 4: [2022-11-28 20:24:38,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:24:38,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 20:24:38,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
7: [2022-11-28 20:24:38,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:24:38,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 20:24:38,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2022-11-28 20:24:38,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:24:38,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 20:24:38,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 1: [2022-11-28 20:24:38,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:24:38,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 20:24:38,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2022-11-28 20:24:38,455] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:24:38,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:24:38,455] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 20:24:38,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 20:24:38,455] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 3: [2022-11-28 20:24:38,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2022-11-28 20:24:38,459] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:24:38,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:24:38,459] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 20:24:38,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 20:24:38,459] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 3: [2022-11-28 20:24:38,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2022-11-28 20:24:38,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:24:38,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:24:38,462] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 20:24:38,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 20:24:38,462] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 3: [2022-11-28 20:24:38,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2022-11-28 20:24:38,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:24:38,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:24:38,468] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 20:24:38,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 20:24:38,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 3: [2022-11-28 20:24:38,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2022-11-28 20:24:38,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:24:38,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:24:38,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 20:24:38,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 20:24:38,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 3: [2022-11-28 20:24:38,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2022-11-28 20:24:38,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:24:38,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:24:38,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 20:24:38,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 20:24:38,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2022-11-28 20:24:38,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 3: [2022-11-28 20:24:38,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:24:38,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
3: [2022-11-28 20:24:38,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 20:24:38,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 20:24:38,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2022-11-28 20:24:38,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 3: [2022-11-28 20:24:38,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:24:38,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:24:38,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 20:24:38,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 20:24:38,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 5: [2022-11-28 20:24:38,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2022-11-28 20:24:38,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:24:38,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 20:24:38,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 20:24:38,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 20:24:38,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2022-11-28 20:24:38,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 7: [2022-11-28 20:24:38,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:24:38,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:24:38,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:24:38,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-28 20:24:38,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 20:24:38,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 20:24:38,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 20:24:38,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 20:24:38,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 7: [2022-11-28 20:24:38,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 7: [2022-11-28 20:24:38,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 7: [2022-11-28 20:24:38,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: [2022-11-28 20:24:38,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:24:38,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 20:24:38,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
0: [2022-11-28 20:24:38,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 20:24:38,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 6: [2022-11-28 20:24:38,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:24:38,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:24:38,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:24:38,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 20:24:38,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 20:24:38,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 20:24:38,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 6: [2022-11-28 20:24:38,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 6: [2022-11-28 20:24:38,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 6: [2022-11-28 20:24:38,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 20:24:38,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:24:38,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:24:38,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:24:38,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 20:24:38,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 20:24:38,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 20:24:38,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 20:24:38,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 6: [2022-11-28 20:24:38,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 6: [2022-11-28 20:24:38,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 6: [2022-11-28 20:24:38,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:24:38,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 
6: [2022-11-28 20:24:38,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step62000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 20:24:38,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step62000 is ready now! 0: successfully saved checkpoint at iteration 62000 to checkpoints_221m 7: time (ms) | save-checkpoint: 743.99 7: iteration 62010/ 115203 | consumed samples: 15874560 | consumed tokens: 32511098880 | elapsed time per iteration (s): 0.52 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 2.274610E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 487.845 | TFLOPs: 25.60 | 7: iteration 62020/ 115203 | consumed samples: 15877120 | consumed tokens: 32516341760 | elapsed time per iteration (s): 0.42 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 2.252050E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.893 | TFLOPs: 31.90 | 7: iteration 62030/ 115203 | consumed samples: 15879680 | consumed tokens: 32521584640 | elapsed time per iteration (s): 0.42 | learning rate: 1.005E-04 | global batch size: 256 | lm loss: 2.261245E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.163 | TFLOPs: 31.96 | 7: iteration 62040/ 115203 | consumed samples: 15882240 | consumed tokens: 32526827520 | elapsed time per iteration (s): 0.63 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 2.269616E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 406.231 | TFLOPs: 21.31 | 7: iteration 62050/ 115203 | consumed samples: 15884800 | consumed tokens: 32532070400 | elapsed time per iteration (s): 0.49 | learning rate: 1.004E-04 | global batch size: 256 | 
lm loss: 2.261265E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 521.510 | TFLOPs: 27.36 | 7: iteration 62060/ 115203 | consumed samples: 15887360 | consumed tokens: 32537313280 | elapsed time per iteration (s): 0.44 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 2.276670E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.838 | TFLOPs: 30.53 | 7: iteration 62070/ 115203 | consumed samples: 15889920 | consumed tokens: 32542556160 | elapsed time per iteration (s): 0.43 | learning rate: 1.004E-04 | global batch size: 256 | lm loss: 2.291173E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.829 | TFLOPs: 31.47 | 7: iteration 62080/ 115203 | consumed samples: 15892480 | consumed tokens: 32547799040 | elapsed time per iteration (s): 0.43 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 2.302396E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.677 | TFLOPs: 31.52 | 7: iteration 62090/ 115203 | consumed samples: 15895040 | consumed tokens: 32553041920 | elapsed time per iteration (s): 0.43 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 2.282746E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.196 | TFLOPs: 31.33 | 7: iteration 62100/ 115203 | consumed samples: 15897600 | consumed tokens: 32558284800 | elapsed time per iteration (s): 0.43 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 2.241376E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.867 | TFLOPs: 31.53 | 7: iteration 62110/ 115203 | consumed samples: 15900160 | consumed tokens: 
32563527680 | elapsed time per iteration (s): 0.42 | learning rate: 1.003E-04 | global batch size: 256 | lm loss: 2.261222E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.090 | TFLOPs: 31.96 | 7: iteration 62120/ 115203 | consumed samples: 15902720 | consumed tokens: 32568770560 | elapsed time per iteration (s): 0.42 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 2.282112E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.620 | TFLOPs: 31.67 | 7: iteration 62130/ 115203 | consumed samples: 15905280 | consumed tokens: 32574013440 | elapsed time per iteration (s): 0.42 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 2.290873E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.669 | TFLOPs: 31.88 | 7: iteration 62140/ 115203 | consumed samples: 15907840 | consumed tokens: 32579256320 | elapsed time per iteration (s): 0.43 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 2.273339E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.414 | TFLOPs: 31.56 | 7: iteration 62150/ 115203 | consumed samples: 15910400 | consumed tokens: 32584499200 | elapsed time per iteration (s): 0.42 | learning rate: 1.002E-04 | global batch size: 256 | lm loss: 2.273826E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.682 | TFLOPs: 31.73 | 7: iteration 62160/ 115203 | consumed samples: 15912960 | consumed tokens: 32589742080 | elapsed time per iteration (s): 0.42 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 2.260106E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
607.664 | TFLOPs: 31.88 | 7: iteration 62170/ 115203 | consumed samples: 15915520 | consumed tokens: 32594984960 | elapsed time per iteration (s): 0.42 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 2.295270E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.936 | TFLOPs: 31.79 | 7: iteration 62180/ 115203 | consumed samples: 15918080 | consumed tokens: 32600227840 | elapsed time per iteration (s): 0.63 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 2.295605E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 406.444 | TFLOPs: 21.33 | 7: iteration 62190/ 115203 | consumed samples: 15920640 | consumed tokens: 32605470720 | elapsed time per iteration (s): 0.42 | learning rate: 1.001E-04 | global batch size: 256 | lm loss: 2.299533E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.184 | TFLOPs: 31.81 | 7: iteration 62200/ 115203 | consumed samples: 15923200 | consumed tokens: 32610713600 | elapsed time per iteration (s): 0.42 | learning rate: 1.000E-04 | global batch size: 256 | lm loss: 2.259238E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.474 | TFLOPs: 31.87 | 7: iteration 62210/ 115203 | consumed samples: 15925760 | consumed tokens: 32615956480 | elapsed time per iteration (s): 0.44 | learning rate: 1.000E-04 | global batch size: 256 | lm loss: 2.276217E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.499 | TFLOPs: 30.72 | 7: iteration 62220/ 115203 | consumed samples: 15928320 | consumed tokens: 32621199360 | elapsed time per iteration (s): 0.43 | learning rate: 1.000E-04 | global batch size: 256 | lm loss: 2.306693E+00 | grad norm: 0.231 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.210 | TFLOPs: 31.23 | 7: iteration 62230/ 115203 | consumed samples: 15930880 | consumed tokens: 32626442240 | elapsed time per iteration (s): 0.42 | learning rate: 9.998E-05 | global batch size: 256 | lm loss: 2.279865E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.620 | TFLOPs: 31.72 | 7: iteration 62240/ 115203 | consumed samples: 15933440 | consumed tokens: 32631685120 | elapsed time per iteration (s): 0.41 | learning rate: 9.995E-05 | global batch size: 256 | lm loss: 2.286982E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 617.112 | TFLOPs: 32.38 | 7: iteration 62250/ 115203 | consumed samples: 15936000 | consumed tokens: 32636928000 | elapsed time per iteration (s): 0.42 | learning rate: 9.993E-05 | global batch size: 256 | lm loss: 2.258849E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.631 | TFLOPs: 32.25 | 7: iteration 62260/ 115203 | consumed samples: 15938560 | consumed tokens: 32642170880 | elapsed time per iteration (s): 0.43 | learning rate: 9.990E-05 | global batch size: 256 | lm loss: 2.279374E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.030 | TFLOPs: 31.48 | 7: iteration 62270/ 115203 | consumed samples: 15941120 | consumed tokens: 32647413760 | elapsed time per iteration (s): 0.42 | learning rate: 9.988E-05 | global batch size: 256 | lm loss: 2.254220E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.339 | TFLOPs: 31.66 | 7: iteration 62280/ 115203 | consumed samples: 15943680 | consumed tokens: 32652656640 | elapsed time per iteration (s): 
0.42 | learning rate: 9.985E-05 | global batch size: 256 | lm loss: 2.291940E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.466 | TFLOPs: 31.93 | 7: iteration 62290/ 115203 | consumed samples: 15946240 | consumed tokens: 32657899520 | elapsed time per iteration (s): 0.42 | learning rate: 9.983E-05 | global batch size: 256 | lm loss: 2.276327E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.311 | TFLOPs: 31.65 | 7: iteration 62300/ 115203 | consumed samples: 15948800 | consumed tokens: 32663142400 | elapsed time per iteration (s): 0.42 | learning rate: 9.980E-05 | global batch size: 256 | lm loss: 2.279101E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.916 | TFLOPs: 32.05 | 7: iteration 62310/ 115203 | consumed samples: 15951360 | consumed tokens: 32668385280 | elapsed time per iteration (s): 0.44 | learning rate: 9.978E-05 | global batch size: 256 | lm loss: 2.310775E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.124 | TFLOPs: 30.33 | 7: iteration 62320/ 115203 | consumed samples: 15953920 | consumed tokens: 32673628160 | elapsed time per iteration (s): 0.42 | learning rate: 9.975E-05 | global batch size: 256 | lm loss: 2.267155E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.700 | TFLOPs: 31.83 | 7: iteration 62330/ 115203 | consumed samples: 15956480 | consumed tokens: 32678871040 | elapsed time per iteration (s): 0.46 | learning rate: 9.973E-05 | global batch size: 256 | lm loss: 2.284145E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 553.924 | TFLOPs: 29.06 | 7: iteration 62340/ 
115203 | consumed samples: 15959040 | consumed tokens: 32684113920 | elapsed time per iteration (s): 0.43 | learning rate: 9.970E-05 | global batch size: 256 | lm loss: 2.263643E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.446 | TFLOPs: 31.19 | 7: iteration 62350/ 115203 | consumed samples: 15961600 | consumed tokens: 32689356800 | elapsed time per iteration (s): 0.43 | learning rate: 9.968E-05 | global batch size: 256 | lm loss: 2.280477E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.540 | TFLOPs: 31.04 | 7: iteration 62360/ 115203 | consumed samples: 15964160 | consumed tokens: 32694599680 | elapsed time per iteration (s): 0.43 | learning rate: 9.966E-05 | global batch size: 256 | lm loss: 2.295555E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.849 | TFLOPs: 31.11 | 7: iteration 62370/ 115203 | consumed samples: 15966720 | consumed tokens: 32699842560 | elapsed time per iteration (s): 0.42 | learning rate: 9.963E-05 | global batch size: 256 | lm loss: 2.288865E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.373 | TFLOPs: 31.66 | 7: iteration 62380/ 115203 | consumed samples: 15969280 | consumed tokens: 32705085440 | elapsed time per iteration (s): 0.43 | learning rate: 9.961E-05 | global batch size: 256 | lm loss: 2.280634E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.379 | TFLOPs: 31.50 | 7: iteration 62390/ 115203 | consumed samples: 15971840 | consumed tokens: 32710328320 | elapsed time per iteration (s): 0.43 | learning rate: 9.958E-05 | global batch size: 256 | lm loss: 2.265782E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 597.738 | TFLOPs: 31.36 | 7: iteration 62400/ 115203 | consumed samples: 15974400 | consumed tokens: 32715571200 | elapsed time per iteration (s): 0.43 | learning rate: 9.956E-05 | global batch size: 256 | lm loss: 2.268816E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.025 | TFLOPs: 31.32 | 7: iteration 62410/ 115203 | consumed samples: 15976960 | consumed tokens: 32720814080 | elapsed time per iteration (s): 0.42 | learning rate: 9.953E-05 | global batch size: 256 | lm loss: 2.269220E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.402 | TFLOPs: 31.71 | 7: iteration 62420/ 115203 | consumed samples: 15979520 | consumed tokens: 32726056960 | elapsed time per iteration (s): 0.43 | learning rate: 9.951E-05 | global batch size: 256 | lm loss: 2.293010E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.290 | TFLOPs: 31.29 | 7: iteration 62430/ 115203 | consumed samples: 15982080 | consumed tokens: 32731299840 | elapsed time per iteration (s): 0.42 | learning rate: 9.948E-05 | global batch size: 256 | lm loss: 2.280219E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.051 | TFLOPs: 31.85 | 7: iteration 62440/ 115203 | consumed samples: 15984640 | consumed tokens: 32736542720 | elapsed time per iteration (s): 0.43 | learning rate: 9.946E-05 | global batch size: 256 | lm loss: 2.279098E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.778 | TFLOPs: 31.26 | 7: iteration 62450/ 115203 | consumed samples: 15987200 | consumed tokens: 32741785600 | elapsed time per iteration (s): 0.43 | learning rate: 9.943E-05 | global batch 
size: 256 | lm loss: 2.272925E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.700 | TFLOPs: 31.26 | 7: iteration 62460/ 115203 | consumed samples: 15989760 | consumed tokens: 32747028480 | elapsed time per iteration (s): 0.43 | learning rate: 9.941E-05 | global batch size: 256 | lm loss: 2.239910E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.129 | TFLOPs: 31.59 | 7: iteration 62470/ 115203 | consumed samples: 15992320 | consumed tokens: 32752271360 | elapsed time per iteration (s): 0.43 | learning rate: 9.938E-05 | global batch size: 256 | lm loss: 2.263152E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.056 | TFLOPs: 31.59 | 7: iteration 62480/ 115203 | consumed samples: 15994880 | consumed tokens: 32757514240 | elapsed time per iteration (s): 0.42 | learning rate: 9.936E-05 | global batch size: 256 | lm loss: 2.257464E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.994 | TFLOPs: 31.74 | 7: iteration 62490/ 115203 | consumed samples: 15997440 | consumed tokens: 32762757120 | elapsed time per iteration (s): 0.43 | learning rate: 9.934E-05 | global batch size: 256 | lm loss: 2.297508E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.587 | TFLOPs: 31.56 | 7: iteration 62500/ 115203 | consumed samples: 16000000 | consumed tokens: 32768000000 | elapsed time per iteration (s): 0.42 | learning rate: 9.931E-05 | global batch size: 256 | lm loss: 2.280924E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.665 | TFLOPs: 31.62 | 7: iteration 62510/ 115203 | consumed samples: 16002560 | consumed 
tokens: 32773242880 | elapsed time per iteration (s): 0.43 | learning rate: 9.929E-05 | global batch size: 256 | lm loss: 2.271364E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.405 | TFLOPs: 31.50 | 7: iteration 62520/ 115203 | consumed samples: 16005120 | consumed tokens: 32778485760 | elapsed time per iteration (s): 0.44 | learning rate: 9.926E-05 | global batch size: 256 | lm loss: 2.255735E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.269 | TFLOPs: 30.24 | 7: iteration 62530/ 115203 | consumed samples: 16007680 | consumed tokens: 32783728640 | elapsed time per iteration (s): 0.44 | learning rate: 9.924E-05 | global batch size: 256 | lm loss: 2.290892E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.842 | TFLOPs: 30.58 | 7: iteration 62540/ 115203 | consumed samples: 16010240 | consumed tokens: 32788971520 | elapsed time per iteration (s): 0.43 | learning rate: 9.921E-05 | global batch size: 256 | lm loss: 2.306884E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.551 | TFLOPs: 31.35 | 7: iteration 62550/ 115203 | consumed samples: 16012800 | consumed tokens: 32794214400 | elapsed time per iteration (s): 0.43 | learning rate: 9.919E-05 | global batch size: 256 | lm loss: 2.254711E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.217 | TFLOPs: 31.23 | 7: iteration 62560/ 115203 | consumed samples: 16015360 | consumed tokens: 32799457280 | elapsed time per iteration (s): 0.43 | learning rate: 9.916E-05 | global batch size: 256 | lm loss: 2.322405E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 600.351 | TFLOPs: 31.50 | 7: iteration 62570/ 115203 | consumed samples: 16017920 | consumed tokens: 32804700160 | elapsed time per iteration (s): 0.42 | learning rate: 9.914E-05 | global batch size: 256 | lm loss: 2.295764E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.878 | TFLOPs: 31.63 | 7: iteration 62580/ 115203 | consumed samples: 16020480 | consumed tokens: 32809943040 | elapsed time per iteration (s): 0.43 | learning rate: 9.911E-05 | global batch size: 256 | lm loss: 2.253129E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.097 | TFLOPs: 31.01 | 7: iteration 62590/ 115203 | consumed samples: 16023040 | consumed tokens: 32815185920 | elapsed time per iteration (s): 0.43 | learning rate: 9.909E-05 | global batch size: 256 | lm loss: 2.303164E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.294 | TFLOPs: 31.60 | 7: iteration 62600/ 115203 | consumed samples: 16025600 | consumed tokens: 32820428800 | elapsed time per iteration (s): 0.84 | learning rate: 9.906E-05 | global batch size: 256 | lm loss: 2.305283E+00 | grad norm: 0.205 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 306.052 | TFLOPs: 16.06 | 7: iteration 62610/ 115203 | consumed samples: 16028160 | consumed tokens: 32825671680 | elapsed time per iteration (s): 1.04 | learning rate: 9.904E-05 | global batch size: 256 | lm loss: 2.278392E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 245.894 | TFLOPs: 12.90 | 7: iteration 62620/ 115203 | consumed samples: 16030720 | consumed tokens: 32830914560 | elapsed time per iteration (s): 0.64 | learning rate: 9.902E-05 | global batch size: 256 | lm loss: 2.289482E+00 | grad norm: 
0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 400.407 | TFLOPs: 21.01 | 7: iteration 62630/ 115203 | consumed samples: 16033280 | consumed tokens: 32836157440 | elapsed time per iteration (s): 0.44 | learning rate: 9.899E-05 | global batch size: 256 | lm loss: 2.287608E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.718 | TFLOPs: 30.84 | 7: iteration 62640/ 115203 | consumed samples: 16035840 | consumed tokens: 32841400320 | elapsed time per iteration (s): 0.45 | learning rate: 9.897E-05 | global batch size: 256 | lm loss: 2.280561E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.894 | TFLOPs: 29.59 | 7: iteration 62650/ 115203 | consumed samples: 16038400 | consumed tokens: 32846643200 | elapsed time per iteration (s): 0.43 | learning rate: 9.894E-05 | global batch size: 256 | lm loss: 2.258428E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.081 | TFLOPs: 31.01 | 7: iteration 62660/ 115203 | consumed samples: 16040960 | consumed tokens: 32851886080 | elapsed time per iteration (s): 0.44 | learning rate: 9.892E-05 | global batch size: 256 | lm loss: 2.251834E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.787 | TFLOPs: 30.32 | 7: iteration 62670/ 115203 | consumed samples: 16043520 | consumed tokens: 32857128960 | elapsed time per iteration (s): 0.43 | learning rate: 9.889E-05 | global batch size: 256 | lm loss: 2.235455E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.348 | TFLOPs: 31.24 | 7: iteration 62680/ 115203 | consumed samples: 16046080 | consumed tokens: 32862371840 | elapsed time per 
iteration (s): 0.43 | learning rate: 9.887E-05 | global batch size: 256 | lm loss: 2.306307E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.090 | TFLOPs: 31.38 | 7: iteration 62690/ 115203 | consumed samples: 16048640 | consumed tokens: 32867614720 | elapsed time per iteration (s): 0.44 | learning rate: 9.884E-05 | global batch size: 256 | lm loss: 2.296027E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.997 | TFLOPs: 30.22 | 7: iteration 62700/ 115203 | consumed samples: 16051200 | consumed tokens: 32872857600 | elapsed time per iteration (s): 0.43 | learning rate: 9.882E-05 | global batch size: 256 | lm loss: 2.279416E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.704 | TFLOPs: 30.89 | 7: iteration 62710/ 115203 | consumed samples: 16053760 | consumed tokens: 32878100480 | elapsed time per iteration (s): 0.43 | learning rate: 9.879E-05 | global batch size: 256 | lm loss: 2.266884E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.331 | TFLOPs: 31.55 | 7: iteration 62720/ 115203 | consumed samples: 16056320 | consumed tokens: 32883343360 | elapsed time per iteration (s): 0.44 | learning rate: 9.877E-05 | global batch size: 256 | lm loss: 2.272581E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.040 | TFLOPs: 30.70 | 7: iteration 62730/ 115203 | consumed samples: 16058880 | consumed tokens: 32888586240 | elapsed time per iteration (s): 0.43 | learning rate: 9.874E-05 | global batch size: 256 | lm loss: 2.281795E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.017 | TFLOPs: 30.90 | 7: 
iteration 62740/ 115203 | consumed samples: 16061440 | consumed tokens: 32893829120 | elapsed time per iteration (s): 0.44 | learning rate: 9.872E-05 | global batch size: 256 | lm loss: 2.242994E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.634 | TFLOPs: 30.73 | 7: iteration 62750/ 115203 | consumed samples: 16064000 | consumed tokens: 32899072000 | elapsed time per iteration (s): 0.43 | learning rate: 9.870E-05 | global batch size: 256 | lm loss: 2.285975E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.637 | TFLOPs: 31.51 | 7: iteration 62760/ 115203 | consumed samples: 16066560 | consumed tokens: 32904314880 | elapsed time per iteration (s): 0.42 | learning rate: 9.867E-05 | global batch size: 256 | lm loss: 2.257421E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.929 | TFLOPs: 31.74 | 7: iteration 62770/ 115203 | consumed samples: 16069120 | consumed tokens: 32909557760 | elapsed time per iteration (s): 0.44 | learning rate: 9.865E-05 | global batch size: 256 | lm loss: 2.266948E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.705 | TFLOPs: 30.52 | 7: iteration 62780/ 115203 | consumed samples: 16071680 | consumed tokens: 32914800640 | elapsed time per iteration (s): 0.46 | learning rate: 9.862E-05 | global batch size: 256 | lm loss: 2.261267E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.413 | TFLOPs: 29.35 | 7: iteration 62790/ 115203 | consumed samples: 16074240 | consumed tokens: 32920043520 | elapsed time per iteration (s): 0.43 | learning rate: 9.860E-05 | global batch size: 256 | lm loss: 2.260938E+00 | grad norm: 0.213 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.194 | TFLOPs: 31.28 | 7: iteration 62800/ 115203 | consumed samples: 16076800 | consumed tokens: 32925286400 | elapsed time per iteration (s): 0.44 | learning rate: 9.857E-05 | global batch size: 256 | lm loss: 2.284050E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.953 | TFLOPs: 30.32 | 7: iteration 62810/ 115203 | consumed samples: 16079360 | consumed tokens: 32930529280 | elapsed time per iteration (s): 0.46 | learning rate: 9.855E-05 | global batch size: 256 | lm loss: 2.301005E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 555.336 | TFLOPs: 29.14 | 7: iteration 62820/ 115203 | consumed samples: 16081920 | consumed tokens: 32935772160 | elapsed time per iteration (s): 0.43 | learning rate: 9.852E-05 | global batch size: 256 | lm loss: 2.315475E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.448 | TFLOPs: 31.24 | 7: iteration 62830/ 115203 | consumed samples: 16084480 | consumed tokens: 32941015040 | elapsed time per iteration (s): 0.43 | learning rate: 9.850E-05 | global batch size: 256 | lm loss: 2.297859E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.702 | TFLOPs: 31.41 | 7: iteration 62840/ 115203 | consumed samples: 16087040 | consumed tokens: 32946257920 | elapsed time per iteration (s): 0.43 | learning rate: 9.847E-05 | global batch size: 256 | lm loss: 2.296194E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.573 | TFLOPs: 30.93 | 7: iteration 62850/ 115203 | consumed samples: 16089600 | consumed tokens: 32951500800 | elapsed time per iteration (s): 0.45 | learning rate: 
9.845E-05 | global batch size: 256 | lm loss: 2.254911E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.154 | TFLOPs: 29.97 | 7: iteration 62860/ 115203 | consumed samples: 16092160 | consumed tokens: 32956743680 | elapsed time per iteration (s): 0.43 | learning rate: 9.842E-05 | global batch size: 256 | lm loss: 2.255919E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.921 | TFLOPs: 30.90 | 7: iteration 62870/ 115203 | consumed samples: 16094720 | consumed tokens: 32961986560 | elapsed time per iteration (s): 0.43 | learning rate: 9.840E-05 | global batch size: 256 | lm loss: 2.255286E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.056 | TFLOPs: 30.96 | 7: iteration 62880/ 115203 | consumed samples: 16097280 | consumed tokens: 32967229440 | elapsed time per iteration (s): 0.43 | learning rate: 9.838E-05 | global batch size: 256 | lm loss: 2.294643E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.340 | TFLOPs: 31.18 | 7: iteration 62890/ 115203 | consumed samples: 16099840 | consumed tokens: 32972472320 | elapsed time per iteration (s): 0.44 | learning rate: 9.835E-05 | global batch size: 256 | lm loss: 2.265719E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.706 | TFLOPs: 30.73 | 7: iteration 62900/ 115203 | consumed samples: 16102400 | consumed tokens: 32977715200 | elapsed time per iteration (s): 0.43 | learning rate: 9.833E-05 | global batch size: 256 | lm loss: 2.301116E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.180 | TFLOPs: 30.97 | 7: iteration 62910/ 115203 | consumed 
samples: 16104960 | consumed tokens: 32982958080 | elapsed time per iteration (s): 0.45 | learning rate: 9.830E-05 | global batch size: 256 | lm loss: 2.269393E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.296 | TFLOPs: 29.97 | 7: iteration 62920/ 115203 | consumed samples: 16107520 | consumed tokens: 32988200960 | elapsed time per iteration (s): 0.44 | learning rate: 9.828E-05 | global batch size: 256 | lm loss: 2.294258E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.419 | TFLOPs: 30.51 | 7: iteration 62930/ 115203 | consumed samples: 16110080 | consumed tokens: 32993443840 | elapsed time per iteration (s): 0.43 | learning rate: 9.825E-05 | global batch size: 256 | lm loss: 2.243965E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.201 | TFLOPs: 31.18 | 7: iteration 62940/ 115203 | consumed samples: 16112640 | consumed tokens: 32998686720 | elapsed time per iteration (s): 0.44 | learning rate: 9.823E-05 | global batch size: 256 | lm loss: 2.247435E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.218 | TFLOPs: 30.34 | 7: iteration 62950/ 115203 | consumed samples: 16115200 | consumed tokens: 33003929600 | elapsed time per iteration (s): 0.43 | learning rate: 9.820E-05 | global batch size: 256 | lm loss: 2.263562E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.332 | TFLOPs: 31.34 | 7: iteration 62960/ 115203 | consumed samples: 16117760 | consumed tokens: 33009172480 | elapsed time per iteration (s): 0.44 | learning rate: 9.818E-05 | global batch size: 256 | lm loss: 2.294374E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 588.065 | TFLOPs: 30.85 | 7: iteration 62970/ 115203 | consumed samples: 16120320 | consumed tokens: 33014415360 | elapsed time per iteration (s): 0.45 | learning rate: 9.815E-05 | global batch size: 256 | lm loss: 2.268832E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.731 | TFLOPs: 30.05 | 7: iteration 62980/ 115203 | consumed samples: 16122880 | consumed tokens: 33019658240 | elapsed time per iteration (s): 0.62 | learning rate: 9.813E-05 | global batch size: 256 | lm loss: 2.282004E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 409.678 | TFLOPs: 21.50 | 7: iteration 62990/ 115203 | consumed samples: 16125440 | consumed tokens: 33024901120 | elapsed time per iteration (s): 0.43 | learning rate: 9.811E-05 | global batch size: 256 | lm loss: 2.297578E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.803 | TFLOPs: 31.26 | 7: iteration 63000/ 115203 | consumed samples: 16128000 | consumed tokens: 33030144000 | elapsed time per iteration (s): 0.45 | learning rate: 9.808E-05 | global batch size: 256 | lm loss: 2.250938E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.277 | TFLOPs: 29.97 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 63000 | lm loss value: 2.099856E+00 | lm loss PPL: 8.164993E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 63000 to checkpoints_221m 0: [2022-11-28 20:32:08,701] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step63000 is begin to save! 
0: [2022-11-28 20:32:08,704] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_01-model_00-model_states.pt... 0: [2022-11-28 20:32:08,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_01-model_00-model_states.pt. 0: [2022-11-28 20:32:08,870] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_03-model_00-model_states.pt... 0: [2022-11-28 20:32:08,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_03-model_00-model_states.pt. 0: [2022-11-28 20:32:08,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_04-model_00-model_states.pt... 0: [2022-11-28 20:32:08,937] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_04-model_00-model_states.pt. 0: [2022-11-28 20:32:08,937] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_05-model_00-model_states.pt... 0: [2022-11-28 20:32:08,970] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_05-model_00-model_states.pt. 0: [2022-11-28 20:32:08,971] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_06-model_00-model_states.pt... 0: [2022-11-28 20:32:09,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_06-model_00-model_states.pt. 0: [2022-11-28 20:32:09,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_07-model_00-model_states.pt... 0: [2022-11-28 20:32:09,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 20:32:09,041] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_08-model_00-model_states.pt... 0: [2022-11-28 20:32:09,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_08-model_00-model_states.pt. 0: [2022-11-28 20:32:09,075] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_09-model_00-model_states.pt... 0: [2022-11-28 20:32:09,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_09-model_00-model_states.pt. 0: [2022-11-28 20:32:09,108] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_10-model_00-model_states.pt... 0: [2022-11-28 20:32:09,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_10-model_00-model_states.pt. 0: [2022-11-28 20:32:09,142] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_11-model_00-model_states.pt... 0: [2022-11-28 20:32:09,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_11-model_00-model_states.pt. 0: [2022-11-28 20:32:09,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_12-model_00-model_states.pt... 0: [2022-11-28 20:32:09,208] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_12-model_00-model_states.pt. 0: [2022-11-28 20:32:09,209] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_13-model_00-model_states.pt... 0: [2022-11-28 20:32:09,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 20:32:09,243] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_14-model_00-model_states.pt... 0: [2022-11-28 20:32:09,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_14-model_00-model_states.pt. 0: [2022-11-28 20:32:09,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_15-model_00-model_states.pt... 0: [2022-11-28 20:32:09,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_15-model_00-model_states.pt. 0: [2022-11-28 20:32:09,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_16-model_00-model_states.pt... 0: [2022-11-28 20:32:09,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_16-model_00-model_states.pt. 0: [2022-11-28 20:32:09,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_17-model_00-model_states.pt... 0: [2022-11-28 20:32:09,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_17-model_00-model_states.pt. 0: [2022-11-28 20:32:09,379] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_18-model_00-model_states.pt... 0: [2022-11-28 20:32:09,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_18-model_00-model_states.pt. 0: [2022-11-28 20:32:09,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_19-model_00-model_states.pt... 0: [2022-11-28 20:32:09,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 20:32:09,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_20-model_00-model_states.pt... 0: [2022-11-28 20:32:09,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_20-model_00-model_states.pt. 0: [2022-11-28 20:32:09,479] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/layer_22-model_00-model_states.pt... 0: [2022-11-28 20:32:09,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/layer_22-model_00-model_states.pt. 0: [2022-11-28 20:32:09,483] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step63000/mp_rank_00_model_states.pt 0: [2022-11-28 20:32:09,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/mp_rank_00_model_states.pt... 0: [2022-11-28 20:32:09,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/mp_rank_00_model_states.pt. 0: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
7: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
5: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
1: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:32:09,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step63000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:32:09,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:32:09,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:32:09,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 20:32:09,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 20:32:09,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:32:09,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:32:09,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2022-11-28 20:32:09,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2022-11-28 20:32:09,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2022-11-28 20:32:09,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:32:09,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 20:32:09,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 20:32:09,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 20:32:09,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2022-11-28 20:32:09,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 20:32:09,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2022-11-28 20:32:09,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2022-11-28 20:32:09,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2022-11-28 20:32:09,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:32:09,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-28 20:32:09,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:32:09,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 20:32:09,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 20:32:09,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:32:09,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2022-11-28 20:32:09,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 20:32:09,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 20:32:09,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2022-11-28 20:32:09,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2022-11-28 20:32:09,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2022-11-28 20:32:09,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:32:09,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 20:32:09,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 20:32:09,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2022-11-28 20:32:09,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:32:09,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2022-11-28 20:32:09,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:32:09,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2022-11-28 20:32:09,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2022-11-28 20:32:09,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2022-11-28 20:32:09,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:32:09,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 20:32:09,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2022-11-28 20:32:09,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-28 20:32:09,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 20:32:09,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2022-11-28 20:32:09,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:32:09,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 20:32:09,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 1: [2022-11-28 20:32:09,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:32:09,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 20:32:09,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2022-11-28 20:32:09,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:32:09,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 20:32:09,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 4: [2022-11-28 20:32:09,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-28 20:32:09,573] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 20:32:09,573] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2022-11-28 20:32:09,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:32:09,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 20:32:09,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2022-11-28 20:32:09,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:32:09,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 20:32:09,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2022-11-28 20:32:09,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:32:09,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 20:32:09,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2022-11-28 20:32:09,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2022-11-28 20:32:09,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 20:32:09,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:32:09,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 20:32:09,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 20:32:09,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 20:32:09,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:32:09,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2022-11-28 20:32:09,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2022-11-28 20:32:09,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 
6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2022-11-28 20:32:09,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:32:09,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 20:32:09,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 20:32:09,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 20:32:09,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 20:32:09,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 20:32:09,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 20:32:09,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 20:32:09,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 
5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 5: [2022-11-28 20:32:09,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 6: [2022-11-28 20:32:09,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:32:09,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 20:32:09,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 0: [2022-11-28 20:32:09,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 20:32:09,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2022-11-28 20:32:09,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:32:09,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 20:32:09,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2022-11-28 20:32:09,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 20:32:09,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 20:32:09,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2022-11-28 20:32:09,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:32:09,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 20:32:09,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2022-11-28 20:32:09,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:32:09,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 20:32:09,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2022-11-28 20:32:09,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:32:09,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:32:09,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2022-11-28 20:32:09,646] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 20:32:09,646] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 20:32:09,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2022-11-28 20:32:09,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 20:32:09,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2022-11-28 20:32:09,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:32:09,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 2: [2022-11-28 20:32:09,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 20:32:09,647] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:32:09,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 20:32:09,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 20:32:09,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 20:32:09,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 20:32:09,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 20:32:09,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 
3: [2022-11-28 20:32:09,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2022-11-28 20:32:09,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 3: [2022-11-28 20:32:09,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:32:09,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 20:32:09,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:32:09,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 20:32:09,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 20:32:09,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 20:32:09,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 20:32:09,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 20:32:09,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 20:32:09,688] [INFO] 
[engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 20:32:09,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step63000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 7: [2022-11-28 20:32:09,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step63000 is ready now! 
0: successfully saved checkpoint at iteration 63000 to checkpoints_221m 7: time (ms) | save-checkpoint: 993.90 7: iteration 63010/ 115203 | consumed samples: 16130560 | consumed tokens: 33035386880 | elapsed time per iteration (s): 0.55 | learning rate: 9.806E-05 | global batch size: 256 | lm loss: 2.265016E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 465.802 | TFLOPs: 24.44 | 7: iteration 63020/ 115203 | consumed samples: 16133120 | consumed tokens: 33040629760 | elapsed time per iteration (s): 0.43 | learning rate: 9.803E-05 | global batch size: 256 | lm loss: 2.230298E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.484 | TFLOPs: 31.14 | 7: iteration 63030/ 115203 | consumed samples: 16135680 | consumed tokens: 33045872640 | elapsed time per iteration (s): 0.44 | learning rate: 9.801E-05 | global batch size: 256 | lm loss: 2.330458E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.101 | TFLOPs: 30.86 | 7: iteration 63040/ 115203 | consumed samples: 16138240 | consumed tokens: 33051115520 | elapsed time per iteration (s): 0.43 | learning rate: 9.798E-05 | global batch size: 256 | lm loss: 2.278352E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.860 | TFLOPs: 31.26 | 7: iteration 63050/ 115203 | consumed samples: 16140800 | consumed tokens: 33056358400 | elapsed time per iteration (s): 0.45 | learning rate: 9.796E-05 | global batch size: 256 | lm loss: 2.274256E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.121 | TFLOPs: 29.97 | 7: iteration 63060/ 115203 | consumed samples: 16143360 | consumed tokens: 33061601280 | elapsed time per iteration (s): 0.43 | learning 
rate: 9.793E-05 | global batch size: 256 | lm loss: 2.317043E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.074 | TFLOPs: 31.28 | 7: iteration 63070/ 115203 | consumed samples: 16145920 | consumed tokens: 33066844160 | elapsed time per iteration (s): 0.44 | learning rate: 9.791E-05 | global batch size: 256 | lm loss: 2.297008E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.803 | TFLOPs: 30.53 | 7: iteration 63080/ 115203 | consumed samples: 16148480 | consumed tokens: 33072087040 | elapsed time per iteration (s): 0.43 | learning rate: 9.788E-05 | global batch size: 256 | lm loss: 2.257787E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.582 | TFLOPs: 31.51 | 7: iteration 63090/ 115203 | consumed samples: 16151040 | consumed tokens: 33077329920 | elapsed time per iteration (s): 0.43 | learning rate: 9.786E-05 | global batch size: 256 | lm loss: 2.264341E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.939 | TFLOPs: 31.27 | 7: iteration 63100/ 115203 | consumed samples: 16153600 | consumed tokens: 33082572800 | elapsed time per iteration (s): 0.43 | learning rate: 9.784E-05 | global batch size: 256 | lm loss: 2.293749E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.168 | TFLOPs: 31.02 | 7: iteration 63110/ 115203 | consumed samples: 16156160 | consumed tokens: 33087815680 | elapsed time per iteration (s): 0.44 | learning rate: 9.781E-05 | global batch size: 256 | lm loss: 2.252295E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.214 | TFLOPs: 30.50 | 7: iteration 63120/ 115203 | 
consumed samples: 16158720 | consumed tokens: 33093058560 | elapsed time per iteration (s): 0.44 | learning rate: 9.779E-05 | global batch size: 256 | lm loss: 2.254394E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.978 | TFLOPs: 30.80 | 7: iteration 63130/ 115203 | consumed samples: 16161280 | consumed tokens: 33098301440 | elapsed time per iteration (s): 0.43 | learning rate: 9.776E-05 | global batch size: 256 | lm loss: 2.267959E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.814 | TFLOPs: 31.16 | 7: iteration 63140/ 115203 | consumed samples: 16163840 | consumed tokens: 33103544320 | elapsed time per iteration (s): 0.44 | learning rate: 9.774E-05 | global batch size: 256 | lm loss: 2.285097E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.807 | TFLOPs: 30.58 | 7: iteration 63150/ 115203 | consumed samples: 16166400 | consumed tokens: 33108787200 | elapsed time per iteration (s): 0.45 | learning rate: 9.771E-05 | global batch size: 256 | lm loss: 2.272290E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.865 | TFLOPs: 29.85 | 7: iteration 63160/ 115203 | consumed samples: 16168960 | consumed tokens: 33114030080 | elapsed time per iteration (s): 0.44 | learning rate: 9.769E-05 | global batch size: 256 | lm loss: 2.272911E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.864 | TFLOPs: 30.74 | 7: iteration 63170/ 115203 | consumed samples: 16171520 | consumed tokens: 33119272960 | elapsed time per iteration (s): 0.43 | learning rate: 9.766E-05 | global batch size: 256 | lm loss: 2.238087E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 595.059 | TFLOPs: 31.22 | 7: iteration 63180/ 115203 | consumed samples: 16174080 | consumed tokens: 33124515840 | elapsed time per iteration (s): 0.43 | learning rate: 9.764E-05 | global batch size: 256 | lm loss: 2.291098E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.738 | TFLOPs: 30.94 | 7: iteration 63190/ 115203 | consumed samples: 16176640 | consumed tokens: 33129758720 | elapsed time per iteration (s): 0.45 | learning rate: 9.761E-05 | global batch size: 256 | lm loss: 2.279202E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.371 | TFLOPs: 29.87 | 7: iteration 63200/ 115203 | consumed samples: 16179200 | consumed tokens: 33135001600 | elapsed time per iteration (s): 0.44 | learning rate: 9.759E-05 | global batch size: 256 | lm loss: 2.276120E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.708 | TFLOPs: 30.36 | 7: iteration 63210/ 115203 | consumed samples: 16181760 | consumed tokens: 33140244480 | elapsed time per iteration (s): 0.44 | learning rate: 9.757E-05 | global batch size: 256 | lm loss: 2.305659E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.209 | TFLOPs: 30.86 | 7: iteration 63220/ 115203 | consumed samples: 16184320 | consumed tokens: 33145487360 | elapsed time per iteration (s): 0.44 | learning rate: 9.754E-05 | global batch size: 256 | lm loss: 2.241312E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.021 | TFLOPs: 30.70 | 7: iteration 63230/ 115203 | consumed samples: 16186880 | consumed tokens: 33150730240 | elapsed time per iteration (s): 0.45 | learning rate: 9.752E-05 | global batch size: 
256 | lm loss: 2.280589E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.143 | TFLOPs: 30.12 | 7: iteration 63240/ 115203 | consumed samples: 16189440 | consumed tokens: 33155973120 | elapsed time per iteration (s): 0.43 | learning rate: 9.749E-05 | global batch size: 256 | lm loss: 2.270791E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.283 | TFLOPs: 31.23 | 7: iteration 63250/ 115203 | consumed samples: 16192000 | consumed tokens: 33161216000 | elapsed time per iteration (s): 0.44 | learning rate: 9.747E-05 | global batch size: 256 | lm loss: 2.232153E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.814 | TFLOPs: 30.79 | 7: iteration 63260/ 115203 | consumed samples: 16194560 | consumed tokens: 33166458880 | elapsed time per iteration (s): 0.43 | learning rate: 9.744E-05 | global batch size: 256 | lm loss: 2.258340E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.008 | TFLOPs: 31.11 | 7: iteration 63270/ 115203 | consumed samples: 16197120 | consumed tokens: 33171701760 | elapsed time per iteration (s): 0.45 | learning rate: 9.742E-05 | global batch size: 256 | lm loss: 2.252140E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.238 | TFLOPs: 29.81 | 7: iteration 63280/ 115203 | consumed samples: 16199680 | consumed tokens: 33176944640 | elapsed time per iteration (s): 0.43 | learning rate: 9.739E-05 | global batch size: 256 | lm loss: 2.272671E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.798 | TFLOPs: 31.42 | 7: iteration 63290/ 115203 | consumed samples: 16202240 | consumed 
tokens: 33182187520 | elapsed time per iteration (s): 0.42 | learning rate: 9.737E-05 | global batch size: 256 | lm loss: 2.261618E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.382 | TFLOPs: 31.61 | 7: iteration 63300/ 115203 | consumed samples: 16204800 | consumed tokens: 33187430400 | elapsed time per iteration (s): 0.44 | learning rate: 9.734E-05 | global batch size: 256 | lm loss: 2.264388E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.520 | TFLOPs: 30.25 | 7: iteration 63310/ 115203 | consumed samples: 16207360 | consumed tokens: 33192673280 | elapsed time per iteration (s): 0.44 | learning rate: 9.732E-05 | global batch size: 256 | lm loss: 2.284609E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.290 | TFLOPs: 30.71 | 7: iteration 63320/ 115203 | consumed samples: 16209920 | consumed tokens: 33197916160 | elapsed time per iteration (s): 0.44 | learning rate: 9.730E-05 | global batch size: 256 | lm loss: 2.272832E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.440 | TFLOPs: 30.66 | 7: iteration 63330/ 115203 | consumed samples: 16212480 | consumed tokens: 33203159040 | elapsed time per iteration (s): 0.43 | learning rate: 9.727E-05 | global batch size: 256 | lm loss: 2.276841E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.390 | TFLOPs: 31.03 | 7: iteration 63340/ 115203 | consumed samples: 16215040 | consumed tokens: 33208401920 | elapsed time per iteration (s): 0.44 | learning rate: 9.725E-05 | global batch size: 256 | lm loss: 2.254226E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 579.370 | TFLOPs: 30.40 | 7: iteration 63350/ 115203 | consumed samples: 16217600 | consumed tokens: 33213644800 | elapsed time per iteration (s): 0.43 | learning rate: 9.722E-05 | global batch size: 256 | lm loss: 2.284879E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.515 | TFLOPs: 31.30 | 7: iteration 63360/ 115203 | consumed samples: 16220160 | consumed tokens: 33218887680 | elapsed time per iteration (s): 0.44 | learning rate: 9.720E-05 | global batch size: 256 | lm loss: 2.280560E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.088 | TFLOPs: 30.44 | 7: iteration 63370/ 115203 | consumed samples: 16222720 | consumed tokens: 33224130560 | elapsed time per iteration (s): 0.44 | learning rate: 9.717E-05 | global batch size: 256 | lm loss: 2.284772E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.820 | TFLOPs: 30.47 | 7: iteration 63380/ 115203 | consumed samples: 16225280 | consumed tokens: 33229373440 | elapsed time per iteration (s): 0.44 | learning rate: 9.715E-05 | global batch size: 256 | lm loss: 2.252214E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.553 | TFLOPs: 30.78 | 7: iteration 63390/ 115203 | consumed samples: 16227840 | consumed tokens: 33234616320 | elapsed time per iteration (s): 0.43 | learning rate: 9.712E-05 | global batch size: 256 | lm loss: 2.274583E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.596 | TFLOPs: 31.30 | 7: iteration 63400/ 115203 | consumed samples: 16230400 | consumed tokens: 33239859200 | elapsed time per iteration (s): 0.44 | learning rate: 9.710E-05 | global batch size: 256 | lm loss: 2.283217E+00 | grad norm: 
0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.256 | TFLOPs: 30.86 | 7: iteration 63410/ 115203 | consumed samples: 16232960 | consumed tokens: 33245102080 | elapsed time per iteration (s): 0.44 | learning rate: 9.707E-05 | global batch size: 256 | lm loss: 2.267304E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.543 | TFLOPs: 30.83 | 7: iteration 63420/ 115203 | consumed samples: 16235520 | consumed tokens: 33250344960 | elapsed time per iteration (s): 0.43 | learning rate: 9.705E-05 | global batch size: 256 | lm loss: 2.281896E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.165 | TFLOPs: 31.17 | 7: iteration 63430/ 115203 | consumed samples: 16238080 | consumed tokens: 33255587840 | elapsed time per iteration (s): 0.43 | learning rate: 9.703E-05 | global batch size: 256 | lm loss: 2.284986E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.451 | TFLOPs: 31.19 | 7: iteration 63440/ 115203 | consumed samples: 16240640 | consumed tokens: 33260830720 | elapsed time per iteration (s): 0.43 | learning rate: 9.700E-05 | global batch size: 256 | lm loss: 2.286836E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.414 | TFLOPs: 31.40 | 7: iteration 63450/ 115203 | consumed samples: 16243200 | consumed tokens: 33266073600 | elapsed time per iteration (s): 0.43 | learning rate: 9.698E-05 | global batch size: 256 | lm loss: 2.253715E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.006 | TFLOPs: 31.32 | 7: iteration 63460/ 115203 | consumed samples: 16245760 | consumed tokens: 33271316480 | elapsed time per 
iteration (s): 0.43 | learning rate: 9.695E-05 | global batch size: 256 | lm loss: 2.263380E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.279 | TFLOPs: 31.50 | 7: iteration 63470/ 115203 | consumed samples: 16248320 | consumed tokens: 33276559360 | elapsed time per iteration (s): 0.44 | learning rate: 9.693E-05 | global batch size: 256 | lm loss: 2.287544E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.177 | TFLOPs: 30.70 | 7: iteration 63480/ 115203 | consumed samples: 16250880 | consumed tokens: 33281802240 | elapsed time per iteration (s): 0.43 | learning rate: 9.690E-05 | global batch size: 256 | lm loss: 2.264703E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.724 | TFLOPs: 31.31 | 7: iteration 63490/ 115203 | consumed samples: 16253440 | consumed tokens: 33287045120 | elapsed time per iteration (s): 0.44 | learning rate: 9.688E-05 | global batch size: 256 | lm loss: 2.289627E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.932 | TFLOPs: 30.43 | 7: iteration 63500/ 115203 | consumed samples: 16256000 | consumed tokens: 33292288000 | elapsed time per iteration (s): 0.43 | learning rate: 9.685E-05 | global batch size: 256 | lm loss: 2.286828E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.500 | TFLOPs: 31.45 | 7: iteration 63510/ 115203 | consumed samples: 16258560 | consumed tokens: 33297530880 | elapsed time per iteration (s): 0.44 | learning rate: 9.683E-05 | global batch size: 256 | lm loss: 2.282845E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.338 | TFLOPs: 30.82 | 7: 
iteration 63520/ 115203 | consumed samples: 16261120 | consumed tokens: 33302773760 | elapsed time per iteration (s): 0.43 | learning rate: 9.680E-05 | global batch size: 256 | lm loss: 2.282328E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.979 | TFLOPs: 31.01 | 7: iteration 63530/ 115203 | consumed samples: 16263680 | consumed tokens: 33308016640 | elapsed time per iteration (s): 0.44 | learning rate: 9.678E-05 | global batch size: 256 | lm loss: 2.271200E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.371 | TFLOPs: 30.71 | 7: iteration 63540/ 115203 | consumed samples: 16266240 | consumed tokens: 33313259520 | elapsed time per iteration (s): 0.43 | learning rate: 9.676E-05 | global batch size: 256 | lm loss: 2.256657E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.536 | TFLOPs: 31.46 | 7: iteration 63550/ 115203 | consumed samples: 16268800 | consumed tokens: 33318502400 | elapsed time per iteration (s): 0.44 | learning rate: 9.673E-05 | global batch size: 256 | lm loss: 2.292420E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.894 | TFLOPs: 30.22 | 7: iteration 63560/ 115203 | consumed samples: 16271360 | consumed tokens: 33323745280 | elapsed time per iteration (s): 0.43 | learning rate: 9.671E-05 | global batch size: 256 | lm loss: 2.302392E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.466 | TFLOPs: 30.93 | 7: iteration 63570/ 115203 | consumed samples: 16273920 | consumed tokens: 33328988160 | elapsed time per iteration (s): 0.42 | learning rate: 9.668E-05 | global batch size: 256 | lm loss: 2.262154E+00 | grad norm: 0.216 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.071 | TFLOPs: 32.06 | 7: iteration 63580/ 115203 | consumed samples: 16276480 | consumed tokens: 33334231040 | elapsed time per iteration (s): 0.43 | learning rate: 9.666E-05 | global batch size: 256 | lm loss: 2.293463E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.348 | TFLOPs: 30.92 | 7: iteration 63590/ 115203 | consumed samples: 16279040 | consumed tokens: 33339473920 | elapsed time per iteration (s): 0.43 | learning rate: 9.663E-05 | global batch size: 256 | lm loss: 2.284007E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.361 | TFLOPs: 31.50 | 7: iteration 63600/ 115203 | consumed samples: 16281600 | consumed tokens: 33344716800 | elapsed time per iteration (s): 0.44 | learning rate: 9.661E-05 | global batch size: 256 | lm loss: 2.260743E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.565 | TFLOPs: 30.57 | 7: iteration 63610/ 115203 | consumed samples: 16284160 | consumed tokens: 33349959680 | elapsed time per iteration (s): 0.43 | learning rate: 9.658E-05 | global batch size: 256 | lm loss: 2.283686E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.826 | TFLOPs: 31.16 | 7: iteration 63620/ 115203 | consumed samples: 16286720 | consumed tokens: 33355202560 | elapsed time per iteration (s): 0.44 | learning rate: 9.656E-05 | global batch size: 256 | lm loss: 2.319773E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.132 | TFLOPs: 30.81 | 7: iteration 63630/ 115203 | consumed samples: 16289280 | consumed tokens: 33360445440 | elapsed time per iteration (s): 0.43 | learning rate: 
9.653E-05 | global batch size: 256 | lm loss: 2.292170E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.944 | TFLOPs: 31.58 | 7: iteration 63640/ 115203 | consumed samples: 16291840 | consumed tokens: 33365688320 | elapsed time per iteration (s): 0.43 | learning rate: 9.651E-05 | global batch size: 256 | lm loss: 2.264971E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.757 | TFLOPs: 31.26 | 7: iteration 63650/ 115203 | consumed samples: 16294400 | consumed tokens: 33370931200 | elapsed time per iteration (s): 0.43 | learning rate: 9.649E-05 | global batch size: 256 | lm loss: 2.269166E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.174 | TFLOPs: 31.49 | 7: iteration 63660/ 115203 | consumed samples: 16296960 | consumed tokens: 33376174080 | elapsed time per iteration (s): 0.43 | learning rate: 9.646E-05 | global batch size: 256 | lm loss: 2.261316E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.051 | TFLOPs: 30.91 | 7: iteration 63670/ 115203 | consumed samples: 16299520 | consumed tokens: 33381416960 | elapsed time per iteration (s): 0.43 | learning rate: 9.644E-05 | global batch size: 256 | lm loss: 2.296759E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.598 | TFLOPs: 30.88 | 7: iteration 63680/ 115203 | consumed samples: 16302080 | consumed tokens: 33386659840 | elapsed time per iteration (s): 0.42 | learning rate: 9.641E-05 | global batch size: 256 | lm loss: 2.230330E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.644 | TFLOPs: 31.67 | 7: iteration 63690/ 115203 | consumed 
samples: 16304640 | consumed tokens: 33391902720 | elapsed time per iteration (s): 0.43 | learning rate: 9.639E-05 | global batch size: 256 | lm loss: 2.267704E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.407 | TFLOPs: 31.55 | 7: iteration 63700/ 115203 | consumed samples: 16307200 | consumed tokens: 33397145600 | elapsed time per iteration (s): 0.43 | learning rate: 9.636E-05 | global batch size: 256 | lm loss: 2.260317E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.888 | TFLOPs: 31.00 | 7: iteration 63710/ 115203 | consumed samples: 16309760 | consumed tokens: 33402388480 | elapsed time per iteration (s): 0.43 | learning rate: 9.634E-05 | global batch size: 256 | lm loss: 2.290965E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.439 | TFLOPs: 31.50 | 7: iteration 63720/ 115203 | consumed samples: 16312320 | consumed tokens: 33407631360 | elapsed time per iteration (s): 0.44 | learning rate: 9.631E-05 | global batch size: 256 | lm loss: 2.266324E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.217 | TFLOPs: 30.76 | 7: iteration 63730/ 115203 | consumed samples: 16314880 | consumed tokens: 33412874240 | elapsed time per iteration (s): 0.43 | learning rate: 9.629E-05 | global batch size: 256 | lm loss: 2.267107E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.371 | TFLOPs: 31.29 | 7: iteration 63740/ 115203 | consumed samples: 16317440 | consumed tokens: 33418117120 | elapsed time per iteration (s): 0.43 | learning rate: 9.627E-05 | global batch size: 256 | lm loss: 2.269479E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 590.556 | TFLOPs: 30.99 | 7: iteration 63750/ 115203 | consumed samples: 16320000 | consumed tokens: 33423360000 | elapsed time per iteration (s): 0.43 | learning rate: 9.624E-05 | global batch size: 256 | lm loss: 2.305146E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.272 | TFLOPs: 31.18 | 7: iteration 63760/ 115203 | consumed samples: 16322560 | consumed tokens: 33428602880 | elapsed time per iteration (s): 0.43 | learning rate: 9.622E-05 | global batch size: 256 | lm loss: 2.262619E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.332 | TFLOPs: 31.29 | 7: iteration 63770/ 115203 | consumed samples: 16325120 | consumed tokens: 33433845760 | elapsed time per iteration (s): 0.44 | learning rate: 9.619E-05 | global batch size: 256 | lm loss: 2.259785E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.380 | TFLOPs: 30.77 | 7: iteration 63780/ 115203 | consumed samples: 16327680 | consumed tokens: 33439088640 | elapsed time per iteration (s): 0.43 | learning rate: 9.617E-05 | global batch size: 256 | lm loss: 2.285688E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.357 | TFLOPs: 31.50 | 7: iteration 63790/ 115203 | consumed samples: 16330240 | consumed tokens: 33444331520 | elapsed time per iteration (s): 0.44 | learning rate: 9.614E-05 | global batch size: 256 | lm loss: 2.288261E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.134 | TFLOPs: 30.60 | 7: iteration 63800/ 115203 | consumed samples: 16332800 | consumed tokens: 33449574400 | elapsed time per iteration (s): 0.44 | learning rate: 9.612E-05 | global batch size: 256 | lm 
loss: 2.294150E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.486 | TFLOPs: 30.30 | 7: iteration 63810/ 115203 | consumed samples: 16335360 | consumed tokens: 33454817280 | elapsed time per iteration (s): 0.43 | learning rate: 9.609E-05 | global batch size: 256 | lm loss: 2.276909E+00 | grad norm: 0.203 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.950 | TFLOPs: 31.22 | 7: iteration 63820/ 115203 | consumed samples: 16337920 | consumed tokens: 33460060160 | elapsed time per iteration (s): 0.43 | learning rate: 9.607E-05 | global batch size: 256 | lm loss: 2.251055E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.478 | TFLOPs: 30.98 | 7: iteration 63830/ 115203 | consumed samples: 16340480 | consumed tokens: 33465303040 | elapsed time per iteration (s): 0.43 | learning rate: 9.604E-05 | global batch size: 256 | lm loss: 2.278635E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.172 | TFLOPs: 31.18 | 7: iteration 63840/ 115203 | consumed samples: 16343040 | consumed tokens: 33470545920 | elapsed time per iteration (s): 0.44 | learning rate: 9.602E-05 | global batch size: 256 | lm loss: 2.296657E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.757 | TFLOPs: 30.31 | 7: iteration 63850/ 115203 | consumed samples: 16345600 | consumed tokens: 33475788800 | elapsed time per iteration (s): 0.43 | learning rate: 9.600E-05 | global batch size: 256 | lm loss: 2.282997E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.351 | TFLOPs: 31.39 | 7: iteration 63860/ 115203 | consumed samples: 16348160 | consumed tokens: 
33481031680 | elapsed time per iteration (s): 0.43 | learning rate: 9.597E-05 | global batch size: 256 | lm loss: 2.265841E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.288 | TFLOPs: 31.29 | 7: iteration 63870/ 115203 | consumed samples: 16350720 | consumed tokens: 33486274560 | elapsed time per iteration (s): 0.42 | learning rate: 9.595E-05 | global batch size: 256 | lm loss: 2.281666E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.496 | TFLOPs: 31.72 | 7: iteration 63880/ 115203 | consumed samples: 16353280 | consumed tokens: 33491517440 | elapsed time per iteration (s): 0.43 | learning rate: 9.592E-05 | global batch size: 256 | lm loss: 2.275558E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.913 | TFLOPs: 31.16 | 7: iteration 63890/ 115203 | consumed samples: 16355840 | consumed tokens: 33496760320 | elapsed time per iteration (s): 0.44 | learning rate: 9.590E-05 | global batch size: 256 | lm loss: 2.265894E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.069 | TFLOPs: 30.65 | 7: iteration 63900/ 115203 | consumed samples: 16358400 | consumed tokens: 33502003200 | elapsed time per iteration (s): 0.43 | learning rate: 9.587E-05 | global batch size: 256 | lm loss: 2.275321E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.070 | TFLOPs: 31.27 | 7: iteration 63910/ 115203 | consumed samples: 16360960 | consumed tokens: 33507246080 | elapsed time per iteration (s): 0.43 | learning rate: 9.585E-05 | global batch size: 256 | lm loss: 2.278763E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
588.509 | TFLOPs: 30.88 | 7: iteration 63920/ 115203 | consumed samples: 16363520 | consumed tokens: 33512488960 | elapsed time per iteration (s): 0.43 | learning rate: 9.582E-05 | global batch size: 256 | lm loss: 2.275362E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.288 | TFLOPs: 31.02 | 7: iteration 63930/ 115203 | consumed samples: 16366080 | consumed tokens: 33517731840 | elapsed time per iteration (s): 0.43 | learning rate: 9.580E-05 | global batch size: 256 | lm loss: 2.263836E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.270 | TFLOPs: 31.08 | 7: iteration 63940/ 115203 | consumed samples: 16368640 | consumed tokens: 33522974720 | elapsed time per iteration (s): 0.44 | learning rate: 9.578E-05 | global batch size: 256 | lm loss: 2.257315E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.110 | TFLOPs: 30.75 | 7: iteration 63950/ 115203 | consumed samples: 16371200 | consumed tokens: 33528217600 | elapsed time per iteration (s): 0.44 | learning rate: 9.575E-05 | global batch size: 256 | lm loss: 2.296303E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.205 | TFLOPs: 30.76 | 7: iteration 63960/ 115203 | consumed samples: 16373760 | consumed tokens: 33533460480 | elapsed time per iteration (s): 0.45 | learning rate: 9.573E-05 | global batch size: 256 | lm loss: 2.255385E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.500 | TFLOPs: 30.14 | 7: iteration 63970/ 115203 | consumed samples: 16376320 | consumed tokens: 33538703360 | elapsed time per iteration (s): 0.43 | learning rate: 9.570E-05 | global batch size: 256 | lm loss: 2.256540E+00 | grad norm: 0.223 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.029 | TFLOPs: 31.27 | 7: iteration 63980/ 115203 | consumed samples: 16378880 | consumed tokens: 33543946240 | elapsed time per iteration (s): 0.43 | learning rate: 9.568E-05 | global batch size: 256 | lm loss: 2.254042E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.507 | TFLOPs: 31.14 | 7: iteration 63990/ 115203 | consumed samples: 16381440 | consumed tokens: 33549189120 | elapsed time per iteration (s): 0.44 | learning rate: 9.565E-05 | global batch size: 256 | lm loss: 2.292487E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.143 | TFLOPs: 30.75 | 0: [2022-11-28 20:39:23,805] [INFO] [logging.py:68:log_dist] [Rank 0] step=64000, skipped=0, lr=[9.56284709392273e-05, 9.56284709392273e-05, 9.56284709392273e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 64000/ 115203 | consumed samples: 16384000 | consumed tokens: 33554432000 | elapsed time per iteration (s): 0.44 | learning rate: 9.563E-05 | global batch size: 256 | lm loss: 2.276332E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.399 | TFLOPs: 30.19 | 0: steps: 64000 loss: 2.3273 iter time (s): 0.440 samples/sec: 581.384 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 64000 | lm loss value: 2.328207E+00 | lm loss PPL: 1.025952E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 64000 to checkpoints_221m 0: [2022-11-28 20:39:23,974] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step64000 is begin to save! 
0: [2022-11-28 20:39:23,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_01-model_00-model_states.pt... 0: [2022-11-28 20:39:24,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_01-model_00-model_states.pt. 0: [2022-11-28 20:39:24,092] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_03-model_00-model_states.pt... 0: [2022-11-28 20:39:24,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_03-model_00-model_states.pt. 0: [2022-11-28 20:39:24,115] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_04-model_00-model_states.pt... 0: [2022-11-28 20:39:24,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_04-model_00-model_states.pt. 0: [2022-11-28 20:39:24,141] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_05-model_00-model_states.pt... 0: [2022-11-28 20:39:24,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_05-model_00-model_states.pt. 0: [2022-11-28 20:39:24,164] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_06-model_00-model_states.pt... 0: [2022-11-28 20:39:24,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_06-model_00-model_states.pt. 0: [2022-11-28 20:39:24,188] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_07-model_00-model_states.pt... 0: [2022-11-28 20:39:24,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 20:39:24,212] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_08-model_00-model_states.pt... 0: [2022-11-28 20:39:24,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_08-model_00-model_states.pt. 0: [2022-11-28 20:39:24,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_09-model_00-model_states.pt... 0: [2022-11-28 20:39:24,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_09-model_00-model_states.pt. 0: [2022-11-28 20:39:24,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_10-model_00-model_states.pt... 0: [2022-11-28 20:39:24,285] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_10-model_00-model_states.pt. 0: [2022-11-28 20:39:24,285] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_11-model_00-model_states.pt... 0: [2022-11-28 20:39:24,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_11-model_00-model_states.pt. 0: [2022-11-28 20:39:24,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_12-model_00-model_states.pt... 0: [2022-11-28 20:39:24,335] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_12-model_00-model_states.pt. 0: [2022-11-28 20:39:24,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_13-model_00-model_states.pt... 0: [2022-11-28 20:39:24,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 20:39:24,360] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_14-model_00-model_states.pt... 0: [2022-11-28 20:39:24,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_14-model_00-model_states.pt. 0: [2022-11-28 20:39:24,384] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_15-model_00-model_states.pt... 0: [2022-11-28 20:39:24,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_15-model_00-model_states.pt. 0: [2022-11-28 20:39:24,409] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_16-model_00-model_states.pt... 0: [2022-11-28 20:39:24,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_16-model_00-model_states.pt. 0: [2022-11-28 20:39:24,434] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_17-model_00-model_states.pt... 0: [2022-11-28 20:39:24,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_17-model_00-model_states.pt. 0: [2022-11-28 20:39:24,463] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_18-model_00-model_states.pt... 0: [2022-11-28 20:39:24,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_18-model_00-model_states.pt. 0: [2022-11-28 20:39:24,489] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_19-model_00-model_states.pt... 0: [2022-11-28 20:39:24,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 20:39:24,510] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_20-model_00-model_states.pt... 0: [2022-11-28 20:39:24,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_20-model_00-model_states.pt. 0: [2022-11-28 20:39:24,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/layer_22-model_00-model_states.pt... 0: [2022-11-28 20:39:24,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/layer_22-model_00-model_states.pt. 0: [2022-11-28 20:39:24,539] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step64000/mp_rank_00_model_states.pt 0: [2022-11-28 20:39:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/mp_rank_00_model_states.pt... 0: [2022-11-28 20:39:24,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/mp_rank_00_model_states.pt. 0: [2022-11-28 20:39:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:39:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:39:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:39:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
1: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:39:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:39:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:39:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:39:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:39:24,559] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
6: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:39:24,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step64000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:39:24,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:39:24,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 20:39:24,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2022-11-28 20:39:24,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:39:24,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 4: [2022-11-28 20:39:24,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:39:24,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:39:24,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 
4: [2022-11-28 20:39:24,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 20:39:24,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 20:39:24,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2022-11-28 20:39:24,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 7: [2022-11-28 20:39:24,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:39:24,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 20:39:24,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2022-11-28 20:39:24,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:39:24,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:39:24,615] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2022-11-28 20:39:24,615] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 1: [2022-11-28 20:39:24,615] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 
7: [2022-11-28 20:39:24,615] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 7: [2022-11-28 20:39:24,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:39:24,615] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 20:39:24,615] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 7: [2022-11-28 20:39:24,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:39:24,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:39:24,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 1: [2022-11-28 20:39:24,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:39:24,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2022-11-28 20:39:24,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2022-11-28 20:39:24,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 20:39:24,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 
1: [2022-11-28 20:39:24,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2022-11-28 20:39:24,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:39:24,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 20:39:24,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2022-11-28 20:39:24,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:39:24,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 20:39:24,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2022-11-28 20:39:24,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:39:24,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 20:39:24,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:39:24,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 
6: [2022-11-28 20:39:24,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 20:39:24,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2022-11-28 20:39:24,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:39:24,618] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 20:39:24,618] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 7: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:39:24,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:39:24,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 20:39:24,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:39:24,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 2: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:39:24,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 0: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 7: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:39:24,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 0: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
7: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2022-11-28 20:39:24,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 7: [2022-11-28 20:39:24,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:39:24,619] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2022-11-28 20:39:24,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:39:24,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 20:39:24,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:39:24,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2022-11-28 20:39:24,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 20:39:24,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 20:39:24,620] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 
6: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 6: [2022-11-28 20:39:24,620] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 1: [2022-11-28 20:39:24,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:39:24,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:39:24,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:39:24,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 2: [2022-11-28 20:39:24,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 20:39:24,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 1: [2022-11-28 20:39:24,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2022-11-28 20:39:24,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2022-11-28 20:39:24,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2022-11-28 20:39:24,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 20:39:24,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 20:39:24,622] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:39:24,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2022-11-28 20:39:24,622] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 20:39:24,622] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 2: [2022-11-28 20:39:24,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:39:24,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 20:39:24,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2022-11-28 20:39:24,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:39:24,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:39:24,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 20:39:24,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 20:39:24,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2022-11-28 20:39:24,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2022-11-28 20:39:24,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:39:24,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:39:24,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 20:39:24,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:39:24,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2022-11-28 20:39:24,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 20:39:24,613] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
3: [2022-11-28 20:39:24,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 20:39:24,613] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 20:39:24,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2022-11-28 20:39:24,613] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2022-11-28 20:39:24,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2022-11-28 20:39:24,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:39:24,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:39:24,615] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 20:39:24,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 20:39:24,615] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2022-11-28 20:39:24,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2022-11-28 20:39:24,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
3: [2022-11-28 20:39:24,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:39:24,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 20:39:24,623] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 0: [2022-11-28 20:39:24,629] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:39:24,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2022-11-28 20:39:24,623] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2022-11-28 20:39:24,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:39:24,621] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 20:39:24,621] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2022-11-28 20:39:24,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:39:24,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 20:39:24,625] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:39:24,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 5: [2022-11-28 20:39:24,625] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 20:39:24,625] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2022-11-28 20:39:24,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:39:24,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 20:39:24,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2022-11-28 20:39:24,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:39:24,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 20:39:24,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2022-11-28 20:39:24,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:39:24,630] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 20:39:24,630] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 
1: [2022-11-28 20:39:24,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:39:24,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 20:39:24,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2022-11-28 20:39:24,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:39:24,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 20:39:24,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2022-11-28 20:39:24,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:39:24,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 20:39:24,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 4: [2022-11-28 20:39:24,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:39:24,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 20:39:24,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 
4: [2022-11-28 20:39:24,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:39:24,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 20:39:24,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2022-11-28 20:39:24,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:39:24,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 20:39:24,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2022-11-28 20:39:24,653] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:39:24,653] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 20:39:24,653] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2022-11-28 20:39:24,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:39:24,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:39:24,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 20:39:24,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 20:39:24,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 20:39:24,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2022-11-28 20:39:24,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 20:39:24,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 3: [2022-11-28 20:39:24,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 0: [2022-11-28 20:39:24,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step64000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 20:39:24,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step64000 is ready now! 
0: successfully saved checkpoint at iteration 64000 to checkpoints_221m
7: time (ms) | save-checkpoint: 703.77
7: iteration 64010/ 115203 | consumed samples: 16386560 | consumed tokens: 33559674880 | elapsed time per iteration (s): 0.53 | learning rate: 9.560E-05 | global batch size: 256 | lm loss: 2.266651E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 481.877 | TFLOPs: 25.28 |
7: iteration 64020/ 115203 | consumed samples: 16389120 | consumed tokens: 33564917760 | elapsed time per iteration (s): 0.44 | learning rate: 9.558E-05 | global batch size: 256 | lm loss: 2.255112E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.576 | TFLOPs: 30.72 |
7: iteration 64030/ 115203 | consumed samples: 16391680 | consumed tokens: 33570160640 | elapsed time per iteration (s): 0.43 | learning rate: 9.556E-05 | global batch size: 256 | lm loss: 2.277566E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.711 | TFLOPs: 31.05 |
7: iteration 64040/ 115203 | consumed samples: 16394240 | consumed tokens: 33575403520 | elapsed time per iteration (s): 0.71 | learning rate: 9.553E-05 | global batch size: 256 | lm loss: 2.253445E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 359.436 | TFLOPs: 18.86 |
7: iteration 64050/ 115203 | consumed samples: 16396800 | consumed tokens: 33580646400 | elapsed time per iteration (s): 0.42 | learning rate: 9.551E-05 | global batch size: 256 | lm loss: 2.305841E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.522 | TFLOPs: 32.30 |
7: iteration 64060/ 115203 | consumed samples: 16399360 | consumed tokens: 33585889280 | elapsed time per iteration (s): 0.43 | learning rate: 9.548E-05 | global batch size: 256 | lm loss: 2.280306E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.819 | TFLOPs: 31.21 |
7: iteration 64070/ 115203 | consumed samples: 16401920 | consumed tokens: 33591132160 | elapsed time per iteration (s): 0.43 | learning rate: 9.546E-05 | global batch size: 256 | lm loss: 2.335108E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.548 | TFLOPs: 31.09 |
7: iteration 64080/ 115203 | consumed samples: 16404480 | consumed tokens: 33596375040 | elapsed time per iteration (s): 0.44 | learning rate: 9.543E-05 | global batch size: 256 | lm loss: 2.246712E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.015 | TFLOPs: 30.80 |
7: iteration 64090/ 115203 | consumed samples: 16407040 | consumed tokens: 33601617920 | elapsed time per iteration (s): 0.43 | learning rate: 9.541E-05 | global batch size: 256 | lm loss: 2.281384E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.920 | TFLOPs: 31.16 |
7: iteration 64100/ 115203 | consumed samples: 16409600 | consumed tokens: 33606860800 | elapsed time per iteration (s): 0.43 | learning rate: 9.538E-05 | global batch size: 256 | lm loss: 2.224304E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.109 | TFLOPs: 31.54 |
7: iteration 64110/ 115203 | consumed samples: 16412160 | consumed tokens: 33612103680 | elapsed time per iteration (s): 0.44 | learning rate: 9.536E-05 | global batch size: 256 | lm loss: 2.275516E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.966 | TFLOPs: 30.85 |
7: iteration 64120/ 115203 | consumed samples: 16414720 | consumed tokens: 33617346560 | elapsed time per iteration (s): 0.42 | learning rate: 9.533E-05 | global batch size: 256 | lm loss: 2.264571E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.683 | TFLOPs: 31.73 |
7: iteration 64130/ 115203 | consumed samples: 16417280 | consumed tokens: 33622589440 | elapsed time per iteration (s): 0.44 | learning rate: 9.531E-05 | global batch size: 256 | lm loss: 2.261823E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.543 | TFLOPs: 30.62 |
7: iteration 64140/ 115203 | consumed samples: 16419840 | consumed tokens: 33627832320 | elapsed time per iteration (s): 0.43 | learning rate: 9.529E-05 | global batch size: 256 | lm loss: 2.263392E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.524 | TFLOPs: 30.88 |
7: iteration 64150/ 115203 | consumed samples: 16422400 | consumed tokens: 33633075200 | elapsed time per iteration (s): 0.43 | learning rate: 9.526E-05 | global batch size: 256 | lm loss: 2.284034E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.119 | TFLOPs: 31.38 |
7: iteration 64160/ 115203 | consumed samples: 16424960 | consumed tokens: 33638318080 | elapsed time per iteration (s): 0.44 | learning rate: 9.524E-05 | global batch size: 256 | lm loss: 2.263523E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.588 | TFLOPs: 30.83 |
7: iteration 64170/ 115203 | consumed samples: 16427520 | consumed tokens: 33643560960 | elapsed time per iteration (s): 0.43 | learning rate: 9.521E-05 | global batch size: 256 | lm loss: 2.272593E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.675 | TFLOPs: 31.25 |
7: iteration 64180/ 115203 | consumed samples: 16430080 | consumed tokens: 33648803840 | elapsed time per iteration (s): 0.44 | learning rate: 9.519E-05 | global batch size: 256 | lm loss: 2.275439E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.210 | TFLOPs: 30.70 |
7: iteration 64190/ 115203 | consumed samples: 16432640 | consumed tokens: 33654046720 | elapsed time per iteration (s): 0.44 | learning rate: 9.516E-05 | global batch size: 256 | lm loss: 2.261178E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.769 | TFLOPs: 30.68 |
7: iteration 64200/ 115203 | consumed samples: 16435200 | consumed tokens: 33659289600 | elapsed time per iteration (s): 0.43 | learning rate: 9.514E-05 | global batch size: 256 | lm loss: 2.280408E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.892 | TFLOPs: 31.11 |
7: iteration 64210/ 115203 | consumed samples: 16437760 | consumed tokens: 33664532480 | elapsed time per iteration (s): 0.43 | learning rate: 9.511E-05 | global batch size: 256 | lm loss: 2.293902E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.629 | TFLOPs: 30.88 |
7: iteration 64220/ 115203 | consumed samples: 16440320 | consumed tokens: 33669775360 | elapsed time per iteration (s): 0.42 | learning rate: 9.509E-05 | global batch size: 256 | lm loss: 2.302870E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.639 | TFLOPs: 31.88 |
7: iteration 64230/ 115203 | consumed samples: 16442880 | consumed tokens: 33675018240 | elapsed time per iteration (s): 0.43 | learning rate: 9.507E-05 | global batch size: 256 | lm loss: 2.262478E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.398 | TFLOPs: 31.03 |
7: iteration 64240/ 115203 | consumed samples: 16445440 | consumed tokens: 33680261120 | elapsed time per iteration (s): 0.43 | learning rate: 9.504E-05 | global batch size: 256 | lm loss: 2.274764E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.532 | TFLOPs: 30.98 |
7: iteration 64250/ 115203 | consumed samples: 16448000 | consumed tokens: 33685504000 | elapsed time per iteration (s): 0.44 | learning rate: 9.502E-05 | global batch size: 256 | lm loss: 2.255315E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.313 | TFLOPs: 30.82 |
7: iteration 64260/ 115203 | consumed samples: 16450560 | consumed tokens: 33690746880 | elapsed time per iteration (s): 0.44 | learning rate: 9.499E-05 | global batch size: 256 | lm loss: 2.252456E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.054 | TFLOPs: 30.64 |
7: iteration 64270/ 115203 | consumed samples: 16453120 | consumed tokens: 33695989760 | elapsed time per iteration (s): 0.43 | learning rate: 9.497E-05 | global batch size: 256 | lm loss: 2.289758E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.285 | TFLOPs: 31.08 |
7: iteration 64280/ 115203 | consumed samples: 16455680 | consumed tokens: 33701232640 | elapsed time per iteration (s): 0.42 | learning rate: 9.494E-05 | global batch size: 256 | lm loss: 2.282148E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.494 | TFLOPs: 31.66 |
7: iteration 64290/ 115203 | consumed samples: 16458240 | consumed tokens: 33706475520 | elapsed time per iteration (s): 0.43 | learning rate: 9.492E-05 | global batch size: 256 | lm loss: 2.269187E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.996 | TFLOPs: 31.22 |
7: iteration 64300/ 115203 | consumed samples: 16460800 | consumed tokens: 33711718400 | elapsed time per iteration (s): 0.45 | learning rate: 9.489E-05 | global batch size: 256 | lm loss: 2.290599E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.047 | TFLOPs: 30.12 |
7: iteration 64310/ 115203 | consumed samples: 16463360 | consumed tokens: 33716961280 | elapsed time per iteration (s): 0.43 | learning rate: 9.487E-05 | global batch size: 256 | lm loss: 2.294627E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.268 | TFLOPs: 31.02 |
7: iteration 64320/ 115203 | consumed samples: 16465920 | consumed tokens: 33722204160 | elapsed time per iteration (s): 0.42 | learning rate: 9.485E-05 | global batch size: 256 | lm loss: 2.278015E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.307 | TFLOPs: 31.76 |
7: iteration 64330/ 115203 | consumed samples: 16468480 | consumed tokens: 33727447040 | elapsed time per iteration (s): 0.43 | learning rate: 9.482E-05 | global batch size: 256 | lm loss: 2.281430E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.149 | TFLOPs: 31.17 |
7: iteration 64340/ 115203 | consumed samples: 16471040 | consumed tokens: 33732689920 | elapsed time per iteration (s): 0.44 | learning rate: 9.480E-05 | global batch size: 256 | lm loss: 2.280982E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.408 | TFLOPs: 30.87 |
7: iteration 64350/ 115203 | consumed samples: 16473600 | consumed tokens: 33737932800 | elapsed time per iteration (s): 0.43 | learning rate: 9.477E-05 | global batch size: 256 | lm loss: 2.259019E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.238 | TFLOPs: 30.97 |
7: iteration 64360/ 115203 | consumed samples: 16476160 | consumed tokens: 33743175680 | elapsed time per iteration (s): 0.46 | learning rate: 9.475E-05 | global batch size: 256 | lm loss: 2.300636E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.723 | TFLOPs: 29.32 |
7: iteration 64370/ 115203 | consumed samples: 16478720 | consumed tokens: 33748418560 | elapsed time per iteration (s): 0.43 | learning rate: 9.472E-05 | global batch size: 256 | lm loss: 2.281402E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.835 | TFLOPs: 31.37 |
7: iteration 64380/ 115203 | consumed samples: 16481280 | consumed tokens: 33753661440 | elapsed time per iteration (s): 0.43 | learning rate: 9.470E-05 | global batch size: 256 | lm loss: 2.269790E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.889 | TFLOPs: 31.00 |
7: iteration 64390/ 115203 | consumed samples: 16483840 | consumed tokens: 33758904320 | elapsed time per iteration (s): 0.44 | learning rate: 9.467E-05 | global batch size: 256 | lm loss: 2.307533E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.711 | TFLOPs: 30.84 |
7: iteration 64400/ 115203 | consumed samples: 16486400 | consumed tokens: 33764147200 | elapsed time per iteration (s): 0.42 | learning rate: 9.465E-05 | global batch size: 256 | lm loss: 2.267874E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.563 | TFLOPs: 31.62 |
7: iteration 64410/ 115203 | consumed samples: 16488960 | consumed tokens: 33769390080 | elapsed time per iteration (s): 0.43 | learning rate: 9.463E-05 | global batch size: 256 | lm loss: 2.245790E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.828 | TFLOPs: 31.37 |
7: iteration 64420/ 115203 | consumed samples: 16491520 | consumed tokens: 33774632960 | elapsed time per iteration (s): 0.44 | learning rate: 9.460E-05 | global batch size: 256 | lm loss: 2.303656E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.621 | TFLOPs: 30.78 |
7: iteration 64430/ 115203 | consumed samples: 16494080 | consumed tokens: 33779875840 | elapsed time per iteration (s): 0.42 | learning rate: 9.458E-05 | global batch size: 256 | lm loss: 2.263428E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.839 | TFLOPs: 31.63 |
7: iteration 64440/ 115203 | consumed samples: 16496640 | consumed tokens: 33785118720 | elapsed time per iteration (s): 0.44 | learning rate: 9.455E-05 | global batch size: 256 | lm loss: 2.261452E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.315 | TFLOPs: 30.66 |
7: iteration 64450/ 115203 | consumed samples: 16499200 | consumed tokens: 33790361600 | elapsed time per iteration (s): 0.43 | learning rate: 9.453E-05 | global batch size: 256 | lm loss: 2.258209E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.999 | TFLOPs: 31.27 |
7: iteration 64460/ 115203 | consumed samples: 16501760 | consumed tokens: 33795604480 | elapsed time per iteration (s): 0.43 | learning rate: 9.450E-05 | global batch size: 256 | lm loss: 2.261595E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.297 | TFLOPs: 31.02 |
7: iteration 64470/ 115203 | consumed samples: 16504320 | consumed tokens: 33800847360 | elapsed time per iteration (s): 0.44 | learning rate: 9.448E-05 | global batch size: 256 | lm loss: 2.285941E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.708 | TFLOPs: 30.57 |
7: iteration 64480/ 115203 | consumed samples: 16506880 | consumed tokens: 33806090240 | elapsed time per iteration (s): 0.43 | learning rate: 9.446E-05 | global batch size: 256 | lm loss: 2.266146E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.789 | TFLOPs: 31.05 |
7: iteration 64490/ 115203 | consumed samples: 16509440 | consumed tokens: 33811333120 | elapsed time per iteration (s): 0.44 | learning rate: 9.443E-05 | global batch size: 256 | lm loss: 2.292933E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.656 | TFLOPs: 30.68 |
7: iteration 64500/ 115203 | consumed samples: 16512000 | consumed tokens: 33816576000 | elapsed time per iteration (s): 0.42 | learning rate: 9.441E-05 | global batch size: 256 | lm loss: 2.269143E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.278 | TFLOPs: 31.86 |
7: iteration 64510/ 115203 | consumed samples: 16514560 | consumed tokens: 33821818880 | elapsed time per iteration (s): 0.43 | learning rate: 9.438E-05 | global batch size: 256 | lm loss: 2.273805E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.548 | TFLOPs: 31.04 |
7: 
iteration 64520/ 115203 | consumed samples: 16517120 | consumed tokens: 33827061760 | elapsed time per iteration (s): 0.43 | learning rate: 9.436E-05 | global batch size: 256 | lm loss: 2.290386E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.030 | TFLOPs: 31.38 | 7: iteration 64530/ 115203 | consumed samples: 16519680 | consumed tokens: 33832304640 | elapsed time per iteration (s): 0.45 | learning rate: 9.433E-05 | global batch size: 256 | lm loss: 2.259282E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.784 | TFLOPs: 30.16 | 7: iteration 64540/ 115203 | consumed samples: 16522240 | consumed tokens: 33837547520 | elapsed time per iteration (s): 0.44 | learning rate: 9.431E-05 | global batch size: 256 | lm loss: 2.303975E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.535 | TFLOPs: 30.41 | 7: iteration 64550/ 115203 | consumed samples: 16524800 | consumed tokens: 33842790400 | elapsed time per iteration (s): 0.45 | learning rate: 9.428E-05 | global batch size: 256 | lm loss: 2.277252E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.466 | TFLOPs: 29.77 | 7: iteration 64560/ 115203 | consumed samples: 16527360 | consumed tokens: 33848033280 | elapsed time per iteration (s): 0.43 | learning rate: 9.426E-05 | global batch size: 256 | lm loss: 2.248226E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.969 | TFLOPs: 30.95 | 7: iteration 64570/ 115203 | consumed samples: 16529920 | consumed tokens: 33853276160 | elapsed time per iteration (s): 0.43 | learning rate: 9.424E-05 | global batch size: 256 | lm loss: 2.296285E+00 | grad norm: 0.237 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.358 | TFLOPs: 31.29 | 7: iteration 64580/ 115203 | consumed samples: 16532480 | consumed tokens: 33858519040 | elapsed time per iteration (s): 0.44 | learning rate: 9.421E-05 | global batch size: 256 | lm loss: 2.276752E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.743 | TFLOPs: 30.84 | 7: iteration 64590/ 115203 | consumed samples: 16535040 | consumed tokens: 33863761920 | elapsed time per iteration (s): 0.43 | learning rate: 9.419E-05 | global batch size: 256 | lm loss: 2.226749E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.833 | TFLOPs: 31.58 | 7: iteration 64600/ 115203 | consumed samples: 16537600 | consumed tokens: 33869004800 | elapsed time per iteration (s): 0.43 | learning rate: 9.416E-05 | global batch size: 256 | lm loss: 2.261616E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.879 | TFLOPs: 31.26 | 7: iteration 64610/ 115203 | consumed samples: 16540160 | consumed tokens: 33874247680 | elapsed time per iteration (s): 0.43 | learning rate: 9.414E-05 | global batch size: 256 | lm loss: 2.255558E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.437 | TFLOPs: 31.50 | 7: iteration 64620/ 115203 | consumed samples: 16542720 | consumed tokens: 33879490560 | elapsed time per iteration (s): 0.44 | learning rate: 9.411E-05 | global batch size: 256 | lm loss: 2.248460E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.806 | TFLOPs: 30.21 | 7: iteration 64630/ 115203 | consumed samples: 16545280 | consumed tokens: 33884733440 | elapsed time per iteration (s): 0.43 | learning rate: 
9.409E-05 | global batch size: 256 | lm loss: 2.267510E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.052 | TFLOPs: 31.27 | 7: iteration 64640/ 115203 | consumed samples: 16547840 | consumed tokens: 33889976320 | elapsed time per iteration (s): 0.43 | learning rate: 9.406E-05 | global batch size: 256 | lm loss: 2.270525E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.100 | TFLOPs: 31.59 | 7: iteration 64650/ 115203 | consumed samples: 16550400 | consumed tokens: 33895219200 | elapsed time per iteration (s): 0.43 | learning rate: 9.404E-05 | global batch size: 256 | lm loss: 2.289362E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.656 | TFLOPs: 31.15 | 7: iteration 64660/ 115203 | consumed samples: 16552960 | consumed tokens: 33900462080 | elapsed time per iteration (s): 0.43 | learning rate: 9.402E-05 | global batch size: 256 | lm loss: 2.286366E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.277 | TFLOPs: 31.29 | 7: iteration 64670/ 115203 | consumed samples: 16555520 | consumed tokens: 33905704960 | elapsed time per iteration (s): 0.44 | learning rate: 9.399E-05 | global batch size: 256 | lm loss: 2.236116E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.917 | TFLOPs: 30.53 | 7: iteration 64680/ 115203 | consumed samples: 16558080 | consumed tokens: 33910947840 | elapsed time per iteration (s): 0.43 | learning rate: 9.397E-05 | global batch size: 256 | lm loss: 2.264243E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.831 | TFLOPs: 31.21 | 7: iteration 64690/ 115203 | consumed 
samples: 16560640 | consumed tokens: 33916190720 | elapsed time per iteration (s): 0.44 | learning rate: 9.394E-05 | global batch size: 256 | lm loss: 2.280307E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.739 | TFLOPs: 30.84 | 7: iteration 64700/ 115203 | consumed samples: 16563200 | consumed tokens: 33921433600 | elapsed time per iteration (s): 0.43 | learning rate: 9.392E-05 | global batch size: 256 | lm loss: 2.300899E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.324 | TFLOPs: 30.92 | 7: iteration 64710/ 115203 | consumed samples: 16565760 | consumed tokens: 33926676480 | elapsed time per iteration (s): 0.43 | learning rate: 9.389E-05 | global batch size: 256 | lm loss: 2.288103E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.839 | TFLOPs: 31.16 | 7: iteration 64720/ 115203 | consumed samples: 16568320 | consumed tokens: 33931919360 | elapsed time per iteration (s): 0.43 | learning rate: 9.387E-05 | global batch size: 256 | lm loss: 2.235850E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.286 | TFLOPs: 31.13 | 7: iteration 64730/ 115203 | consumed samples: 16570880 | consumed tokens: 33937162240 | elapsed time per iteration (s): 0.43 | learning rate: 9.384E-05 | global batch size: 256 | lm loss: 2.263524E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.902 | TFLOPs: 31.27 | 7: iteration 64740/ 115203 | consumed samples: 16573440 | consumed tokens: 33942405120 | elapsed time per iteration (s): 0.42 | learning rate: 9.382E-05 | global batch size: 256 | lm loss: 2.257649E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 605.423 | TFLOPs: 31.77 | 7: iteration 64750/ 115203 | consumed samples: 16576000 | consumed tokens: 33947648000 | elapsed time per iteration (s): 0.44 | learning rate: 9.380E-05 | global batch size: 256 | lm loss: 2.250864E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.230 | TFLOPs: 30.39 | 7: iteration 64760/ 115203 | consumed samples: 16578560 | consumed tokens: 33952890880 | elapsed time per iteration (s): 0.43 | learning rate: 9.377E-05 | global batch size: 256 | lm loss: 2.259273E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.300 | TFLOPs: 31.39 | 7: iteration 64770/ 115203 | consumed samples: 16581120 | consumed tokens: 33958133760 | elapsed time per iteration (s): 0.43 | learning rate: 9.375E-05 | global batch size: 256 | lm loss: 2.273988E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.748 | TFLOPs: 31.31 | 7: iteration 64780/ 115203 | consumed samples: 16583680 | consumed tokens: 33963376640 | elapsed time per iteration (s): 0.43 | learning rate: 9.372E-05 | global batch size: 256 | lm loss: 2.270404E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.262 | TFLOPs: 31.55 | 7: iteration 64790/ 115203 | consumed samples: 16586240 | consumed tokens: 33968619520 | elapsed time per iteration (s): 0.44 | learning rate: 9.370E-05 | global batch size: 256 | lm loss: 2.263452E+00 | grad norm: 0.207 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.547 | TFLOPs: 30.83 | 7: iteration 64800/ 115203 | consumed samples: 16588800 | consumed tokens: 33973862400 | elapsed time per iteration (s): 0.44 | learning rate: 9.367E-05 | global batch size: 256 | lm 
loss: 2.252586E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.686 | TFLOPs: 30.83 | 7: iteration 64810/ 115203 | consumed samples: 16591360 | consumed tokens: 33979105280 | elapsed time per iteration (s): 0.43 | learning rate: 9.365E-05 | global batch size: 256 | lm loss: 2.274294E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.288 | TFLOPs: 31.39 | 7: iteration 64820/ 115203 | consumed samples: 16593920 | consumed tokens: 33984348160 | elapsed time per iteration (s): 0.43 | learning rate: 9.363E-05 | global batch size: 256 | lm loss: 2.278280E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.180 | TFLOPs: 31.07 | 7: iteration 64830/ 115203 | consumed samples: 16596480 | consumed tokens: 33989591040 | elapsed time per iteration (s): 0.43 | learning rate: 9.360E-05 | global batch size: 256 | lm loss: 2.264683E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.319 | TFLOPs: 31.08 | 7: iteration 64840/ 115203 | consumed samples: 16599040 | consumed tokens: 33994833920 | elapsed time per iteration (s): 0.43 | learning rate: 9.358E-05 | global batch size: 256 | lm loss: 2.232133E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.263 | TFLOPs: 31.34 | 7: iteration 64850/ 115203 | consumed samples: 16601600 | consumed tokens: 34000076800 | elapsed time per iteration (s): 0.43 | learning rate: 9.355E-05 | global batch size: 256 | lm loss: 2.270741E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.369 | TFLOPs: 31.29 | 7: iteration 64860/ 115203 | consumed samples: 16604160 | consumed tokens: 
34005319680 | elapsed time per iteration (s): 0.44 | learning rate: 9.353E-05 | global batch size: 256 | lm loss: 2.260921E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.264 | TFLOPs: 30.66 | 7: iteration 64870/ 115203 | consumed samples: 16606720 | consumed tokens: 34010562560 | elapsed time per iteration (s): 0.44 | learning rate: 9.350E-05 | global batch size: 256 | lm loss: 2.289415E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.951 | TFLOPs: 30.85 | 7: iteration 64880/ 115203 | consumed samples: 16609280 | consumed tokens: 34015805440 | elapsed time per iteration (s): 0.43 | learning rate: 9.348E-05 | global batch size: 256 | lm loss: 2.243622E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.146 | TFLOPs: 31.17 | 7: iteration 64890/ 115203 | consumed samples: 16611840 | consumed tokens: 34021048320 | elapsed time per iteration (s): 0.43 | learning rate: 9.345E-05 | global batch size: 256 | lm loss: 2.289103E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.897 | TFLOPs: 30.95 | 7: iteration 64900/ 115203 | consumed samples: 16614400 | consumed tokens: 34026291200 | elapsed time per iteration (s): 0.43 | learning rate: 9.343E-05 | global batch size: 256 | lm loss: 2.241659E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.262 | TFLOPs: 31.18 | 7: iteration 64910/ 115203 | consumed samples: 16616960 | consumed tokens: 34031534080 | elapsed time per iteration (s): 0.43 | learning rate: 9.341E-05 | global batch size: 256 | lm loss: 2.284365E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
597.686 | TFLOPs: 31.36 | 7: iteration 64920/ 115203 | consumed samples: 16619520 | consumed tokens: 34036776960 | elapsed time per iteration (s): 0.44 | learning rate: 9.338E-05 | global batch size: 256 | lm loss: 2.278722E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.780 | TFLOPs: 30.68 | 7: iteration 64930/ 115203 | consumed samples: 16622080 | consumed tokens: 34042019840 | elapsed time per iteration (s): 0.44 | learning rate: 9.336E-05 | global batch size: 256 | lm loss: 2.256697E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.386 | TFLOPs: 30.87 | 7: iteration 64940/ 115203 | consumed samples: 16624640 | consumed tokens: 34047262720 | elapsed time per iteration (s): 0.43 | learning rate: 9.333E-05 | global batch size: 256 | lm loss: 2.288334E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.247 | TFLOPs: 31.23 | 7: iteration 64950/ 115203 | consumed samples: 16627200 | consumed tokens: 34052505600 | elapsed time per iteration (s): 0.43 | learning rate: 9.331E-05 | global batch size: 256 | lm loss: 2.277707E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.971 | TFLOPs: 31.22 | 7: iteration 64960/ 115203 | consumed samples: 16629760 | consumed tokens: 34057748480 | elapsed time per iteration (s): 0.43 | learning rate: 9.328E-05 | global batch size: 256 | lm loss: 2.244350E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.895 | TFLOPs: 31.53 | 7: iteration 64970/ 115203 | consumed samples: 16632320 | consumed tokens: 34062991360 | elapsed time per iteration (s): 0.44 | learning rate: 9.326E-05 | global batch size: 256 | lm loss: 2.302968E+00 | grad norm: 0.224 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.374 | TFLOPs: 30.87 | 7: iteration 64980/ 115203 | consumed samples: 16634880 | consumed tokens: 34068234240 | elapsed time per iteration (s): 0.45 | learning rate: 9.324E-05 | global batch size: 256 | lm loss: 2.238747E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.948 | TFLOPs: 30.06 | 7: iteration 64990/ 115203 | consumed samples: 16637440 | consumed tokens: 34073477120 | elapsed time per iteration (s): 0.43 | learning rate: 9.321E-05 | global batch size: 256 | lm loss: 2.291064E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.909 | TFLOPs: 31.37 | 7: iteration 65000/ 115203 | consumed samples: 16640000 | consumed tokens: 34078720000 | elapsed time per iteration (s): 0.43 | learning rate: 9.319E-05 | global batch size: 256 | lm loss: 2.248592E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.454 | TFLOPs: 31.45 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 65000 | lm loss value: 2.147310E+00 | lm loss PPL: 8.561800E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 65000 to checkpoints_221m 0: [2022-11-28 20:46:40,365] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step65000 is begin to save! 0: [2022-11-28 20:46:40,372] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_01-model_00-model_states.pt... 0: [2022-11-28 20:46:40,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 20:46:40,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_03-model_00-model_states.pt... 0: [2022-11-28 20:46:40,505] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_03-model_00-model_states.pt. 0: [2022-11-28 20:46:40,505] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_04-model_00-model_states.pt... 0: [2022-11-28 20:46:40,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_04-model_00-model_states.pt. 0: [2022-11-28 20:46:40,529] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_05-model_00-model_states.pt... 0: [2022-11-28 20:46:40,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_05-model_00-model_states.pt. 0: [2022-11-28 20:46:40,555] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_06-model_00-model_states.pt... 0: [2022-11-28 20:46:40,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_06-model_00-model_states.pt. 0: [2022-11-28 20:46:40,579] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_07-model_00-model_states.pt... 0: [2022-11-28 20:46:40,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_07-model_00-model_states.pt. 0: [2022-11-28 20:46:40,604] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_08-model_00-model_states.pt... 0: [2022-11-28 20:46:40,627] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 20:46:40,628] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_09-model_00-model_states.pt... 0: [2022-11-28 20:46:40,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_09-model_00-model_states.pt. 0: [2022-11-28 20:46:40,655] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_10-model_00-model_states.pt... 0: [2022-11-28 20:46:40,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_10-model_00-model_states.pt. 0: [2022-11-28 20:46:40,679] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_11-model_00-model_states.pt... 0: [2022-11-28 20:46:40,704] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_11-model_00-model_states.pt. 0: [2022-11-28 20:46:40,704] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_12-model_00-model_states.pt... 0: [2022-11-28 20:46:40,728] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_12-model_00-model_states.pt. 0: [2022-11-28 20:46:40,729] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_13-model_00-model_states.pt... 0: [2022-11-28 20:46:40,753] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_13-model_00-model_states.pt. 0: [2022-11-28 20:46:40,753] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_14-model_00-model_states.pt... 0: [2022-11-28 20:46:40,777] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 20:46:40,777] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_15-model_00-model_states.pt... 0: [2022-11-28 20:46:40,802] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_15-model_00-model_states.pt. 0: [2022-11-28 20:46:40,802] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_16-model_00-model_states.pt... 0: [2022-11-28 20:46:40,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_16-model_00-model_states.pt. 0: [2022-11-28 20:46:40,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_17-model_00-model_states.pt... 0: [2022-11-28 20:46:40,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_17-model_00-model_states.pt. 0: [2022-11-28 20:46:40,849] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_18-model_00-model_states.pt... 0: [2022-11-28 20:46:40,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_18-model_00-model_states.pt. 0: [2022-11-28 20:46:40,874] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_19-model_00-model_states.pt... 0: [2022-11-28 20:46:40,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_19-model_00-model_states.pt. 0: [2022-11-28 20:46:40,898] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_20-model_00-model_states.pt... 0: [2022-11-28 20:46:40,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 20:46:40,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/layer_22-model_00-model_states.pt... 0: [2022-11-28 20:46:40,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/layer_22-model_00-model_states.pt. 0: [2022-11-28 20:46:40,929] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step65000/mp_rank_00_model_states.pt 0: [2022-11-28 20:46:40,929] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/mp_rank_00_model_states.pt... 0: [2022-11-28 20:46:40,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/mp_rank_00_model_states.pt. 0: [2022-11-28 20:46:40,948] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
0: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
1: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:46:40,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step65000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:46:40,996] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:46:40,996] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 20:46:40,996] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2022-11-28 20:46:40,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:46:40,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:46:40,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 20:46:40,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2022-11-28 20:46:40,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 20:46:40,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2022-11-28 20:46:41,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-28 20:46:41,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 20:46:41,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2022-11-28 20:46:41,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:46:41,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 20:46:41,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2022-11-28 20:46:41,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:46:41,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 20:46:41,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2022-11-28 20:46:41,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:46:41,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 20:46:41,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2022-11-28 20:46:41,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-28 20:46:41,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 20:46:41,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2022-11-28 20:46:41,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:46:41,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 20:46:41,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2022-11-28 20:46:41,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:46:41,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 20:46:41,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2022-11-28 20:46:41,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:46:41,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 20:46:41,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2022-11-28 20:46:41,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 20:46:41,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 20:46:41,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2022-11-28 20:46:41,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:46:41,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 20:46:41,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2022-11-28 20:46:41,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:46:41,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:46:41,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 20:46:41,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 20:46:41,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 7: [2022-11-28 20:46:41,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2022-11-28 20:46:41,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 20:46:41,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 20:46:41,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2022-11-28 20:46:41,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:46:41,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:46:41,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2022-11-28 20:46:41,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:46:41,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:46:41,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:46:41,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2022-11-28 20:46:41,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2022-11-28 20:46:41,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
2: [2022-11-28 20:46:41,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 20:46:41,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 20:46:41,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 20:46:41,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2022-11-28 20:46:41,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2022-11-28 20:46:41,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 4: [2022-11-28 20:46:41,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 20:46:41,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2022-11-28 20:46:41,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:46:41,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 2: [2022-11-28 20:46:41,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 20:46:41,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
2: [2022-11-28 20:46:41,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:46:41,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 20:46:41,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2022-11-28 20:46:41,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:46:41,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 20:46:41,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2022-11-28 20:46:41,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:46:41,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 20:46:41,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2022-11-28 20:46:41,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:46:41,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 20:46:41,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:46:41,003] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2022-11-28 20:46:41,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:46:41,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 20:46:41,003] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2022-11-28 20:46:41,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 20:46:41,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:46:41,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 20:46:41,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2022-11-28 20:46:41,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:46:41,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 1: [2022-11-28 20:46:41,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:46:41,012] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
3: [2022-11-28 20:46:41,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2022-11-28 20:46:41,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 20:46:41,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2022-11-28 20:46:41,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:46:41,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:46:41,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 20:46:41,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 20:46:41,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2022-11-28 20:46:41,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2022-11-28 20:46:41,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:46:41,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2022-11-28 20:46:41,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
3: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: [2022-11-28 20:46:41,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:46:41,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:46:41,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2022-11-28 20:46:41,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
0: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:46:41,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:46:41,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2022-11-28 20:46:41,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:46:41,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 0: [2022-11-28 20:46:41,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:46:41,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: [2022-11-28 20:46:41,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 20:46:41,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 20:46:41,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 1: [2022-11-28 20:46:41,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:46:41,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 20:46:41,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: [2022-11-28 20:46:41,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:46:41,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 20:46:41,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: [2022-11-28 20:46:41,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:46:41,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 20:46:41,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: [2022-11-28 20:46:41,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 20:46:41,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 20:46:41,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2022-11-28 20:46:41,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:46:41,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 20:46:41,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2022-11-28 20:46:41,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:46:41,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 20:46:41,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2022-11-28 20:46:41,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:46:41,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 20:46:41,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 5: [2022-11-28 20:46:41,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:46:41,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 20:46:41,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2022-11-28 20:46:41,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:46:41,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:46:41,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:46:41,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 20:46:41,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:46:41,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 20:46:41,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 20:46:41,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2022-11-28 20:46:41,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
3: [2022-11-28 20:46:41,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 20:46:41,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 3: [2022-11-28 20:46:41,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: [2022-11-28 20:46:41,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 20:46:41,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 20:46:41,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 20:46:41,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 20:46:41,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 20:46:41,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 20:46:41,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 20:46:41,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 20:46:41,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 
6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2022-11-28 20:46:41,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 6: [2022-11-28 20:46:41,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:46:41,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step65000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 20:46:41,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step65000 is ready now! 0: successfully saved checkpoint at iteration 65000 to checkpoints_221m 7: time (ms) | save-checkpoint: 773.55 7: iteration 65010/ 115203 | consumed samples: 16642560 | consumed tokens: 34083962880 | elapsed time per iteration (s): 0.52 | learning rate: 9.316E-05 | global batch size: 256 | lm loss: 2.253210E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 487.641 | TFLOPs: 25.59 | 7: iteration 65020/ 115203 | consumed samples: 16645120 | consumed tokens: 34089205760 | elapsed time per iteration (s): 0.46 | learning rate: 9.314E-05 | global batch size: 256 | lm loss: 2.276021E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.223 | TFLOPs: 29.34 | 7: iteration 65030/ 115203 | consumed samples: 16647680 | consumed tokens: 34094448640 | elapsed time per iteration (s): 0.43 | learning rate: 9.311E-05 | global batch size: 256 | lm loss: 2.248840E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.028 | TFLOPs: 31.38 | 7: iteration 65040/ 115203 | consumed samples: 16650240 | consumed tokens: 34099691520 | elapsed time per 
iteration (s): 0.44 | learning rate: 9.309E-05 | global batch size: 256 | lm loss: 2.283652E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.864 | TFLOPs: 30.84 | 7: iteration 65050/ 115203 | consumed samples: 16652800 | consumed tokens: 34104934400 | elapsed time per iteration (s): 0.43 | learning rate: 9.307E-05 | global batch size: 256 | lm loss: 2.289226E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.063 | TFLOPs: 31.17 | 7: iteration 65060/ 115203 | consumed samples: 16655360 | consumed tokens: 34110177280 | elapsed time per iteration (s): 0.42 | learning rate: 9.304E-05 | global batch size: 256 | lm loss: 2.279412E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.495 | TFLOPs: 31.93 | 7: iteration 65070/ 115203 | consumed samples: 16657920 | consumed tokens: 34115420160 | elapsed time per iteration (s): 0.44 | learning rate: 9.302E-05 | global batch size: 256 | lm loss: 2.278121E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.099 | TFLOPs: 30.75 | 7: iteration 65080/ 115203 | consumed samples: 16660480 | consumed tokens: 34120663040 | elapsed time per iteration (s): 0.42 | learning rate: 9.299E-05 | global batch size: 256 | lm loss: 2.291827E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.850 | TFLOPs: 31.68 | 7: iteration 65090/ 115203 | consumed samples: 16663040 | consumed tokens: 34125905920 | elapsed time per iteration (s): 0.43 | learning rate: 9.297E-05 | global batch size: 256 | lm loss: 2.282482E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.153 | TFLOPs: 31.12 | 7: 
iteration 65100/ 115203 | consumed samples: 16665600 | consumed tokens: 34131148800 | elapsed time per iteration (s): 0.44 | learning rate: 9.294E-05 | global batch size: 256 | lm loss: 2.269037E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.482 | TFLOPs: 30.82 | 7: iteration 65110/ 115203 | consumed samples: 16668160 | consumed tokens: 34136391680 | elapsed time per iteration (s): 0.43 | learning rate: 9.292E-05 | global batch size: 256 | lm loss: 2.252222E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.310 | TFLOPs: 31.34 | 7: iteration 65120/ 115203 | consumed samples: 16670720 | consumed tokens: 34141634560 | elapsed time per iteration (s): 0.43 | learning rate: 9.289E-05 | global batch size: 256 | lm loss: 2.217582E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.833 | TFLOPs: 31.52 | 7: iteration 65130/ 115203 | consumed samples: 16673280 | consumed tokens: 34146877440 | elapsed time per iteration (s): 0.43 | learning rate: 9.287E-05 | global batch size: 256 | lm loss: 2.285785E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.299 | TFLOPs: 31.23 | 7: iteration 65140/ 115203 | consumed samples: 16675840 | consumed tokens: 34152120320 | elapsed time per iteration (s): 0.43 | learning rate: 9.285E-05 | global batch size: 256 | lm loss: 2.305146E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.138 | TFLOPs: 31.07 | 7: iteration 65150/ 115203 | consumed samples: 16678400 | consumed tokens: 34157363200 | elapsed time per iteration (s): 0.43 | learning rate: 9.282E-05 | global batch size: 256 | lm loss: 2.286090E+00 | grad norm: 0.221 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.873 | TFLOPs: 30.90 | 7: iteration 65160/ 115203 | consumed samples: 16680960 | consumed tokens: 34162606080 | elapsed time per iteration (s): 0.43 | learning rate: 9.280E-05 | global batch size: 256 | lm loss: 2.277351E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.284 | TFLOPs: 31.08 | 7: iteration 65170/ 115203 | consumed samples: 16683520 | consumed tokens: 34167848960 | elapsed time per iteration (s): 0.44 | learning rate: 9.277E-05 | global batch size: 256 | lm loss: 2.290252E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.010 | TFLOPs: 30.43 | 7: iteration 65180/ 115203 | consumed samples: 16686080 | consumed tokens: 34173091840 | elapsed time per iteration (s): 0.45 | learning rate: 9.275E-05 | global batch size: 256 | lm loss: 2.271970E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.000 | TFLOPs: 29.80 | 7: iteration 65190/ 115203 | consumed samples: 16688640 | consumed tokens: 34178334720 | elapsed time per iteration (s): 0.43 | learning rate: 9.272E-05 | global batch size: 256 | lm loss: 2.289095E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.449 | TFLOPs: 31.56 | 7: iteration 65200/ 115203 | consumed samples: 16691200 | consumed tokens: 34183577600 | elapsed time per iteration (s): 0.43 | learning rate: 9.270E-05 | global batch size: 256 | lm loss: 2.295886E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.275 | TFLOPs: 31.02 | 7: iteration 65210/ 115203 | consumed samples: 16693760 | consumed tokens: 34188820480 | elapsed time per iteration (s): 0.44 | learning rate: 
9.268E-05 | global batch size: 256 | lm loss: 2.238800E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.705 | TFLOPs: 30.84 | 7: iteration 65220/ 115203 | consumed samples: 16696320 | consumed tokens: 34194063360 | elapsed time per iteration (s): 0.45 | learning rate: 9.265E-05 | global batch size: 256 | lm loss: 2.282121E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.283 | TFLOPs: 29.87 | 7: iteration 65230/ 115203 | consumed samples: 16698880 | consumed tokens: 34199306240 | elapsed time per iteration (s): 0.43 | learning rate: 9.263E-05 | global batch size: 256 | lm loss: 2.275925E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.829 | TFLOPs: 31.21 | 7: iteration 65240/ 115203 | consumed samples: 16701440 | consumed tokens: 34204549120 | elapsed time per iteration (s): 0.43 | learning rate: 9.260E-05 | global batch size: 256 | lm loss: 2.265966E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.863 | TFLOPs: 30.90 | 7: iteration 65250/ 115203 | consumed samples: 16704000 | consumed tokens: 34209792000 | elapsed time per iteration (s): 0.44 | learning rate: 9.258E-05 | global batch size: 256 | lm loss: 2.292532E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.562 | TFLOPs: 30.41 | 7: iteration 65260/ 115203 | consumed samples: 16706560 | consumed tokens: 34215034880 | elapsed time per iteration (s): 0.43 | learning rate: 9.255E-05 | global batch size: 256 | lm loss: 2.288497E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.179 | TFLOPs: 31.02 | 7: iteration 65270/ 115203 | consumed 
samples: 16709120 | consumed tokens: 34220277760 | elapsed time per iteration (s): 0.43 | learning rate: 9.253E-05 | global batch size: 256 | lm loss: 2.279651E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.225 | TFLOPs: 31.44 | 7: iteration 65280/ 115203 | consumed samples: 16711680 | consumed tokens: 34225520640 | elapsed time per iteration (s): 0.43 | learning rate: 9.251E-05 | global batch size: 256 | lm loss: 2.282127E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.187 | TFLOPs: 30.91 | 7: iteration 65290/ 115203 | consumed samples: 16714240 | consumed tokens: 34230763520 | elapsed time per iteration (s): 0.43 | learning rate: 9.248E-05 | global batch size: 256 | lm loss: 2.280363E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.494 | TFLOPs: 31.19 | 7: iteration 65300/ 115203 | consumed samples: 16716800 | consumed tokens: 34236006400 | elapsed time per iteration (s): 0.43 | learning rate: 9.246E-05 | global batch size: 256 | lm loss: 2.291826E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.080 | TFLOPs: 31.49 | 7: iteration 65310/ 115203 | consumed samples: 16719360 | consumed tokens: 34241249280 | elapsed time per iteration (s): 0.44 | learning rate: 9.243E-05 | global batch size: 256 | lm loss: 2.279266E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.994 | TFLOPs: 30.75 | 7: iteration 65320/ 115203 | consumed samples: 16721920 | consumed tokens: 34246492160 | elapsed time per iteration (s): 0.43 | learning rate: 9.241E-05 | global batch size: 256 | lm loss: 2.271704E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 593.623 | TFLOPs: 31.15 | 7: iteration 65330/ 115203 | consumed samples: 16724480 | consumed tokens: 34251735040 | elapsed time per iteration (s): 0.43 | learning rate: 9.238E-05 | global batch size: 256 | lm loss: 2.263650E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.279 | TFLOPs: 31.08 | 7: iteration 65340/ 115203 | consumed samples: 16727040 | consumed tokens: 34256977920 | elapsed time per iteration (s): 0.43 | learning rate: 9.236E-05 | global batch size: 256 | lm loss: 2.261965E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.130 | TFLOPs: 31.59 | 7: iteration 65350/ 115203 | consumed samples: 16729600 | consumed tokens: 34262220800 | elapsed time per iteration (s): 0.43 | learning rate: 9.234E-05 | global batch size: 256 | lm loss: 2.256832E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.829 | TFLOPs: 31.16 | 7: iteration 65360/ 115203 | consumed samples: 16732160 | consumed tokens: 34267463680 | elapsed time per iteration (s): 0.44 | learning rate: 9.231E-05 | global batch size: 256 | lm loss: 2.260224E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.031 | TFLOPs: 30.85 | 7: iteration 65370/ 115203 | consumed samples: 16734720 | consumed tokens: 34272706560 | elapsed time per iteration (s): 0.43 | learning rate: 9.229E-05 | global batch size: 256 | lm loss: 2.288341E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.077 | TFLOPs: 31.59 | 7: iteration 65380/ 115203 | consumed samples: 16737280 | consumed tokens: 34277949440 | elapsed time per iteration (s): 0.43 | learning rate: 9.226E-05 | global batch size: 256 | lm 
loss: 2.288843E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.055 | TFLOPs: 30.91 | 7: iteration 65390/ 115203 | consumed samples: 16739840 | consumed tokens: 34283192320 | elapsed time per iteration (s): 0.43 | learning rate: 9.224E-05 | global batch size: 256 | lm loss: 2.247845E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.548 | TFLOPs: 30.88 | 7: iteration 65400/ 115203 | consumed samples: 16742400 | consumed tokens: 34288435200 | elapsed time per iteration (s): 0.43 | learning rate: 9.221E-05 | global batch size: 256 | lm loss: 2.259902E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.942 | TFLOPs: 30.95 | 7: iteration 65410/ 115203 | consumed samples: 16744960 | consumed tokens: 34293678080 | elapsed time per iteration (s): 0.43 | learning rate: 9.219E-05 | global batch size: 256 | lm loss: 2.262893E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.361 | TFLOPs: 31.40 | 7: iteration 65420/ 115203 | consumed samples: 16747520 | consumed tokens: 34298920960 | elapsed time per iteration (s): 0.43 | learning rate: 9.217E-05 | global batch size: 256 | lm loss: 2.290112E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.442 | TFLOPs: 31.19 | 7: iteration 65430/ 115203 | consumed samples: 16750080 | consumed tokens: 34304163840 | elapsed time per iteration (s): 0.43 | learning rate: 9.214E-05 | global batch size: 256 | lm loss: 2.273866E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.562 | TFLOPs: 31.51 | 7: iteration 65440/ 115203 | consumed samples: 16752640 | consumed tokens: 
34309406720 | elapsed time per iteration (s): 0.43 | learning rate: 9.212E-05 | global batch size: 256 | lm loss: 2.291089E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.883 | TFLOPs: 31.32 | 7: iteration 65450/ 115203 | consumed samples: 16755200 | consumed tokens: 34314649600 | elapsed time per iteration (s): 0.43 | learning rate: 9.209E-05 | global batch size: 256 | lm loss: 2.250299E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.689 | TFLOPs: 31.31 | 7: iteration 65460/ 115203 | consumed samples: 16757760 | consumed tokens: 34319892480 | elapsed time per iteration (s): 0.43 | learning rate: 9.207E-05 | global batch size: 256 | lm loss: 2.279227E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.596 | TFLOPs: 31.35 | 7: iteration 65470/ 115203 | consumed samples: 16760320 | consumed tokens: 34325135360 | elapsed time per iteration (s): 0.43 | learning rate: 9.204E-05 | global batch size: 256 | lm loss: 2.246253E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.889 | TFLOPs: 31.48 | 7: iteration 65480/ 115203 | consumed samples: 16762880 | consumed tokens: 34330378240 | elapsed time per iteration (s): 0.43 | learning rate: 9.202E-05 | global batch size: 256 | lm loss: 2.297999E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.386 | TFLOPs: 31.55 | 7: iteration 65490/ 115203 | consumed samples: 16765440 | consumed tokens: 34335621120 | elapsed time per iteration (s): 0.43 | learning rate: 9.200E-05 | global batch size: 256 | lm loss: 2.275170E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
595.539 | TFLOPs: 31.25 | 7: iteration 65500/ 115203 | consumed samples: 16768000 | consumed tokens: 34340864000 | elapsed time per iteration (s): 0.44 | learning rate: 9.197E-05 | global batch size: 256 | lm loss: 2.276800E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.291 | TFLOPs: 30.76 | 7: iteration 65510/ 115203 | consumed samples: 16770560 | consumed tokens: 34346106880 | elapsed time per iteration (s): 0.43 | learning rate: 9.195E-05 | global batch size: 256 | lm loss: 2.259338E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.675 | TFLOPs: 30.94 | 7: iteration 65520/ 115203 | consumed samples: 16773120 | consumed tokens: 34351349760 | elapsed time per iteration (s): 0.43 | learning rate: 9.192E-05 | global batch size: 256 | lm loss: 2.259046E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.025 | TFLOPs: 31.43 | 7: iteration 65530/ 115203 | consumed samples: 16775680 | consumed tokens: 34356592640 | elapsed time per iteration (s): 0.43 | learning rate: 9.190E-05 | global batch size: 256 | lm loss: 2.265112E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.831 | TFLOPs: 31.10 | 7: iteration 65540/ 115203 | consumed samples: 16778240 | consumed tokens: 34361835520 | elapsed time per iteration (s): 0.43 | learning rate: 9.187E-05 | global batch size: 256 | lm loss: 2.253818E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.328 | TFLOPs: 31.18 | 7: iteration 65550/ 115203 | consumed samples: 16780800 | consumed tokens: 34367078400 | elapsed time per iteration (s): 0.43 | learning rate: 9.185E-05 | global batch size: 256 | lm loss: 2.267164E+00 | grad norm: 0.204 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.794 | TFLOPs: 31.47 | 7: iteration 65560/ 115203 | consumed samples: 16783360 | consumed tokens: 34372321280 | elapsed time per iteration (s): 0.44 | learning rate: 9.183E-05 | global batch size: 256 | lm loss: 2.254243E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.809 | TFLOPs: 30.63 | 7: iteration 65570/ 115203 | consumed samples: 16785920 | consumed tokens: 34377564160 | elapsed time per iteration (s): 0.43 | learning rate: 9.180E-05 | global batch size: 256 | lm loss: 2.300098E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.778 | TFLOPs: 31.42 | 7: iteration 65580/ 115203 | consumed samples: 16788480 | consumed tokens: 34382807040 | elapsed time per iteration (s): 0.44 | learning rate: 9.178E-05 | global batch size: 256 | lm loss: 2.259162E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.308 | TFLOPs: 30.34 | 7: iteration 65590/ 115203 | consumed samples: 16791040 | consumed tokens: 34388049920 | elapsed time per iteration (s): 0.43 | learning rate: 9.175E-05 | global batch size: 256 | lm loss: 2.277037E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.702 | TFLOPs: 31.26 | 7: iteration 65600/ 115203 | consumed samples: 16793600 | consumed tokens: 34393292800 | elapsed time per iteration (s): 0.43 | learning rate: 9.173E-05 | global batch size: 256 | lm loss: 2.269864E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.447 | TFLOPs: 31.19 | 7: iteration 65610/ 115203 | consumed samples: 16796160 | consumed tokens: 34398535680 | elapsed time per iteration (s): 
0.42 | learning rate: 9.170E-05 | global batch size: 256 | lm loss: 2.242975E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.668 | TFLOPs: 31.78 | 7: iteration 65620/ 115203 | consumed samples: 16798720 | consumed tokens: 34403778560 | elapsed time per iteration (s): 0.43 | learning rate: 9.168E-05 | global batch size: 256 | lm loss: 2.289990E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.424 | TFLOPs: 31.14 | 7: iteration 65630/ 115203 | consumed samples: 16801280 | consumed tokens: 34409021440 | elapsed time per iteration (s): 0.43 | learning rate: 9.166E-05 | global batch size: 256 | lm loss: 2.241191E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.878 | TFLOPs: 31.53 | 7: iteration 65640/ 115203 | consumed samples: 16803840 | consumed tokens: 34414264320 | elapsed time per iteration (s): 0.42 | learning rate: 9.163E-05 | global batch size: 256 | lm loss: 2.255167E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.712 | TFLOPs: 31.62 | 7: iteration 65650/ 115203 | consumed samples: 16806400 | consumed tokens: 34419507200 | elapsed time per iteration (s): 0.43 | learning rate: 9.161E-05 | global batch size: 256 | lm loss: 2.276089E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.137 | TFLOPs: 31.28 | 7: iteration 65660/ 115203 | consumed samples: 16808960 | consumed tokens: 34424750080 | elapsed time per iteration (s): 0.45 | learning rate: 9.158E-05 | global batch size: 256 | lm loss: 2.256817E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.209 | TFLOPs: 30.18 | 7: iteration 65670/ 
115203 | consumed samples: 16811520 | consumed tokens: 34429992960 | elapsed time per iteration (s): 0.42 | learning rate: 9.156E-05 | global batch size: 256 | lm loss: 2.259157E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.155 | TFLOPs: 31.86 | 7: iteration 65680/ 115203 | consumed samples: 16814080 | consumed tokens: 34435235840 | elapsed time per iteration (s): 0.44 | learning rate: 9.153E-05 | global batch size: 256 | lm loss: 2.274226E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.401 | TFLOPs: 30.82 | 7: iteration 65690/ 115203 | consumed samples: 16816640 | consumed tokens: 34440478720 | elapsed time per iteration (s): 0.43 | learning rate: 9.151E-05 | global batch size: 256 | lm loss: 2.269361E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.095 | TFLOPs: 31.07 | 7: iteration 65700/ 115203 | consumed samples: 16819200 | consumed tokens: 34445721600 | elapsed time per iteration (s): 0.43 | learning rate: 9.149E-05 | global batch size: 256 | lm loss: 2.264232E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.302 | TFLOPs: 31.08 | 7: iteration 65710/ 115203 | consumed samples: 16821760 | consumed tokens: 34450964480 | elapsed time per iteration (s): 0.43 | learning rate: 9.146E-05 | global batch size: 256 | lm loss: 2.240628E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.230 | TFLOPs: 31.07 | 7: iteration 65720/ 115203 | consumed samples: 16824320 | consumed tokens: 34456207360 | elapsed time per iteration (s): 0.43 | learning rate: 9.144E-05 | global batch size: 256 | lm loss: 2.256833E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 596.636 | TFLOPs: 31.30 | 7: iteration 65730/ 115203 | consumed samples: 16826880 | consumed tokens: 34461450240 | elapsed time per iteration (s): 0.43 | learning rate: 9.141E-05 | global batch size: 256 | lm loss: 2.253653E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.089 | TFLOPs: 31.43 | 7: iteration 65740/ 115203 | consumed samples: 16829440 | consumed tokens: 34466693120 | elapsed time per iteration (s): 0.44 | learning rate: 9.139E-05 | global batch size: 256 | lm loss: 2.300272E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.188 | TFLOPs: 30.86 | 7: iteration 65750/ 115203 | consumed samples: 16832000 | consumed tokens: 34471936000 | elapsed time per iteration (s): 0.45 | learning rate: 9.136E-05 | global batch size: 256 | lm loss: 2.293220E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.343 | TFLOPs: 29.98 | 7: iteration 65760/ 115203 | consumed samples: 16834560 | consumed tokens: 34477178880 | elapsed time per iteration (s): 0.44 | learning rate: 9.134E-05 | global batch size: 256 | lm loss: 2.283570E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.436 | TFLOPs: 30.51 | 7: iteration 65770/ 115203 | consumed samples: 16837120 | consumed tokens: 34482421760 | elapsed time per iteration (s): 0.43 | learning rate: 9.132E-05 | global batch size: 256 | lm loss: 2.259851E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.903 | TFLOPs: 31.11 | 7: iteration 65780/ 115203 | consumed samples: 16839680 | consumed tokens: 34487664640 | elapsed time per iteration (s): 0.44 | learning rate: 9.129E-05 | global batch 
size: 256 | lm loss: 2.257934E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.316 | TFLOPs: 30.19 | 7: iteration 65790/ 115203 | consumed samples: 16842240 | consumed tokens: 34492907520 | elapsed time per iteration (s): 0.43 | learning rate: 9.127E-05 | global batch size: 256 | lm loss: 2.279264E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.856 | TFLOPs: 31.53 | 7: iteration 65800/ 115203 | consumed samples: 16844800 | consumed tokens: 34498150400 | elapsed time per iteration (s): 0.43 | learning rate: 9.124E-05 | global batch size: 256 | lm loss: 2.245480E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.386 | TFLOPs: 31.55 | 7: iteration 65810/ 115203 | consumed samples: 16847360 | consumed tokens: 34503393280 | elapsed time per iteration (s): 0.43 | learning rate: 9.122E-05 | global batch size: 256 | lm loss: 2.272297E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.203 | TFLOPs: 31.12 | 7: iteration 65820/ 115203 | consumed samples: 16849920 | consumed tokens: 34508636160 | elapsed time per iteration (s): 0.43 | learning rate: 9.119E-05 | global batch size: 256 | lm loss: 2.284367E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.026 | TFLOPs: 31.17 | 7: iteration 65830/ 115203 | consumed samples: 16852480 | consumed tokens: 34513879040 | elapsed time per iteration (s): 0.45 | learning rate: 9.117E-05 | global batch size: 256 | lm loss: 2.255311E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.137 | TFLOPs: 29.60 | 7: iteration 65840/ 115203 | consumed samples: 16855040 | consumed 
tokens: 34519121920 | elapsed time per iteration (s): 0.44 | learning rate: 9.115E-05 | global batch size: 256 | lm loss: 2.264513E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.202 | TFLOPs: 30.86 | 7: iteration 65850/ 115203 | consumed samples: 16857600 | consumed tokens: 34524364800 | elapsed time per iteration (s): 0.42 | learning rate: 9.112E-05 | global batch size: 256 | lm loss: 2.254781E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.496 | TFLOPs: 31.61 | 7: iteration 65860/ 115203 | consumed samples: 16860160 | consumed tokens: 34529607680 | elapsed time per iteration (s): 0.43 | learning rate: 9.110E-05 | global batch size: 256 | lm loss: 2.269021E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.031 | TFLOPs: 31.48 | 7: iteration 65870/ 115203 | consumed samples: 16862720 | consumed tokens: 34534850560 | elapsed time per iteration (s): 0.43 | learning rate: 9.107E-05 | global batch size: 256 | lm loss: 2.279604E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.998 | TFLOPs: 31.01 | 7: iteration 65880/ 115203 | consumed samples: 16865280 | consumed tokens: 34540093440 | elapsed time per iteration (s): 0.45 | learning rate: 9.105E-05 | global batch size: 256 | lm loss: 2.288966E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.708 | TFLOPs: 30.05 | 7: iteration 65890/ 115203 | consumed samples: 16867840 | consumed tokens: 34545336320 | elapsed time per iteration (s): 0.43 | learning rate: 9.102E-05 | global batch size: 256 | lm loss: 2.260084E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 588.784 | TFLOPs: 30.89 | 7: iteration 65900/ 115203 | consumed samples: 16870400 | consumed tokens: 34550579200 | elapsed time per iteration (s): 0.42 | learning rate: 9.100E-05 | global batch size: 256 | lm loss: 2.271316E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.063 | TFLOPs: 32.06 | 7: iteration 65910/ 115203 | consumed samples: 16872960 | consumed tokens: 34555822080 | elapsed time per iteration (s): 0.44 | learning rate: 9.098E-05 | global batch size: 256 | lm loss: 2.243120E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.365 | TFLOPs: 30.61 | 7: iteration 65920/ 115203 | consumed samples: 16875520 | consumed tokens: 34561064960 | elapsed time per iteration (s): 0.43 | learning rate: 9.095E-05 | global batch size: 256 | lm loss: 2.296310E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.169 | TFLOPs: 31.54 | 7: iteration 65930/ 115203 | consumed samples: 16878080 | consumed tokens: 34566307840 | elapsed time per iteration (s): 0.44 | learning rate: 9.093E-05 | global batch size: 256 | lm loss: 2.270488E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.725 | TFLOPs: 30.78 | 7: iteration 65940/ 115203 | consumed samples: 16880640 | consumed tokens: 34571550720 | elapsed time per iteration (s): 0.43 | learning rate: 9.090E-05 | global batch size: 256 | lm loss: 2.296533E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.463 | TFLOPs: 31.09 | 7: iteration 65950/ 115203 | consumed samples: 16883200 | consumed tokens: 34576793600 | elapsed time per iteration (s): 0.42 | learning rate: 9.088E-05 | global batch size: 256 | lm loss: 2.283777E+00 | grad norm: 
0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.093 | TFLOPs: 31.75 | 7: iteration 65960/ 115203 | consumed samples: 16885760 | consumed tokens: 34582036480 | elapsed time per iteration (s): 0.43 | learning rate: 9.086E-05 | global batch size: 256 | lm loss: 2.275617E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.805 | TFLOPs: 31.42 | 7: iteration 65970/ 115203 | consumed samples: 16888320 | consumed tokens: 34587279360 | elapsed time per iteration (s): 0.43 | learning rate: 9.083E-05 | global batch size: 256 | lm loss: 2.276431E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.835 | TFLOPs: 31.21 | 7: iteration 65980/ 115203 | consumed samples: 16890880 | consumed tokens: 34592522240 | elapsed time per iteration (s): 0.42 | learning rate: 9.081E-05 | global batch size: 256 | lm loss: 2.265035E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.848 | TFLOPs: 31.68 | 7: iteration 65990/ 115203 | consumed samples: 16893440 | consumed tokens: 34597765120 | elapsed time per iteration (s): 0.42 | learning rate: 9.078E-05 | global batch size: 256 | lm loss: 2.270255E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.792 | TFLOPs: 31.63 | 0: [2022-11-28 20:53:53,382] [INFO] [logging.py:68:log_dist] [Rank 0] step=66000, skipped=0, lr=[9.075821569240965e-05, 9.075821569240965e-05, 9.075821569240965e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 66000/ 115203 | consumed samples: 16896000 | consumed tokens: 34603008000 | elapsed time per iteration (s): 0.44 | learning rate: 9.076E-05 | global batch size: 256 | lm loss: 2.272593E+00 | grad norm: 0.227 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.460 | TFLOPs: 30.40 | 0: steps: 66000 loss: 2.2998 iter time (s): 0.432 samples/sec: 592.397 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 66000 | lm loss value: 2.216428E+00 | lm loss PPL: 9.174497E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 66000 to checkpoints_221m 0: [2022-11-28 20:53:53,567] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step66000 is begin to save! 0: [2022-11-28 20:53:53,572] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_01-model_00-model_states.pt... 0: [2022-11-28 20:53:53,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_01-model_00-model_states.pt. 0: [2022-11-28 20:53:53,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_03-model_00-model_states.pt... 0: [2022-11-28 20:53:53,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_03-model_00-model_states.pt. 0: [2022-11-28 20:53:53,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_04-model_00-model_states.pt... 0: [2022-11-28 20:53:53,729] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_04-model_00-model_states.pt. 0: [2022-11-28 20:53:53,730] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_05-model_00-model_states.pt... 0: [2022-11-28 20:53:53,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_05-model_00-model_states.pt. 
0: [2022-11-28 20:53:53,755] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_06-model_00-model_states.pt... 0: [2022-11-28 20:53:53,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_06-model_00-model_states.pt. 0: [2022-11-28 20:53:53,779] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_07-model_00-model_states.pt... 0: [2022-11-28 20:53:53,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_07-model_00-model_states.pt. 0: [2022-11-28 20:53:53,804] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_08-model_00-model_states.pt... 0: [2022-11-28 20:53:53,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_08-model_00-model_states.pt. 0: [2022-11-28 20:53:53,828] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_09-model_00-model_states.pt... 0: [2022-11-28 20:53:53,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_09-model_00-model_states.pt. 0: [2022-11-28 20:53:53,853] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_10-model_00-model_states.pt... 0: [2022-11-28 20:53:53,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_10-model_00-model_states.pt. 0: [2022-11-28 20:53:53,878] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_11-model_00-model_states.pt... 0: [2022-11-28 20:53:53,903] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_11-model_00-model_states.pt. 
0: [2022-11-28 20:53:53,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_12-model_00-model_states.pt... 0: [2022-11-28 20:53:53,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_12-model_00-model_states.pt. 0: [2022-11-28 20:53:53,927] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_13-model_00-model_states.pt... 0: [2022-11-28 20:53:53,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_13-model_00-model_states.pt. 0: [2022-11-28 20:53:53,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_14-model_00-model_states.pt... 0: [2022-11-28 20:53:53,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_14-model_00-model_states.pt. 0: [2022-11-28 20:53:53,975] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_15-model_00-model_states.pt... 0: [2022-11-28 20:53:54,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_15-model_00-model_states.pt. 0: [2022-11-28 20:53:54,000] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_16-model_00-model_states.pt... 0: [2022-11-28 20:53:54,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_16-model_00-model_states.pt. 0: [2022-11-28 20:53:54,024] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_17-model_00-model_states.pt... 0: [2022-11-28 20:53:54,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_17-model_00-model_states.pt. 
0: [2022-11-28 20:53:54,048] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_18-model_00-model_states.pt... 0: [2022-11-28 20:53:54,072] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_18-model_00-model_states.pt. 0: [2022-11-28 20:53:54,072] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_19-model_00-model_states.pt... 0: [2022-11-28 20:53:54,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_19-model_00-model_states.pt. 0: [2022-11-28 20:53:54,095] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_20-model_00-model_states.pt... 0: [2022-11-28 20:53:54,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_20-model_00-model_states.pt. 0: [2022-11-28 20:53:54,119] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/layer_22-model_00-model_states.pt... 0: [2022-11-28 20:53:54,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/layer_22-model_00-model_states.pt. 0: [2022-11-28 20:53:54,124] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step66000/mp_rank_00_model_states.pt 0: [2022-11-28 20:53:54,124] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/mp_rank_00_model_states.pt... 0: [2022-11-28 20:53:54,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/mp_rank_00_model_states.pt. 0: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
0: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
2: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 1: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 3: [2022-11-28 20:53:54,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step66000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 20:53:54,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:53:54,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 20:53:54,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
7: [2022-11-28 20:53:54,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:53:54,192] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 20:53:54,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2022-11-28 20:53:54,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:53:54,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 20:53:54,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 7: [2022-11-28 20:53:54,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:53:54,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 20:53:54,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2022-11-28 20:53:54,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:53:54,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 20:53:54,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
7: [2022-11-28 20:53:54,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:53:54,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 20:53:54,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2022-11-28 20:53:54,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:53:54,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 20:53:54,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2022-11-28 20:53:54,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:53:54,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 20:53:54,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2022-11-28 20:53:54,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:53:54,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 20:53:54,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
0: [2022-11-28 20:53:54,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:53:54,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 20:53:54,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2022-11-28 20:53:54,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:53:54,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:53:54,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 2: [2022-11-28 20:53:54,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2022-11-28 20:53:54,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2022-11-28 20:53:54,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2022-11-28 20:53:54,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:53:54,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 20:53:54,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 20:53:54,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2022-11-28 20:53:54,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 20:53:54,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2022-11-28 20:53:54,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:53:54,199] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 20:53:54,199] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:53:54,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
2: [2022-11-28 20:53:54,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2022-11-28 20:53:54,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 2: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:53:54,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 7: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:53:54,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 20:53:54,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
7: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 7: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:53:54,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 20:53:54,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 20:53:54,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 7: [2022-11-28 20:53:54,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 0: [2022-11-28 20:53:54,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:53:54,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:53:54,202] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 20:53:54,202] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 20:53:54,202] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
0: [2022-11-28 20:53:54,202] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 2: [2022-11-28 20:53:54,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 20:53:54,203] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 20:53:54,203] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2022-11-28 20:53:54,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:53:54,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 20:53:54,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2022-11-28 20:53:54,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:53:54,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 20:53:54,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2022-11-28 20:53:54,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 20:53:54,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 20:53:54,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2022-11-28 20:53:54,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:53:54,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 20:53:54,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2022-11-28 20:53:54,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:53:54,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 20:53:54,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:53:54,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 20:53:54,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2022-11-28 20:53:54,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:53:54,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 20:53:54,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2022-11-28 20:53:54,203] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:53:54,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 20:53:54,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2022-11-28 20:53:54,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:53:54,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 20:53:54,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2022-11-28 20:53:54,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 20:53:54,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 20:53:54,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 5: [2022-11-28 20:53:54,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2022-11-28 20:53:54,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 20:53:54,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 0: [2022-11-28 20:53:54,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:53:54,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:53:54,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:53:54,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 20:53:54,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 20:53:54,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 20:53:54,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 20:53:54,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 0: [2022-11-28 20:53:54,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 0: [2022-11-28 20:53:54,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
1: [2022-11-28 20:53:54,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:53:54,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:53:54,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 20:53:54,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 20:53:54,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 20:53:54,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 20:53:54,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2022-11-28 20:53:54,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 1: [2022-11-28 20:53:54,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 7: [2022-11-28 20:53:54,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 20:53:54,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 20:53:54,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
6: [2022-11-28 20:53:54,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:53:54,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:53:54,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 20:53:54,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 20:53:54,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 20:53:54,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 4: [2022-11-28 20:53:54,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:53:54,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:53:54,230] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 20:53:54,230] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 20:53:54,230] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
4: [2022-11-28 20:53:54,230] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 4: [2022-11-28 20:53:54,230] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:53:54,230] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 20:53:54,230] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 4: [2022-11-28 20:53:54,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:53:54,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:53:54,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:53:54,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 20:53:54,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 20:53:54,231] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 20:53:54,231] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 4: [2022-11-28 20:53:54,231] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
4: [2022-11-28 20:53:54,231] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2022-11-28 20:53:54,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:53:54,212] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 20:53:54,212] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2022-11-28 20:53:54,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:53:54,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 20:53:54,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2022-11-28 20:53:54,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:53:54,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 20:53:54,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2022-11-28 20:53:54,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2022-11-28 20:53:54,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 20:53:54,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2022-11-28 20:53:54,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:53:54,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 20:53:54,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2022-11-28 20:53:54,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:53:54,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:53:54,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 20:53:54,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 20:53:54,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 20:53:54,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 20:53:54,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
3: [2022-11-28 20:53:54,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 3: [2022-11-28 20:53:54,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 4: [2022-11-28 20:53:54,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:53:54,238] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 20:53:54,238] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 4: [2022-11-28 20:53:54,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 20:53:54,240] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 20:53:54,240] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2022-11-28 20:53:54,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2022-11-28 20:53:54,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 6: [2022-11-28 20:53:54,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 0: [2022-11-28 20:53:54,262] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step66000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 20:53:54,262] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step66000 is ready now! 
0: successfully saved checkpoint at iteration 66000 to checkpoints_221m
7: time (ms) | save-checkpoint: 722.56
7: iteration 66010/ 115203 | consumed samples: 16898560 | consumed tokens: 34608250880 | elapsed time per iteration (s): 0.53 | learning rate: 9.073E-05 | global batch size: 256 | lm loss: 2.283302E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 486.529 | TFLOPs: 25.53 |
7: iteration 66020/ 115203 | consumed samples: 16901120 | consumed tokens: 34613493760 | elapsed time per iteration (s): 0.43 | learning rate: 9.071E-05 | global batch size: 256 | lm loss: 2.251657E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.626 | TFLOPs: 31.20 |
7: iteration 66030/ 115203 | consumed samples: 16903680 | consumed tokens: 34618736640 | elapsed time per iteration (s): 0.43 | learning rate: 9.069E-05 | global batch size: 256 | lm loss: 2.275138E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.002 | TFLOPs: 30.90 |
7: iteration 66040/ 115203 | consumed samples: 16906240 | consumed tokens: 34623979520 | elapsed time per iteration (s): 0.45 | learning rate: 9.066E-05 | global batch size: 256 | lm loss: 2.295124E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.104 | TFLOPs: 29.55 |
7: iteration 66050/ 115203 | consumed samples: 16908800 | consumed tokens: 34629222400 | elapsed time per iteration (s): 0.43 | learning rate: 9.064E-05 | global batch size: 256 | lm loss: 2.302996E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.673 | TFLOPs: 31.20 |
7: iteration 66060/ 115203 | consumed samples: 16911360 | consumed tokens: 34634465280 | elapsed time per iteration (s): 0.44 | learning rate: 9.061E-05 | global batch size: 256 | lm loss: 2.261996E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.507 | TFLOPs: 30.51 |
7: iteration 66070/ 115203 | consumed samples: 16913920 | consumed tokens: 34639708160 | elapsed time per iteration (s): 0.45 | learning rate: 9.059E-05 | global batch size: 256 | lm loss: 2.248835E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.530 | TFLOPs: 29.78 |
7: iteration 66080/ 115203 | consumed samples: 16916480 | consumed tokens: 34644951040 | elapsed time per iteration (s): 0.44 | learning rate: 9.056E-05 | global batch size: 256 | lm loss: 2.302764E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.325 | TFLOPs: 30.50 |
7: iteration 66090/ 115203 | consumed samples: 16919040 | consumed tokens: 34650193920 | elapsed time per iteration (s): 0.43 | learning rate: 9.054E-05 | global batch size: 256 | lm loss: 2.266465E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.827 | TFLOPs: 30.95 |
7: iteration 66100/ 115203 | consumed samples: 16921600 | consumed tokens: 34655436800 | elapsed time per iteration (s): 0.43 | learning rate: 9.052E-05 | global batch size: 256 | lm loss: 2.280910E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.324 | TFLOPs: 31.29 |
7: iteration 66110/ 115203 | consumed samples: 16924160 | consumed tokens: 34660679680 | elapsed time per iteration (s): 0.44 | learning rate: 9.049E-05 | global batch size: 256 | lm loss: 2.281326E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.629 | TFLOPs: 30.78 |
7: iteration 66120/ 115203 | consumed samples: 16926720 | consumed tokens: 34665922560 | elapsed time per iteration (s): 0.44 | learning rate: 9.047E-05 | global batch size: 256 | lm loss: 2.249837E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.057 | TFLOPs: 30.43 |
7: iteration 66130/ 115203 | consumed samples: 16929280 | consumed tokens: 34671165440 | elapsed time per iteration (s): 0.43 | learning rate: 9.044E-05 | global batch size: 256 | lm loss: 2.262649E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.359 | TFLOPs: 31.34 |
7: iteration 66140/ 115203 | consumed samples: 16931840 | consumed tokens: 34676408320 | elapsed time per iteration (s): 0.43 | learning rate: 9.042E-05 | global batch size: 256 | lm loss: 2.266966E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.720 | TFLOPs: 31.15 |
7: iteration 66150/ 115203 | consumed samples: 16934400 | consumed tokens: 34681651200 | elapsed time per iteration (s): 0.44 | learning rate: 9.040E-05 | global batch size: 256 | lm loss: 2.254728E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.899 | TFLOPs: 30.79 |
7: iteration 66160/ 115203 | consumed samples: 16936960 | consumed tokens: 34686894080 | elapsed time per iteration (s): 0.44 | learning rate: 9.037E-05 | global batch size: 256 | lm loss: 2.281874E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.341 | TFLOPs: 30.45 |
7: iteration 66170/ 115203 | consumed samples: 16939520 | consumed tokens: 34692136960 | elapsed time per iteration (s): 0.43 | learning rate: 9.035E-05 | global batch size: 256 | lm loss: 2.246589E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.104 | TFLOPs: 30.96 |
7: iteration 66180/ 115203 | consumed samples: 16942080 | consumed tokens: 34697379840 | elapsed time per iteration (s): 0.43 | learning rate: 9.032E-05 | global batch size: 256 | lm loss: 2.257034E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.911 | TFLOPs: 30.95 |
7: iteration 66190/ 115203 | consumed samples: 16944640 | consumed tokens: 34702622720 | elapsed time per iteration (s): 0.44 | learning rate: 9.030E-05 | global batch size: 256 | lm loss: 2.249414E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.464 | TFLOPs: 30.61 |
7: iteration 66200/ 115203 | consumed samples: 16947200 | consumed tokens: 34707865600 | elapsed time per iteration (s): 0.43 | learning rate: 9.027E-05 | global batch size: 256 | lm loss: 2.284731E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.486 | TFLOPs: 31.45 |
7: iteration 66210/ 115203 | consumed samples: 16949760 | consumed tokens: 34713108480 | elapsed time per iteration (s): 0.43 | learning rate: 9.025E-05 | global batch size: 256 | lm loss: 2.259895E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.371 | TFLOPs: 31.40 |
7: iteration 66220/ 115203 | consumed samples: 16952320 | consumed tokens: 34718351360 | elapsed time per iteration (s): 0.43 | learning rate: 9.023E-05 | global batch size: 256 | lm loss: 2.286509E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.810 | TFLOPs: 31.21 |
7: iteration 66230/ 115203 | consumed samples: 16954880 | consumed tokens: 34723594240 | elapsed time per iteration (s): 0.43 | learning rate: 9.020E-05 | global batch size: 256 | lm loss: 2.276142E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.051 | TFLOPs: 31.54 |
7: iteration 66240/ 115203 | consumed samples: 16957440 | consumed tokens: 34728837120 | elapsed time per iteration (s): 0.44 | learning rate: 9.018E-05 | global batch size: 256 | lm loss: 2.306095E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.510 | TFLOPs: 30.51 |
7: iteration 66250/ 115203 | consumed samples: 16960000 | consumed tokens: 34734080000 | elapsed time per iteration (s): 0.42 | learning rate: 9.015E-05 | global batch size: 256 | lm loss: 2.258930E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.035 | TFLOPs: 31.75 |
7: iteration 66260/ 115203 | consumed samples: 16962560 | consumed tokens: 34739322880 | elapsed time per iteration (s): 0.44 | learning rate: 9.013E-05 | global batch size: 256 | lm loss: 2.277796E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.267 | TFLOPs: 30.71 |
7: iteration 66270/ 115203 | consumed samples: 16965120 | consumed tokens: 34744565760 | elapsed time per iteration (s): 0.43 | learning rate: 9.010E-05 | global batch size: 256 | lm loss: 2.260437E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.012 | TFLOPs: 31.06 |
7: iteration 66280/ 115203 | consumed samples: 16967680 | consumed tokens: 34749808640 | elapsed time per iteration (s): 0.43 | learning rate: 9.008E-05 | global batch size: 256 | lm loss: 2.240703E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.662 | TFLOPs: 31.41 |
7: iteration 66290/ 115203 | consumed samples: 16970240 | consumed tokens: 34755051520 | elapsed time per iteration (s): 0.43 | learning rate: 9.006E-05 | global batch size: 256 | lm loss: 2.279080E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.062 | TFLOPs: 31.48 |
7: iteration 66300/ 115203 | consumed samples: 16972800 | consumed tokens: 34760294400 | elapsed time per iteration (s): 0.43 | learning rate: 9.003E-05 | global batch size: 256 | lm loss: 2.263176E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.142 | TFLOPs: 31.33 |
7: iteration 66310/ 115203 | consumed samples: 16975360 | consumed tokens: 34765537280 | elapsed time per iteration (s): 0.43 | learning rate: 9.001E-05 | global batch size: 256 | lm loss: 2.246516E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.727 | TFLOPs: 31.10 |
7: iteration 66320/ 115203 | consumed samples: 16977920 | consumed tokens: 34770780160 | elapsed time per iteration (s): 0.44 | learning rate: 8.998E-05 | global batch size: 256 | lm loss: 2.262461E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.988 | TFLOPs: 30.64 |
7: iteration 66330/ 115203 | consumed samples: 16980480 | consumed tokens: 34776023040 | elapsed time per iteration (s): 0.44 | learning rate: 8.996E-05 | global batch size: 256 | lm loss: 2.291217E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.344 | TFLOPs: 30.76 |
7: iteration 66340/ 115203 | consumed samples: 16983040 | consumed tokens: 34781265920 | elapsed time per iteration (s): 0.42 | learning rate: 8.994E-05 | global batch size: 256 | lm loss: 2.253901E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per
second: 606.272 | TFLOPs: 31.81 | 7: iteration 66350/ 115203 | consumed samples: 16985600 | consumed tokens: 34786508800 | elapsed time per iteration (s): 0.42 | learning rate: 8.991E-05 | global batch size: 256 | lm loss: 2.289648E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.587 | TFLOPs: 31.72 | 7: iteration 66360/ 115203 | consumed samples: 16988160 | consumed tokens: 34791751680 | elapsed time per iteration (s): 0.43 | learning rate: 8.989E-05 | global batch size: 256 | lm loss: 2.281809E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.030 | TFLOPs: 31.27 | 7: iteration 66370/ 115203 | consumed samples: 16990720 | consumed tokens: 34796994560 | elapsed time per iteration (s): 0.43 | learning rate: 8.986E-05 | global batch size: 256 | lm loss: 2.272130E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.945 | TFLOPs: 31.01 | 7: iteration 66380/ 115203 | consumed samples: 16993280 | consumed tokens: 34802237440 | elapsed time per iteration (s): 0.44 | learning rate: 8.984E-05 | global batch size: 256 | lm loss: 2.304738E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.188 | TFLOPs: 30.86 | 7: iteration 66390/ 115203 | consumed samples: 16995840 | consumed tokens: 34807480320 | elapsed time per iteration (s): 0.42 | learning rate: 8.981E-05 | global batch size: 256 | lm loss: 2.276131E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.801 | TFLOPs: 31.84 | 7: iteration 66400/ 115203 | consumed samples: 16998400 | consumed tokens: 34812723200 | elapsed time per iteration (s): 0.44 | learning rate: 8.979E-05 | global batch size: 256 | lm loss: 2.258207E+00 | grad norm: 
0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.693 | TFLOPs: 30.47 | 7: iteration 66410/ 115203 | consumed samples: 17000960 | consumed tokens: 34817966080 | elapsed time per iteration (s): 0.44 | learning rate: 8.977E-05 | global batch size: 256 | lm loss: 2.261231E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.761 | TFLOPs: 30.63 | 7: iteration 66420/ 115203 | consumed samples: 17003520 | consumed tokens: 34823208960 | elapsed time per iteration (s): 0.44 | learning rate: 8.974E-05 | global batch size: 256 | lm loss: 2.249803E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.164 | TFLOPs: 30.39 | 7: iteration 66430/ 115203 | consumed samples: 17006080 | consumed tokens: 34828451840 | elapsed time per iteration (s): 0.43 | learning rate: 8.972E-05 | global batch size: 256 | lm loss: 2.294263E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.831 | TFLOPs: 31.37 | 7: iteration 66440/ 115203 | consumed samples: 17008640 | consumed tokens: 34833694720 | elapsed time per iteration (s): 0.45 | learning rate: 8.969E-05 | global batch size: 256 | lm loss: 2.294588E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.895 | TFLOPs: 29.90 | 7: iteration 66450/ 115203 | consumed samples: 17011200 | consumed tokens: 34838937600 | elapsed time per iteration (s): 0.46 | learning rate: 8.967E-05 | global batch size: 256 | lm loss: 2.252557E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.707 | TFLOPs: 29.31 | 7: iteration 66460/ 115203 | consumed samples: 17013760 | consumed tokens: 34844180480 | elapsed time per 
iteration (s): 0.44 | learning rate: 8.965E-05 | global batch size: 256 | lm loss: 2.281764E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.143 | TFLOPs: 30.28 | 7: iteration 66470/ 115203 | consumed samples: 17016320 | consumed tokens: 34849423360 | elapsed time per iteration (s): 0.43 | learning rate: 8.962E-05 | global batch size: 256 | lm loss: 2.256422E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.351 | TFLOPs: 31.18 | 7: iteration 66480/ 115203 | consumed samples: 17018880 | consumed tokens: 34854666240 | elapsed time per iteration (s): 0.43 | learning rate: 8.960E-05 | global batch size: 256 | lm loss: 2.264682E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.815 | TFLOPs: 31.31 | 7: iteration 66490/ 115203 | consumed samples: 17021440 | consumed tokens: 34859909120 | elapsed time per iteration (s): 0.44 | learning rate: 8.957E-05 | global batch size: 256 | lm loss: 2.251878E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.788 | TFLOPs: 30.84 | 7: iteration 66500/ 115203 | consumed samples: 17024000 | consumed tokens: 34865152000 | elapsed time per iteration (s): 0.43 | learning rate: 8.955E-05 | global batch size: 256 | lm loss: 2.303631E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.178 | TFLOPs: 30.97 | 7: iteration 66510/ 115203 | consumed samples: 17026560 | consumed tokens: 34870394880 | elapsed time per iteration (s): 0.44 | learning rate: 8.953E-05 | global batch size: 256 | lm loss: 2.277144E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.486 | TFLOPs: 30.82 | 7: 
iteration 66520/ 115203 | consumed samples: 17029120 | consumed tokens: 34875637760 | elapsed time per iteration (s): 0.43 | learning rate: 8.950E-05 | global batch size: 256 | lm loss: 2.295266E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.596 | TFLOPs: 31.56 | 7: iteration 66530/ 115203 | consumed samples: 17031680 | consumed tokens: 34880880640 | elapsed time per iteration (s): 0.42 | learning rate: 8.948E-05 | global batch size: 256 | lm loss: 2.261934E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.351 | TFLOPs: 31.71 | 7: iteration 66540/ 115203 | consumed samples: 17034240 | consumed tokens: 34886123520 | elapsed time per iteration (s): 0.43 | learning rate: 8.945E-05 | global batch size: 256 | lm loss: 2.288783E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.510 | TFLOPs: 31.19 | 7: iteration 66550/ 115203 | consumed samples: 17036800 | consumed tokens: 34891366400 | elapsed time per iteration (s): 0.43 | learning rate: 8.943E-05 | global batch size: 256 | lm loss: 2.283063E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.699 | TFLOPs: 31.15 | 7: iteration 66560/ 115203 | consumed samples: 17039360 | consumed tokens: 34896609280 | elapsed time per iteration (s): 0.42 | learning rate: 8.940E-05 | global batch size: 256 | lm loss: 2.304581E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.844 | TFLOPs: 31.63 | 7: iteration 66570/ 115203 | consumed samples: 17041920 | consumed tokens: 34901852160 | elapsed time per iteration (s): 0.44 | learning rate: 8.938E-05 | global batch size: 256 | lm loss: 2.249748E+00 | grad norm: 0.223 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.506 | TFLOPs: 30.62 | 7: iteration 66580/ 115203 | consumed samples: 17044480 | consumed tokens: 34907095040 | elapsed time per iteration (s): 0.43 | learning rate: 8.936E-05 | global batch size: 256 | lm loss: 2.259434E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.326 | TFLOPs: 31.45 | 7: iteration 66590/ 115203 | consumed samples: 17047040 | consumed tokens: 34912337920 | elapsed time per iteration (s): 0.43 | learning rate: 8.933E-05 | global batch size: 256 | lm loss: 2.276128E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.987 | TFLOPs: 31.38 | 7: iteration 66600/ 115203 | consumed samples: 17049600 | consumed tokens: 34917580800 | elapsed time per iteration (s): 0.43 | learning rate: 8.931E-05 | global batch size: 256 | lm loss: 2.292725E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.201 | TFLOPs: 31.23 | 7: iteration 66610/ 115203 | consumed samples: 17052160 | consumed tokens: 34922823680 | elapsed time per iteration (s): 0.45 | learning rate: 8.928E-05 | global batch size: 256 | lm loss: 2.258157E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.716 | TFLOPs: 29.89 | 7: iteration 66620/ 115203 | consumed samples: 17054720 | consumed tokens: 34928066560 | elapsed time per iteration (s): 0.43 | learning rate: 8.926E-05 | global batch size: 256 | lm loss: 2.294259E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.732 | TFLOPs: 30.89 | 7: iteration 66630/ 115203 | consumed samples: 17057280 | consumed tokens: 34933309440 | elapsed time per iteration (s): 0.45 | learning rate: 
8.924E-05 | global batch size: 256 | lm loss: 2.280044E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.290 | TFLOPs: 30.08 | 7: iteration 66640/ 115203 | consumed samples: 17059840 | consumed tokens: 34938552320 | elapsed time per iteration (s): 0.43 | learning rate: 8.921E-05 | global batch size: 256 | lm loss: 2.255379E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.846 | TFLOPs: 30.95 | 7: iteration 66650/ 115203 | consumed samples: 17062400 | consumed tokens: 34943795200 | elapsed time per iteration (s): 0.45 | learning rate: 8.919E-05 | global batch size: 256 | lm loss: 2.268728E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.820 | TFLOPs: 30.00 | 7: iteration 66660/ 115203 | consumed samples: 17064960 | consumed tokens: 34949038080 | elapsed time per iteration (s): 0.43 | learning rate: 8.916E-05 | global batch size: 256 | lm loss: 2.244207E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.547 | TFLOPs: 31.35 | 7: iteration 66670/ 115203 | consumed samples: 17067520 | consumed tokens: 34954280960 | elapsed time per iteration (s): 0.43 | learning rate: 8.914E-05 | global batch size: 256 | lm loss: 2.251552E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.825 | TFLOPs: 31.31 | 7: iteration 66680/ 115203 | consumed samples: 17070080 | consumed tokens: 34959523840 | elapsed time per iteration (s): 0.44 | learning rate: 8.911E-05 | global batch size: 256 | lm loss: 2.314734E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.206 | TFLOPs: 30.65 | 7: iteration 66690/ 115203 | consumed 
samples: 17072640 | consumed tokens: 34964766720 | elapsed time per iteration (s): 0.43 | learning rate: 8.909E-05 | global batch size: 256 | lm loss: 2.271468E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.499 | TFLOPs: 30.93 | 7: iteration 66700/ 115203 | consumed samples: 17075200 | consumed tokens: 34970009600 | elapsed time per iteration (s): 0.45 | learning rate: 8.907E-05 | global batch size: 256 | lm loss: 2.264022E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.136 | TFLOPs: 29.70 | 7: iteration 66710/ 115203 | consumed samples: 17077760 | consumed tokens: 34975252480 | elapsed time per iteration (s): 0.44 | learning rate: 8.904E-05 | global batch size: 256 | lm loss: 2.268518E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.248 | TFLOPs: 30.55 | 7: iteration 66720/ 115203 | consumed samples: 17080320 | consumed tokens: 34980495360 | elapsed time per iteration (s): 0.44 | learning rate: 8.902E-05 | global batch size: 256 | lm loss: 2.263056E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.933 | TFLOPs: 30.69 | 7: iteration 66730/ 115203 | consumed samples: 17082880 | consumed tokens: 34985738240 | elapsed time per iteration (s): 0.43 | learning rate: 8.899E-05 | global batch size: 256 | lm loss: 2.282750E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.443 | TFLOPs: 30.98 | 7: iteration 66740/ 115203 | consumed samples: 17085440 | consumed tokens: 34990981120 | elapsed time per iteration (s): 0.43 | learning rate: 8.897E-05 | global batch size: 256 | lm loss: 2.277150E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 596.093 | TFLOPs: 31.28 | 7: iteration 66750/ 115203 | consumed samples: 17088000 | consumed tokens: 34996224000 | elapsed time per iteration (s): 0.42 | learning rate: 8.895E-05 | global batch size: 256 | lm loss: 2.278696E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.374 | TFLOPs: 31.82 | 7: iteration 66760/ 115203 | consumed samples: 17090560 | consumed tokens: 35001466880 | elapsed time per iteration (s): 0.44 | learning rate: 8.892E-05 | global batch size: 256 | lm loss: 2.227356E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.751 | TFLOPs: 30.79 | 7: iteration 66770/ 115203 | consumed samples: 17093120 | consumed tokens: 35006709760 | elapsed time per iteration (s): 0.43 | learning rate: 8.890E-05 | global batch size: 256 | lm loss: 2.266157E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.276 | TFLOPs: 31.18 | 7: iteration 66780/ 115203 | consumed samples: 17095680 | consumed tokens: 35011952640 | elapsed time per iteration (s): 0.43 | learning rate: 8.887E-05 | global batch size: 256 | lm loss: 2.267598E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.662 | TFLOPs: 30.94 | 7: iteration 66790/ 115203 | consumed samples: 17098240 | consumed tokens: 35017195520 | elapsed time per iteration (s): 0.43 | learning rate: 8.885E-05 | global batch size: 256 | lm loss: 2.255326E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.448 | TFLOPs: 31.14 | 7: iteration 66800/ 115203 | consumed samples: 17100800 | consumed tokens: 35022438400 | elapsed time per iteration (s): 0.44 | learning rate: 8.883E-05 | global batch size: 256 | lm 
loss: 2.238022E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.310 | TFLOPs: 30.82 | 7: iteration 66810/ 115203 | consumed samples: 17103360 | consumed tokens: 35027681280 | elapsed time per iteration (s): 0.42 | learning rate: 8.880E-05 | global batch size: 256 | lm loss: 2.265391E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.723 | TFLOPs: 31.83 | 7: iteration 66820/ 115203 | consumed samples: 17105920 | consumed tokens: 35032924160 | elapsed time per iteration (s): 0.42 | learning rate: 8.878E-05 | global batch size: 256 | lm loss: 2.267060E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.424 | TFLOPs: 31.87 | 7: iteration 66830/ 115203 | consumed samples: 17108480 | consumed tokens: 35038167040 | elapsed time per iteration (s): 0.43 | learning rate: 8.875E-05 | global batch size: 256 | lm loss: 2.286267E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.638 | TFLOPs: 31.46 | 7: iteration 66840/ 115203 | consumed samples: 17111040 | consumed tokens: 35043409920 | elapsed time per iteration (s): 0.43 | learning rate: 8.873E-05 | global batch size: 256 | lm loss: 2.270639E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.780 | TFLOPs: 31.15 | 7: iteration 66850/ 115203 | consumed samples: 17113600 | consumed tokens: 35048652800 | elapsed time per iteration (s): 0.43 | learning rate: 8.871E-05 | global batch size: 256 | lm loss: 2.261518E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.168 | TFLOPs: 31.49 | 7: iteration 66860/ 115203 | consumed samples: 17116160 | consumed tokens: 
35053895680 | elapsed time per iteration (s): 0.44 | learning rate: 8.868E-05 | global batch size: 256 | lm loss: 2.256279E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.020 | TFLOPs: 30.43 | 7: iteration 66870/ 115203 | consumed samples: 17118720 | consumed tokens: 35059138560 | elapsed time per iteration (s): 0.43 | learning rate: 8.866E-05 | global batch size: 256 | lm loss: 2.274012E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.220 | TFLOPs: 31.55 | 7: iteration 66880/ 115203 | consumed samples: 17121280 | consumed tokens: 35064381440 | elapsed time per iteration (s): 0.43 | learning rate: 8.863E-05 | global batch size: 256 | lm loss: 2.252169E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.877 | TFLOPs: 31.00 | 7: iteration 66890/ 115203 | consumed samples: 17123840 | consumed tokens: 35069624320 | elapsed time per iteration (s): 0.43 | learning rate: 8.861E-05 | global batch size: 256 | lm loss: 2.255574E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.510 | TFLOPs: 31.19 | 7: iteration 66900/ 115203 | consumed samples: 17126400 | consumed tokens: 35074867200 | elapsed time per iteration (s): 0.43 | learning rate: 8.858E-05 | global batch size: 256 | lm loss: 2.319606E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.644 | TFLOPs: 31.51 | 7: iteration 66910/ 115203 | consumed samples: 17128960 | consumed tokens: 35080110080 | elapsed time per iteration (s): 0.44 | learning rate: 8.856E-05 | global batch size: 256 | lm loss: 2.282248E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
582.018 | TFLOPs: 30.54 | 7: iteration 66920/ 115203 | consumed samples: 17131520 | consumed tokens: 35085352960 | elapsed time per iteration (s): 0.43 | learning rate: 8.854E-05 | global batch size: 256 | lm loss: 2.279635E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.112 | TFLOPs: 31.07 | 7: iteration 66930/ 115203 | consumed samples: 17134080 | consumed tokens: 35090595840 | elapsed time per iteration (s): 0.44 | learning rate: 8.851E-05 | global batch size: 256 | lm loss: 2.264905E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.258 | TFLOPs: 30.66 | 7: iteration 66940/ 115203 | consumed samples: 17136640 | consumed tokens: 35095838720 | elapsed time per iteration (s): 0.43 | learning rate: 8.849E-05 | global batch size: 256 | lm loss: 2.277087E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.709 | TFLOPs: 31.31 | 7: iteration 66950/ 115203 | consumed samples: 17139200 | consumed tokens: 35101081600 | elapsed time per iteration (s): 0.43 | learning rate: 8.846E-05 | global batch size: 256 | lm loss: 2.321711E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.949 | TFLOPs: 31.32 | 7: iteration 66960/ 115203 | consumed samples: 17141760 | consumed tokens: 35106324480 | elapsed time per iteration (s): 0.43 | learning rate: 8.844E-05 | global batch size: 256 | lm loss: 2.284883E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.433 | TFLOPs: 31.45 | 7: iteration 66970/ 115203 | consumed samples: 17144320 | consumed tokens: 35111567360 | elapsed time per iteration (s): 0.43 | learning rate: 8.842E-05 | global batch size: 256 | lm loss: 2.254145E+00 | grad norm: 0.217 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.174 | TFLOPs: 31.18 |
7: iteration 66980/ 115203 | consumed samples: 17146880 | consumed tokens: 35116810240 | elapsed time per iteration (s): 0.43 | learning rate: 8.839E-05 | global batch size: 256 | lm loss: 2.285809E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.390 | TFLOPs: 30.98 |
7: iteration 66990/ 115203 | consumed samples: 17149440 | consumed tokens: 35122053120 | elapsed time per iteration (s): 0.43 | learning rate: 8.837E-05 | global batch size: 256 | lm loss: 2.316345E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.537 | TFLOPs: 31.35 |
7: iteration 67000/ 115203 | consumed samples: 17152000 | consumed tokens: 35127296000 | elapsed time per iteration (s): 0.44 | learning rate: 8.834E-05 | global batch size: 256 | lm loss: 2.282603E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.167 | TFLOPs: 30.81 |
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 67000 | lm loss value: 2.211819E+00 | lm loss PPL: 9.132317E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 67000 to checkpoints_221m
0: [2022-11-28 21:01:08,068] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step67000 is begin to save!
0: [2022-11-28 21:01:08,073] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_01-model_00-model_states.pt...
0: [2022-11-28 21:01:08,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_01-model_00-model_states.pt.
0: [2022-11-28 21:01:08,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_03-model_00-model_states.pt...
0: [2022-11-28 21:01:08,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_03-model_00-model_states.pt.
0: [2022-11-28 21:01:08,200] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_04-model_00-model_states.pt...
0: [2022-11-28 21:01:08,223] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_04-model_00-model_states.pt.
0: [2022-11-28 21:01:08,224] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_05-model_00-model_states.pt...
0: [2022-11-28 21:01:08,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_05-model_00-model_states.pt.
0: [2022-11-28 21:01:08,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_06-model_00-model_states.pt...
0: [2022-11-28 21:01:08,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_06-model_00-model_states.pt.
0: [2022-11-28 21:01:08,270] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_07-model_00-model_states.pt...
0: [2022-11-28 21:01:08,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_07-model_00-model_states.pt.
0: [2022-11-28 21:01:08,294] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_08-model_00-model_states.pt...
0: [2022-11-28 21:01:08,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_08-model_00-model_states.pt.
0: [2022-11-28 21:01:08,318] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_09-model_00-model_states.pt...
0: [2022-11-28 21:01:08,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_09-model_00-model_states.pt.
0: [2022-11-28 21:01:08,341] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_10-model_00-model_states.pt...
0: [2022-11-28 21:01:08,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_10-model_00-model_states.pt.
0: [2022-11-28 21:01:08,364] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_11-model_00-model_states.pt...
0: [2022-11-28 21:01:08,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_11-model_00-model_states.pt.
0: [2022-11-28 21:01:08,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_12-model_00-model_states.pt...
0: [2022-11-28 21:01:08,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_12-model_00-model_states.pt.
0: [2022-11-28 21:01:08,411] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_13-model_00-model_states.pt...
0: [2022-11-28 21:01:08,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_13-model_00-model_states.pt.
0: [2022-11-28 21:01:08,435] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_14-model_00-model_states.pt...
0: [2022-11-28 21:01:08,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_14-model_00-model_states.pt.
0: [2022-11-28 21:01:08,458] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_15-model_00-model_states.pt...
0: [2022-11-28 21:01:08,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_15-model_00-model_states.pt.
0: [2022-11-28 21:01:08,481] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_16-model_00-model_states.pt...
0: [2022-11-28 21:01:08,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_16-model_00-model_states.pt.
0: [2022-11-28 21:01:08,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_17-model_00-model_states.pt...
0: [2022-11-28 21:01:08,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_17-model_00-model_states.pt.
0: [2022-11-28 21:01:08,527] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_18-model_00-model_states.pt...
0: [2022-11-28 21:01:08,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_18-model_00-model_states.pt.
0: [2022-11-28 21:01:08,549] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_19-model_00-model_states.pt...
0: [2022-11-28 21:01:08,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_19-model_00-model_states.pt.
0: [2022-11-28 21:01:08,575] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_20-model_00-model_states.pt...
0: [2022-11-28 21:01:08,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_20-model_00-model_states.pt.
0: [2022-11-28 21:01:08,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/layer_22-model_00-model_states.pt...
0: [2022-11-28 21:01:08,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/layer_22-model_00-model_states.pt.
0: [2022-11-28 21:01:08,603] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step67000/mp_rank_00_model_states.pt
0: [2022-11-28 21:01:08,603] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/mp_rank_00_model_states.pt...
0: [2022-11-28 21:01:08,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/mp_rank_00_model_states.pt.
0: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
0: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
0: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
7: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt...
7: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
7: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt...
7: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
5: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
6: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:01:08,622] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step67000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:01:08,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:01:08,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 21:01:08,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 7: [2022-11-28 21:01:08,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:01:08,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 21:01:08,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 7: [2022-11-28 21:01:08,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:01:08,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 21:01:08,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 7: [2022-11-28 21:01:08,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 21:01:08,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 21:01:08,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2022-11-28 21:01:08,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:01:08,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:01:08,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 21:01:08,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 21:01:08,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2022-11-28 21:01:08,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
6: [2022-11-28 21:01:08,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2022-11-28 21:01:08,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 6: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:01:08,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 7: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:01:08,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 
2: [2022-11-28 21:01:08,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 7: [2022-11-28 21:01:08,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 4: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 7: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:01:08,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:01:08,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 1: [2022-11-28 21:01:08,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2022-11-28 21:01:08,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2022-11-28 21:01:08,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2022-11-28 21:01:08,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-28 21:01:08,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 21:01:08,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2022-11-28 21:01:08,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:01:08,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 21:01:08,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2022-11-28 21:01:08,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:01:08,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 21:01:08,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2022-11-28 21:01:08,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:01:08,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 21:01:08,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2022-11-28 21:01:08,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2022-11-28 21:01:08,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:01:08,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2022-11-28 21:01:08,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:01:08,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2022-11-28 21:01:08,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 4: [2022-11-28 21:01:08,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 2: [2022-11-28 21:01:08,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2022-11-28 21:01:08,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2022-11-28 21:01:08,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:01:08,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 21:01:08,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2022-11-28 21:01:08,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2022-11-28 21:01:08,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 21:01:08,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2022-11-28 21:01:08,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:01:08,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2022-11-28 21:01:08,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:01:08,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 7: [2022-11-28 21:01:08,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 21:01:08,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:01:08,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:01:08,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 21:01:08,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 7: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:01:08,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 21:01:08,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 7: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 0: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:01:08,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 21:01:08,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 21:01:08,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 21:01:08,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 21:01:08,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 0: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 0: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 
0: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 0: [2022-11-28 21:01:08,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 6: [2022-11-28 21:01:08,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:01:08,684] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 21:01:08,684] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 2: [2022-11-28 21:01:08,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:01:08,685] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 21:01:08,685] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2022-11-28 21:01:08,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:01:08,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 21:01:08,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 21:01:08,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 21:01:08,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2022-11-28 21:01:08,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2022-11-28 21:01:08,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:01:08,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 21:01:08,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2022-11-28 21:01:08,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:01:08,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 21:01:08,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 1: [2022-11-28 21:01:08,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-28 21:01:08,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 21:01:08,689] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2022-11-28 21:01:08,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:01:08,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 21:01:08,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:01:08,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2022-11-28 21:01:08,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 21:01:08,686] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2022-11-28 21:01:08,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:01:08,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 21:01:08,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2022-11-28 21:01:08,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-28 21:01:08,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 21:01:08,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2022-11-28 21:01:08,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:01:08,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 21:01:08,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 0: [2022-11-28 21:01:08,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:01:08,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 21:01:08,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2022-11-28 21:01:08,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:01:08,693] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 21:01:08,693] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2022-11-28 21:01:08,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:01:08,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 21:01:08,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2022-11-28 21:01:08,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:01:08,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 21:01:08,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 4: [2022-11-28 21:01:08,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:01:08,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 21:01:08,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 0: [2022-11-28 21:01:08,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:01:08,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:01:08,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 21:01:08,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 
5: [2022-11-28 21:01:08,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:01:08,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:01:08,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:01:08,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 21:01:08,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 5: [2022-11-28 21:01:08,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:01:08,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 21:01:08,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2022-11-28 21:01:08,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 21:01:08,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:01:08,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 21:01:08,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 
3: [2022-11-28 21:01:08,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2022-11-28 21:01:08,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 21:01:08,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2022-11-28 21:01:08,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:01:08,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 21:01:08,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2022-11-28 21:01:08,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:01:08,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:01:08,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2022-11-28 21:01:08,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 21:01:08,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 21:01:08,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 21:01:08,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2022-11-28 21:01:08,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2022-11-28 21:01:08,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 3: [2022-11-28 21:01:08,705] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:01:08,705] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 21:01:08,705] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 0: [2022-11-28 21:01:08,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step67000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 21:01:08,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step67000 is ready now! 
0: successfully saved checkpoint at iteration 67000 to checkpoints_221m 7: time (ms) | save-checkpoint: 672.54 7: iteration 67010/ 115203 | consumed samples: 17154560 | consumed tokens: 35132538880 | elapsed time per iteration (s): 0.50 | learning rate: 8.832E-05 | global batch size: 256 | lm loss: 2.273455E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 512.561 | TFLOPs: 26.89 | 7: iteration 67020/ 115203 | consumed samples: 17157120 | consumed tokens: 35137781760 | elapsed time per iteration (s): 0.43 | learning rate: 8.830E-05 | global batch size: 256 | lm loss: 2.260937E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.501 | TFLOPs: 31.19 | 7: iteration 67030/ 115203 | consumed samples: 17159680 | consumed tokens: 35143024640 | elapsed time per iteration (s): 0.43 | learning rate: 8.827E-05 | global batch size: 256 | lm loss: 2.284044E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.451 | TFLOPs: 31.14 | 7: iteration 67040/ 115203 | consumed samples: 17162240 | consumed tokens: 35148267520 | elapsed time per iteration (s): 0.43 | learning rate: 8.825E-05 | global batch size: 256 | lm loss: 2.279620E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.068 | TFLOPs: 31.01 | 7: iteration 67050/ 115203 | consumed samples: 17164800 | consumed tokens: 35153510400 | elapsed time per iteration (s): 0.44 | learning rate: 8.822E-05 | global batch size: 256 | lm loss: 2.245705E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.442 | TFLOPs: 30.45 | 7: iteration 67060/ 115203 | consumed samples: 17167360 | consumed tokens: 35158753280 | elapsed time per iteration (s): 0.43 | learning 
rate: 8.820E-05 | global batch size: 256 | lm loss: 2.270575E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.947 | TFLOPs: 31.43 | 7: iteration 67070/ 115203 | consumed samples: 17169920 | consumed tokens: 35163996160 | elapsed time per iteration (s): 0.43 | learning rate: 8.818E-05 | global batch size: 256 | lm loss: 2.264184E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.780 | TFLOPs: 31.05 | 7: iteration 67080/ 115203 | consumed samples: 17172480 | consumed tokens: 35169239040 | elapsed time per iteration (s): 0.44 | learning rate: 8.815E-05 | global batch size: 256 | lm loss: 2.264835E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.539 | TFLOPs: 30.56 | 7: iteration 67090/ 115203 | consumed samples: 17175040 | consumed tokens: 35174481920 | elapsed time per iteration (s): 0.42 | learning rate: 8.813E-05 | global batch size: 256 | lm loss: 2.273213E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.710 | TFLOPs: 31.68 | 7: iteration 67100/ 115203 | consumed samples: 17177600 | consumed tokens: 35179724800 | elapsed time per iteration (s): 0.43 | learning rate: 8.810E-05 | global batch size: 256 | lm loss: 2.303845E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.808 | TFLOPs: 31.42 | 7: iteration 67110/ 115203 | consumed samples: 17180160 | consumed tokens: 35184967680 | elapsed time per iteration (s): 0.43 | learning rate: 8.808E-05 | global batch size: 256 | lm loss: 2.231138E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.247 | TFLOPs: 31.39 | 7: iteration 67120/ 115203 | 
consumed samples: 17182720 | consumed tokens: 35190210560 | elapsed time per iteration (s): 0.43 | learning rate: 8.806E-05 | global batch size: 256 | lm loss: 2.281667E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.978 | TFLOPs: 31.17 | 7: iteration 67130/ 115203 | consumed samples: 17185280 | consumed tokens: 35195453440 | elapsed time per iteration (s): 0.44 | learning rate: 8.803E-05 | global batch size: 256 | lm loss: 2.295067E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.579 | TFLOPs: 30.41 | 7: iteration 67140/ 115203 | consumed samples: 17187840 | consumed tokens: 35200696320 | elapsed time per iteration (s): 0.45 | learning rate: 8.801E-05 | global batch size: 256 | lm loss: 2.252675E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.281 | TFLOPs: 29.71 | 7: iteration 67150/ 115203 | consumed samples: 17190400 | consumed tokens: 35205939200 | elapsed time per iteration (s): 0.43 | learning rate: 8.798E-05 | global batch size: 256 | lm loss: 2.248238E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.142 | TFLOPs: 31.23 | 7: iteration 67160/ 115203 | consumed samples: 17192960 | consumed tokens: 35211182080 | elapsed time per iteration (s): 0.42 | learning rate: 8.796E-05 | global batch size: 256 | lm loss: 2.256870E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.888 | TFLOPs: 31.69 | 7: iteration 67170/ 115203 | consumed samples: 17195520 | consumed tokens: 35216424960 | elapsed time per iteration (s): 0.43 | learning rate: 8.794E-05 | global batch size: 256 | lm loss: 2.281280E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 600.131 | TFLOPs: 31.49 | 7: iteration 67180/ 115203 | consumed samples: 17198080 | consumed tokens: 35221667840 | elapsed time per iteration (s): 0.43 | learning rate: 8.791E-05 | global batch size: 256 | lm loss: 2.276479E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.312 | TFLOPs: 31.24 | 7: iteration 67190/ 115203 | consumed samples: 17200640 | consumed tokens: 35226910720 | elapsed time per iteration (s): 0.43 | learning rate: 8.789E-05 | global batch size: 256 | lm loss: 2.240209E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.426 | TFLOPs: 30.93 | 7: iteration 67200/ 115203 | consumed samples: 17203200 | consumed tokens: 35232153600 | elapsed time per iteration (s): 0.44 | learning rate: 8.786E-05 | global batch size: 256 | lm loss: 2.292478E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.921 | TFLOPs: 30.58 | 7: iteration 67210/ 115203 | consumed samples: 17205760 | consumed tokens: 35237396480 | elapsed time per iteration (s): 0.43 | learning rate: 8.784E-05 | global batch size: 256 | lm loss: 2.277055E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.961 | TFLOPs: 31.01 | 7: iteration 67220/ 115203 | consumed samples: 17208320 | consumed tokens: 35242639360 | elapsed time per iteration (s): 0.43 | learning rate: 8.782E-05 | global batch size: 256 | lm loss: 2.276415E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.178 | TFLOPs: 31.07 | 7: iteration 67230/ 115203 | consumed samples: 17210880 | consumed tokens: 35247882240 | elapsed time per iteration (s): 0.42 | learning rate: 8.779E-05 | global batch size: 
256 | lm loss: 2.307432E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.611 | TFLOPs: 31.78 | 7: iteration 67240/ 115203 | consumed samples: 17213440 | consumed tokens: 35253125120 | elapsed time per iteration (s): 0.43 | learning rate: 8.777E-05 | global batch size: 256 | lm loss: 2.255086E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.991 | TFLOPs: 31.32 | 7: iteration 67250/ 115203 | consumed samples: 17216000 | consumed tokens: 35258368000 | elapsed time per iteration (s): 0.44 | learning rate: 8.774E-05 | global batch size: 256 | lm loss: 2.277586E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.829 | TFLOPs: 30.84 | 7: iteration 67260/ 115203 | consumed samples: 17218560 | consumed tokens: 35263610880 | elapsed time per iteration (s): 0.43 | learning rate: 8.772E-05 | global batch size: 256 | lm loss: 2.281697E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.833 | TFLOPs: 31.26 | 7: iteration 67270/ 115203 | consumed samples: 17221120 | consumed tokens: 35268853760 | elapsed time per iteration (s): 0.43 | learning rate: 8.769E-05 | global batch size: 256 | lm loss: 2.264934E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.479 | TFLOPs: 31.24 | 7: iteration 67280/ 115203 | consumed samples: 17223680 | consumed tokens: 35274096640 | elapsed time per iteration (s): 0.43 | learning rate: 8.767E-05 | global batch size: 256 | lm loss: 2.244702E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.729 | TFLOPs: 31.26 | 7: iteration 67290/ 115203 | consumed samples: 17226240 | consumed 
tokens: 35279339520 | elapsed time per iteration (s): 0.43 | learning rate: 8.765E-05 | global batch size: 256 | lm loss: 2.269824E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.336 | TFLOPs: 30.92 | 7: iteration 67300/ 115203 | consumed samples: 17228800 | consumed tokens: 35284582400 | elapsed time per iteration (s): 0.43 | learning rate: 8.762E-05 | global batch size: 256 | lm loss: 2.257971E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.688 | TFLOPs: 30.99 | 7: iteration 67310/ 115203 | consumed samples: 17231360 | consumed tokens: 35289825280 | elapsed time per iteration (s): 0.43 | learning rate: 8.760E-05 | global batch size: 256 | lm loss: 2.245211E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.676 | TFLOPs: 31.04 | 7: iteration 67320/ 115203 | consumed samples: 17233920 | consumed tokens: 35295068160 | elapsed time per iteration (s): 0.43 | learning rate: 8.757E-05 | global batch size: 256 | lm loss: 2.302058E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.287 | TFLOPs: 30.97 | 7: iteration 67330/ 115203 | consumed samples: 17236480 | consumed tokens: 35300311040 | elapsed time per iteration (s): 0.44 | learning rate: 8.755E-05 | global batch size: 256 | lm loss: 2.267630E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.313 | TFLOPs: 30.66 | 7: iteration 67340/ 115203 | consumed samples: 17239040 | consumed tokens: 35305553920 | elapsed time per iteration (s): 0.43 | learning rate: 8.753E-05 | global batch size: 256 | lm loss: 2.279184E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 588.811 | TFLOPs: 30.89 | 7: iteration 67350/ 115203 | consumed samples: 17241600 | consumed tokens: 35310796800 | elapsed time per iteration (s): 0.43 | learning rate: 8.750E-05 | global batch size: 256 | lm loss: 2.271362E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.702 | TFLOPs: 31.15 | 7: iteration 67360/ 115203 | consumed samples: 17244160 | consumed tokens: 35316039680 | elapsed time per iteration (s): 0.43 | learning rate: 8.748E-05 | global batch size: 256 | lm loss: 2.279980E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.660 | TFLOPs: 31.10 | 7: iteration 67370/ 115203 | consumed samples: 17246720 | consumed tokens: 35321282560 | elapsed time per iteration (s): 0.43 | learning rate: 8.745E-05 | global batch size: 256 | lm loss: 2.260265E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.899 | TFLOPs: 31.00 | 7: iteration 67380/ 115203 | consumed samples: 17249280 | consumed tokens: 35326525440 | elapsed time per iteration (s): 0.43 | learning rate: 8.743E-05 | global batch size: 256 | lm loss: 2.270527E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.240 | TFLOPs: 31.60 | 7: iteration 67390/ 115203 | consumed samples: 17251840 | consumed tokens: 35331768320 | elapsed time per iteration (s): 0.43 | learning rate: 8.741E-05 | global batch size: 256 | lm loss: 2.267255E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.975 | TFLOPs: 31.43 | 7: iteration 67400/ 115203 | consumed samples: 17254400 | consumed tokens: 35337011200 | elapsed time per iteration (s): 0.43 | learning rate: 8.738E-05 | global batch size: 256 | lm loss: 2.235098E+00 | grad norm: 
0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.737 | TFLOPs: 30.94 | 7: iteration 67410/ 115203 | consumed samples: 17256960 | consumed tokens: 35342254080 | elapsed time per iteration (s): 0.44 | learning rate: 8.736E-05 | global batch size: 256 | lm loss: 2.270964E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.664 | TFLOPs: 30.78 | 7: iteration 67420/ 115203 | consumed samples: 17259520 | consumed tokens: 35347496960 | elapsed time per iteration (s): 0.43 | learning rate: 8.733E-05 | global batch size: 256 | lm loss: 2.297564E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.444 | TFLOPs: 31.08 | 7: iteration 67430/ 115203 | consumed samples: 17262080 | consumed tokens: 35352739840 | elapsed time per iteration (s): 0.43 | learning rate: 8.731E-05 | global batch size: 256 | lm loss: 2.226221E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.074 | TFLOPs: 31.43 | 7: iteration 67440/ 115203 | consumed samples: 17264640 | consumed tokens: 35357982720 | elapsed time per iteration (s): 0.43 | learning rate: 8.729E-05 | global batch size: 256 | lm loss: 2.275122E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.942 | TFLOPs: 31.32 | 7: iteration 67450/ 115203 | consumed samples: 17267200 | consumed tokens: 35363225600 | elapsed time per iteration (s): 0.43 | learning rate: 8.726E-05 | global batch size: 256 | lm loss: 2.268169E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.774 | TFLOPs: 31.31 | 7: iteration 67460/ 115203 | consumed samples: 17269760 | consumed tokens: 35368468480 | elapsed time per 
iteration (s): 0.43 | learning rate: 8.724E-05 | global batch size: 256 | lm loss: 2.243006E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.439 | TFLOPs: 31.24 | 7: iteration 67470/ 115203 | consumed samples: 17272320 | consumed tokens: 35373711360 | elapsed time per iteration (s): 0.44 | learning rate: 8.721E-05 | global batch size: 256 | lm loss: 2.272048E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.085 | TFLOPs: 30.70 | 7: iteration 67480/ 115203 | consumed samples: 17274880 | consumed tokens: 35378954240 | elapsed time per iteration (s): 0.43 | learning rate: 8.719E-05 | global batch size: 256 | lm loss: 2.254218E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.213 | TFLOPs: 31.18 | 7: iteration 67490/ 115203 | consumed samples: 17277440 | consumed tokens: 35384197120 | elapsed time per iteration (s): 0.43 | learning rate: 8.717E-05 | global batch size: 256 | lm loss: 2.291778E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.226 | TFLOPs: 31.07 | 7: iteration 67500/ 115203 | consumed samples: 17280000 | consumed tokens: 35389440000 | elapsed time per iteration (s): 0.43 | learning rate: 8.714E-05 | global batch size: 256 | lm loss: 2.274528E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.879 | TFLOPs: 31.26 | 7: iteration 67510/ 115203 | consumed samples: 17282560 | consumed tokens: 35394682880 | elapsed time per iteration (s): 0.42 | learning rate: 8.712E-05 | global batch size: 256 | lm loss: 2.276406E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.672 | TFLOPs: 31.73 | 7: 
iteration 67520/ 115203 | consumed samples: 17285120 | consumed tokens: 35399925760 | elapsed time per iteration (s): 0.44 | learning rate: 8.710E-05 | global batch size: 256 | lm loss: 2.258696E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.003 | TFLOPs: 30.27 | 7: iteration 67530/ 115203 | consumed samples: 17287680 | consumed tokens: 35405168640 | elapsed time per iteration (s): 0.42 | learning rate: 8.707E-05 | global batch size: 256 | lm loss: 2.242997E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.648 | TFLOPs: 31.93 | 7: iteration 67540/ 115203 | consumed samples: 17290240 | consumed tokens: 35410411520 | elapsed time per iteration (s): 0.43 | learning rate: 8.705E-05 | global batch size: 256 | lm loss: 2.269188E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.084 | TFLOPs: 31.12 | 7: iteration 67550/ 115203 | consumed samples: 17292800 | consumed tokens: 35415654400 | elapsed time per iteration (s): 0.43 | learning rate: 8.702E-05 | global batch size: 256 | lm loss: 2.249802E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.426 | TFLOPs: 31.19 | 7: iteration 67560/ 115203 | consumed samples: 17295360 | consumed tokens: 35420897280 | elapsed time per iteration (s): 0.42 | learning rate: 8.700E-05 | global batch size: 256 | lm loss: 2.282273E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.587 | TFLOPs: 31.72 | 7: iteration 67570/ 115203 | consumed samples: 17297920 | consumed tokens: 35426140160 | elapsed time per iteration (s): 0.44 | learning rate: 8.698E-05 | global batch size: 256 | lm loss: 2.242646E+00 | grad norm: 0.214 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.942 | TFLOPs: 30.64 | 7: iteration 67580/ 115203 | consumed samples: 17300480 | consumed tokens: 35431383040 | elapsed time per iteration (s): 0.43 | learning rate: 8.695E-05 | global batch size: 256 | lm loss: 2.299795E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.208 | TFLOPs: 31.07 | 7: iteration 67590/ 115203 | consumed samples: 17303040 | consumed tokens: 35436625920 | elapsed time per iteration (s): 0.45 | learning rate: 8.693E-05 | global batch size: 256 | lm loss: 2.273094E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.119 | TFLOPs: 30.12 | 7: iteration 67600/ 115203 | consumed samples: 17305600 | consumed tokens: 35441868800 | elapsed time per iteration (s): 0.43 | learning rate: 8.690E-05 | global batch size: 256 | lm loss: 2.247948E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.149 | TFLOPs: 31.12 | 7: iteration 67610/ 115203 | consumed samples: 17308160 | consumed tokens: 35447111680 | elapsed time per iteration (s): 0.45 | learning rate: 8.688E-05 | global batch size: 256 | lm loss: 2.250850E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.321 | TFLOPs: 29.66 | 7: iteration 67620/ 115203 | consumed samples: 17310720 | consumed tokens: 35452354560 | elapsed time per iteration (s): 0.43 | learning rate: 8.686E-05 | global batch size: 256 | lm loss: 2.285248E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.163 | TFLOPs: 30.96 | 7: iteration 67630/ 115203 | consumed samples: 17313280 | consumed tokens: 35457597440 | elapsed time per iteration (s): 0.44 | learning rate: 
8.683E-05 | global batch size: 256 | lm loss: 2.260131E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.801 | TFLOPs: 30.32 | 7: iteration 67640/ 115203 | consumed samples: 17315840 | consumed tokens: 35462840320 | elapsed time per iteration (s): 0.43 | learning rate: 8.681E-05 | global batch size: 256 | lm loss: 2.235774E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.306 | TFLOPs: 31.08 | 7: iteration 67650/ 115203 | consumed samples: 17318400 | consumed tokens: 35468083200 | elapsed time per iteration (s): 0.43 | learning rate: 8.678E-05 | global batch size: 256 | lm loss: 2.285601E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.640 | TFLOPs: 30.88 | 7: iteration 67660/ 115203 | consumed samples: 17320960 | consumed tokens: 35473326080 | elapsed time per iteration (s): 0.43 | learning rate: 8.676E-05 | global batch size: 256 | lm loss: 2.282928E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.822 | TFLOPs: 31.31 | 7: iteration 67670/ 115203 | consumed samples: 17323520 | consumed tokens: 35478568960 | elapsed time per iteration (s): 0.43 | learning rate: 8.674E-05 | global batch size: 256 | lm loss: 2.263074E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.487 | TFLOPs: 31.56 | 7: iteration 67680/ 115203 | consumed samples: 17326080 | consumed tokens: 35483811840 | elapsed time per iteration (s): 0.43 | learning rate: 8.671E-05 | global batch size: 256 | lm loss: 2.252991E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.017 | TFLOPs: 31.38 | 7: iteration 67690/ 115203 | consumed 
samples: 17328640 | consumed tokens: 35489054720 | elapsed time per iteration (s): 0.45 | learning rate: 8.669E-05 | global batch size: 256 | lm loss: 2.291496E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.525 | TFLOPs: 29.99 | 7: iteration 67700/ 115203 | consumed samples: 17331200 | consumed tokens: 35494297600 | elapsed time per iteration (s): 0.43 | learning rate: 8.666E-05 | global batch size: 256 | lm loss: 2.249867E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.654 | TFLOPs: 30.89 | 7: iteration 67710/ 115203 | consumed samples: 17333760 | consumed tokens: 35499540480 | elapsed time per iteration (s): 0.42 | learning rate: 8.664E-05 | global batch size: 256 | lm loss: 2.262099E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.843 | TFLOPs: 31.74 | 7: iteration 67720/ 115203 | consumed samples: 17336320 | consumed tokens: 35504783360 | elapsed time per iteration (s): 0.44 | learning rate: 8.662E-05 | global batch size: 256 | lm loss: 2.288673E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.943 | TFLOPs: 30.38 | 7: iteration 67730/ 115203 | consumed samples: 17338880 | consumed tokens: 35510026240 | elapsed time per iteration (s): 0.42 | learning rate: 8.659E-05 | global batch size: 256 | lm loss: 2.261160E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.977 | TFLOPs: 31.95 | 7: iteration 67740/ 115203 | consumed samples: 17341440 | consumed tokens: 35515269120 | elapsed time per iteration (s): 0.43 | learning rate: 8.657E-05 | global batch size: 256 | lm loss: 2.298659E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 601.972 | TFLOPs: 31.58 | 7: iteration 67750/ 115203 | consumed samples: 17344000 | consumed tokens: 35520512000 | elapsed time per iteration (s): 0.43 | learning rate: 8.654E-05 | global batch size: 256 | lm loss: 2.291409E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.532 | TFLOPs: 31.30 | 7: iteration 67760/ 115203 | consumed samples: 17346560 | consumed tokens: 35525754880 | elapsed time per iteration (s): 0.43 | learning rate: 8.652E-05 | global batch size: 256 | lm loss: 2.266372E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.307 | TFLOPs: 30.97 | 7: iteration 67770/ 115203 | consumed samples: 17349120 | consumed tokens: 35530997760 | elapsed time per iteration (s): 0.43 | learning rate: 8.650E-05 | global batch size: 256 | lm loss: 2.268147E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.676 | TFLOPs: 30.94 | 7: iteration 67780/ 115203 | consumed samples: 17351680 | consumed tokens: 35536240640 | elapsed time per iteration (s): 0.43 | learning rate: 8.647E-05 | global batch size: 256 | lm loss: 2.246038E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.597 | TFLOPs: 31.41 | 7: iteration 67790/ 115203 | consumed samples: 17354240 | consumed tokens: 35541483520 | elapsed time per iteration (s): 0.43 | learning rate: 8.645E-05 | global batch size: 256 | lm loss: 2.269884E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.864 | TFLOPs: 31.05 | 7: iteration 67800/ 115203 | consumed samples: 17356800 | consumed tokens: 35546726400 | elapsed time per iteration (s): 0.43 | learning rate: 8.642E-05 | global batch size: 256 | lm 
loss: 2.249794E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.370 | TFLOPs: 31.13 | 7: iteration 67810/ 115203 | consumed samples: 17359360 | consumed tokens: 35551969280 | elapsed time per iteration (s): 0.44 | learning rate: 8.640E-05 | global batch size: 256 | lm loss: 2.290145E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.877 | TFLOPs: 30.74 | 7: iteration 67820/ 115203 | consumed samples: 17361920 | consumed tokens: 35557212160 | elapsed time per iteration (s): 0.43 | learning rate: 8.638E-05 | global batch size: 256 | lm loss: 2.265283E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.528 | TFLOPs: 30.88 | 7: iteration 67830/ 115203 | consumed samples: 17364480 | consumed tokens: 35562455040 | elapsed time per iteration (s): 0.43 | learning rate: 8.635E-05 | global batch size: 256 | lm loss: 2.253964E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.793 | TFLOPs: 31.21 | 7: iteration 67840/ 115203 | consumed samples: 17367040 | consumed tokens: 35567697920 | elapsed time per iteration (s): 0.43 | learning rate: 8.633E-05 | global batch size: 256 | lm loss: 2.308146E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.977 | TFLOPs: 31.37 | 7: iteration 67850/ 115203 | consumed samples: 17369600 | consumed tokens: 35572940800 | elapsed time per iteration (s): 0.43 | learning rate: 8.630E-05 | global batch size: 256 | lm loss: 2.260459E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.721 | TFLOPs: 31.41 | 7: iteration 67860/ 115203 | consumed samples: 17372160 | consumed tokens: 
35578183680 | elapsed time per iteration (s): 0.43 | learning rate: 8.628E-05 | global batch size: 256 | lm loss: 2.260985E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.700 | TFLOPs: 31.36 | 7: iteration 67870/ 115203 | consumed samples: 17374720 | consumed tokens: 35583426560 | elapsed time per iteration (s): 0.43 | learning rate: 8.626E-05 | global batch size: 256 | lm loss: 2.277511E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.066 | TFLOPs: 31.38 | 7: iteration 67880/ 115203 | consumed samples: 17377280 | consumed tokens: 35588669440 | elapsed time per iteration (s): 0.43 | learning rate: 8.623E-05 | global batch size: 256 | lm loss: 2.289781E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.857 | TFLOPs: 31.53 | 7: iteration 67890/ 115203 | consumed samples: 17379840 | consumed tokens: 35593912320 | elapsed time per iteration (s): 0.44 | learning rate: 8.621E-05 | global batch size: 256 | lm loss: 2.285863E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.589 | TFLOPs: 30.36 | 7: iteration 67900/ 115203 | consumed samples: 17382400 | consumed tokens: 35599155200 | elapsed time per iteration (s): 0.43 | learning rate: 8.619E-05 | global batch size: 256 | lm loss: 2.260783E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.409 | TFLOPs: 31.50 | 7: iteration 67910/ 115203 | consumed samples: 17384960 | consumed tokens: 35604398080 | elapsed time per iteration (s): 0.42 | learning rate: 8.616E-05 | global batch size: 256 | lm loss: 2.249420E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
608.185 | TFLOPs: 31.91 | 7: iteration 67920/ 115203 | consumed samples: 17387520 | consumed tokens: 35609640960 | elapsed time per iteration (s): 0.43 | learning rate: 8.614E-05 | global batch size: 256 | lm loss: 2.262512E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.845 | TFLOPs: 31.11 | 7: iteration 67930/ 115203 | consumed samples: 17390080 | consumed tokens: 35614883840 | elapsed time per iteration (s): 0.43 | learning rate: 8.611E-05 | global batch size: 256 | lm loss: 2.279258E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.511 | TFLOPs: 31.04 | 7: iteration 67940/ 115203 | consumed samples: 17392640 | consumed tokens: 35620126720 | elapsed time per iteration (s): 0.44 | learning rate: 8.609E-05 | global batch size: 256 | lm loss: 2.263898E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.171 | TFLOPs: 30.23 | 7: iteration 67950/ 115203 | consumed samples: 17395200 | consumed tokens: 35625369600 | elapsed time per iteration (s): 0.46 | learning rate: 8.607E-05 | global batch size: 256 | lm loss: 2.256898E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 551.791 | TFLOPs: 28.95 | 7: iteration 67960/ 115203 | consumed samples: 17397760 | consumed tokens: 35630612480 | elapsed time per iteration (s): 0.43 | learning rate: 8.604E-05 | global batch size: 256 | lm loss: 2.233767E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.720 | TFLOPs: 30.99 | 7: iteration 67970/ 115203 | consumed samples: 17400320 | consumed tokens: 35635855360 | elapsed time per iteration (s): 0.43 | learning rate: 8.602E-05 | global batch size: 256 | lm loss: 2.238039E+00 | grad norm: 0.213 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.361 | TFLOPs: 31.03 | 7: iteration 67980/ 115203 | consumed samples: 17402880 | consumed tokens: 35641098240 | elapsed time per iteration (s): 0.43 | learning rate: 8.599E-05 | global batch size: 256 | lm loss: 2.281180E+00 | grad norm: 0.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.801 | TFLOPs: 31.42 | 7: iteration 67990/ 115203 | consumed samples: 17405440 | consumed tokens: 35646341120 | elapsed time per iteration (s): 0.43 | learning rate: 8.597E-05 | global batch size: 256 | lm loss: 2.255703E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.659 | TFLOPs: 31.10 | 0: [2022-11-28 21:08:20,961] [INFO] [logging.py:68:log_dist] [Rank 0] step=68000, skipped=0, lr=[8.594634403532495e-05, 8.594634403532495e-05, 8.594634403532495e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 68000/ 115203 | consumed samples: 17408000 | consumed tokens: 35651584000 | elapsed time per iteration (s): 0.42 | learning rate: 8.595E-05 | global batch size: 256 | lm loss: 2.276996E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.032 | TFLOPs: 31.75 | 0: steps: 68000 loss: 2.2964 iter time (s): 0.431 samples/sec: 593.722 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 68000 | lm loss value: 2.085640E+00 | lm loss PPL: 8.049745E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 68000 to checkpoints_221m 0: [2022-11-28 21:08:21,121] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step68000 is begin to save! 
0: [2022-11-28 21:08:21,127] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_01-model_00-model_states.pt... 0: [2022-11-28 21:08:21,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_01-model_00-model_states.pt. 0: [2022-11-28 21:08:21,275] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_03-model_00-model_states.pt... 0: [2022-11-28 21:08:21,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_03-model_00-model_states.pt. 0: [2022-11-28 21:08:21,298] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_04-model_00-model_states.pt... 0: [2022-11-28 21:08:21,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_04-model_00-model_states.pt. 0: [2022-11-28 21:08:21,321] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_05-model_00-model_states.pt... 0: [2022-11-28 21:08:21,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_05-model_00-model_states.pt. 0: [2022-11-28 21:08:21,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_06-model_00-model_states.pt... 0: [2022-11-28 21:08:21,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_06-model_00-model_states.pt. 0: [2022-11-28 21:08:21,367] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_07-model_00-model_states.pt... 0: [2022-11-28 21:08:21,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 21:08:21,390] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_08-model_00-model_states.pt... 0: [2022-11-28 21:08:21,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_08-model_00-model_states.pt. 0: [2022-11-28 21:08:21,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_09-model_00-model_states.pt... 0: [2022-11-28 21:08:21,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_09-model_00-model_states.pt. 0: [2022-11-28 21:08:21,439] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_10-model_00-model_states.pt... 0: [2022-11-28 21:08:21,460] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_10-model_00-model_states.pt. 0: [2022-11-28 21:08:21,460] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_11-model_00-model_states.pt... 0: [2022-11-28 21:08:21,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_11-model_00-model_states.pt. 0: [2022-11-28 21:08:21,483] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_12-model_00-model_states.pt... 0: [2022-11-28 21:08:21,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_12-model_00-model_states.pt. 0: [2022-11-28 21:08:21,507] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_13-model_00-model_states.pt... 0: [2022-11-28 21:08:21,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 21:08:21,531] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_14-model_00-model_states.pt... 0: [2022-11-28 21:08:21,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_14-model_00-model_states.pt. 0: [2022-11-28 21:08:21,554] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_15-model_00-model_states.pt... 0: [2022-11-28 21:08:21,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_15-model_00-model_states.pt. 0: [2022-11-28 21:08:21,577] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_16-model_00-model_states.pt... 0: [2022-11-28 21:08:21,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_16-model_00-model_states.pt. 0: [2022-11-28 21:08:21,601] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_17-model_00-model_states.pt... 0: [2022-11-28 21:08:21,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_17-model_00-model_states.pt. 0: [2022-11-28 21:08:21,624] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_18-model_00-model_states.pt... 0: [2022-11-28 21:08:21,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_18-model_00-model_states.pt. 0: [2022-11-28 21:08:21,648] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_19-model_00-model_states.pt... 0: [2022-11-28 21:08:21,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 21:08:21,671] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_20-model_00-model_states.pt... 0: [2022-11-28 21:08:21,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_20-model_00-model_states.pt. 0: [2022-11-28 21:08:21,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/layer_22-model_00-model_states.pt... 0: [2022-11-28 21:08:21,699] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/layer_22-model_00-model_states.pt. 0: [2022-11-28 21:08:21,700] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step68000/mp_rank_00_model_states.pt 0: [2022-11-28 21:08:21,700] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/mp_rank_00_model_states.pt... 0: [2022-11-28 21:08:21,702] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/mp_rank_00_model_states.pt. 0: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
4: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
6: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:08:22,421] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
1: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
5: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
6: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
3: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:08:22,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:08:22,423] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step68000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:08:22,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:08:22,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 21:08:22,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2022-11-28 21:08:22,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:08:22,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:08:22,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 21:08:22,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2022-11-28 21:08:22,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
2: [2022-11-28 21:08:22,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:08:22,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2022-11-28 21:08:22,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2022-11-28 21:08:22,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2022-11-28 21:08:22,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2022-11-28 21:08:22,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:08:22,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 21:08:22,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2022-11-28 21:08:22,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:08:22,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 21:08:22,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2022-11-28 21:08:22,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:08:22,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 21:08:22,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2022-11-28 21:08:22,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:08:22,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:08:22,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 21:08:22,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 21:08:22,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2022-11-28 21:08:22,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2022-11-28 21:08:22,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:08:22,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 21:08:22,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2022-11-28 21:08:22,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:08:22,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:08:22,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 4: [2022-11-28 21:08:22,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 21:08:22,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2022-11-28 21:08:22,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2022-11-28 21:08:22,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:08:22,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 21:08:22,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2022-11-28 21:08:22,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:08:22,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 21:08:22,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2022-11-28 21:08:22,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:08:22,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 21:08:22,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2022-11-28 21:08:22,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:08:22,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:08:22,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 21:08:22,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 4: [2022-11-28 21:08:22,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:08:22,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2022-11-28 21:08:22,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2022-11-28 21:08:22,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 21:08:22,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2022-11-28 21:08:22,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 21:08:22,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 21:08:22,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 7: [2022-11-28 21:08:22,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:08:22,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:08:22,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:08:22,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:08:22,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 21:08:22,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 21:08:22,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 7: [2022-11-28 21:08:22,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 21:08:22,481] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 21:08:22,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 
7: [2022-11-28 21:08:22,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 7: [2022-11-28 21:08:22,481] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2022-11-28 21:08:22,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:08:22,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:08:22,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2022-11-28 21:08:22,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 2: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:08:22,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2022-11-28 21:08:22,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 6: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:08:22,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 21:08:22,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 21:08:22,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2022-11-28 21:08:22,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:08:22,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:08:22,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 
2: [2022-11-28 21:08:22,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 21:08:22,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 21:08:22,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 2: [2022-11-28 21:08:22,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2022-11-28 21:08:22,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:08:22,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:08:22,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:08:22,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 21:08:22,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 21:08:22,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 3: [2022-11-28 21:08:22,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 3: [2022-11-28 21:08:22,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
5: [2022-11-28 21:08:22,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 21:08:22,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 21:08:22,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 3: [2022-11-28 21:08:22,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2022-11-28 21:08:22,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:08:22,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:08:22,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 21:08:22,479] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 21:08:22,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 3: [2022-11-28 21:08:22,479] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2022-11-28 21:08:22,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:08:22,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
5: [2022-11-28 21:08:22,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 21:08:22,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 21:08:22,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 3: [2022-11-28 21:08:22,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2022-11-28 21:08:22,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:08:22,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:08:22,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:08:22,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
3: [2022-11-28 21:08:22,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 21:08:22,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 21:08:22,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 21:08:22,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:08:22,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2022-11-28 21:08:22,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 21:08:22,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 3: [2022-11-28 21:08:22,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 21:08:22,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 21:08:22,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 5: [2022-11-28 21:08:22,487] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:08:22,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 21:08:22,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2022-11-28 21:08:22,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:08:22,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:08:22,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 21:08:22,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 21:08:22,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2022-11-28 21:08:22,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2022-11-28 21:08:22,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 21:08:22,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 21:08:22,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 7: [2022-11-28 21:08:22,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:08:22,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:08:22,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 21:08:22,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:08:22,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 21:08:22,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:08:22,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 7: [2022-11-28 21:08:22,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 21:08:22,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 
7: [2022-11-28 21:08:22,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 21:08:22,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 7: [2022-11-28 21:08:22,498] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2022-11-28 21:08:22,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:08:22,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 21:08:22,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 4: [2022-11-28 21:08:22,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:08:22,509] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 21:08:22,509] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:08:22,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 
1: [2022-11-28 21:08:22,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 21:08:22,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 21:08:22,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 21:08:22,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 21:08:22,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 21:08:22,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 21:08:22,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 
1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 1: [2022-11-28 21:08:22,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: [2022-11-28 21:08:22,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step68000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 21:08:22,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step68000 is ready now! 0: successfully saved checkpoint at iteration 68000 to checkpoints_221m 7: time (ms) | save-checkpoint: 1454.81 7: iteration 68010/ 115203 | consumed samples: 17410560 | consumed tokens: 35656826880 | elapsed time per iteration (s): 0.60 | learning rate: 8.592E-05 | global batch size: 256 | lm loss: 2.278453E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 424.361 | TFLOPs: 22.27 | 7: iteration 68020/ 115203 | consumed samples: 17413120 | consumed tokens: 35662069760 | elapsed time per iteration (s): 0.43 | learning rate: 8.590E-05 | global batch size: 256 | lm loss: 2.252185E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.535 | TFLOPs: 30.88 | 7: iteration 68030/ 115203 | consumed samples: 17415680 | consumed tokens: 35667312640 | elapsed time per iteration (s): 0.44 | learning rate: 8.587E-05 | global batch size: 256 | lm loss: 2.253219E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.524 | TFLOPs: 30.46 | 7: iteration 68040/ 115203 | consumed samples: 17418240 | consumed tokens: 35672555520 | elapsed time per iteration (s): 0.43 | learning rate: 8.585E-05 | global batch size: 256 | lm loss: 2.264956E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 597.132 | TFLOPs: 31.33 | 7: iteration 68050/ 115203 | consumed samples: 17420800 | consumed tokens: 35677798400 | elapsed time per iteration (s): 0.43 | learning rate: 8.583E-05 | global batch size: 256 | lm loss: 2.258173E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.093 | TFLOPs: 31.22 | 7: iteration 68060/ 115203 | consumed samples: 17423360 | consumed tokens: 35683041280 | elapsed time per iteration (s): 0.43 | learning rate: 8.580E-05 | global batch size: 256 | lm loss: 2.278858E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.622 | TFLOPs: 31.46 | 7: iteration 68070/ 115203 | consumed samples: 17425920 | consumed tokens: 35688284160 | elapsed time per iteration (s): 0.44 | learning rate: 8.578E-05 | global batch size: 256 | lm loss: 2.254241E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.447 | TFLOPs: 30.82 | 7: iteration 68080/ 115203 | consumed samples: 17428480 | consumed tokens: 35693527040 | elapsed time per iteration (s): 0.44 | learning rate: 8.576E-05 | global batch size: 256 | lm loss: 2.265923E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.333 | TFLOPs: 30.82 | 7: iteration 68090/ 115203 | consumed samples: 17431040 | consumed tokens: 35698769920 | elapsed time per iteration (s): 0.44 | learning rate: 8.573E-05 | global batch size: 256 | lm loss: 2.293810E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.614 | TFLOPs: 30.57 | 7: iteration 68100/ 115203 | consumed samples: 17433600 | consumed tokens: 35704012800 | elapsed time per iteration (s): 0.43 | learning rate: 8.571E-05 | global batch size: 
256 | lm loss: 2.296843E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.009 | TFLOPs: 31.22 | 7: iteration 68110/ 115203 | consumed samples: 17436160 | consumed tokens: 35709255680 | elapsed time per iteration (s): 0.42 | learning rate: 8.568E-05 | global batch size: 256 | lm loss: 2.281174E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.682 | TFLOPs: 31.88 | 7: iteration 68120/ 115203 | consumed samples: 17438720 | consumed tokens: 35714498560 | elapsed time per iteration (s): 0.43 | learning rate: 8.566E-05 | global batch size: 256 | lm loss: 2.248010E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.151 | TFLOPs: 31.23 | 7: iteration 68130/ 115203 | consumed samples: 17441280 | consumed tokens: 35719741440 | elapsed time per iteration (s): 0.42 | learning rate: 8.564E-05 | global batch size: 256 | lm loss: 2.283556E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.654 | TFLOPs: 31.73 | 7: iteration 68140/ 115203 | consumed samples: 17443840 | consumed tokens: 35724984320 | elapsed time per iteration (s): 0.43 | learning rate: 8.561E-05 | global batch size: 256 | lm loss: 2.265940E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.380 | TFLOPs: 31.13 | 7: iteration 68150/ 115203 | consumed samples: 17446400 | consumed tokens: 35730227200 | elapsed time per iteration (s): 0.43 | learning rate: 8.559E-05 | global batch size: 256 | lm loss: 2.269010E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.059 | TFLOPs: 31.33 | 7: iteration 68160/ 115203 | consumed samples: 17448960 | consumed 
tokens: 35735470080 | elapsed time per iteration (s): 0.44 | learning rate: 8.556E-05 | global batch size: 256 | lm loss: 2.273536E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.031 | TFLOPs: 30.43 | 7: iteration 68170/ 115203 | consumed samples: 17451520 | consumed tokens: 35740712960 | elapsed time per iteration (s): 0.44 | learning rate: 8.554E-05 | global batch size: 256 | lm loss: 2.291842E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.615 | TFLOPs: 30.25 | 7: iteration 68180/ 115203 | consumed samples: 17454080 | consumed tokens: 35745955840 | elapsed time per iteration (s): 0.43 | learning rate: 8.552E-05 | global batch size: 256 | lm loss: 2.271611E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.018 | TFLOPs: 31.11 | 7: iteration 68190/ 115203 | consumed samples: 17456640 | consumed tokens: 35751198720 | elapsed time per iteration (s): 0.44 | learning rate: 8.549E-05 | global batch size: 256 | lm loss: 2.286330E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.688 | TFLOPs: 30.68 | 7: iteration 68200/ 115203 | consumed samples: 17459200 | consumed tokens: 35756441600 | elapsed time per iteration (s): 0.43 | learning rate: 8.547E-05 | global batch size: 256 | lm loss: 2.284568E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.110 | TFLOPs: 31.33 | 7: iteration 68210/ 115203 | consumed samples: 17461760 | consumed tokens: 35761684480 | elapsed time per iteration (s): 0.42 | learning rate: 8.545E-05 | global batch size: 256 | lm loss: 2.249488E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 609.706 | TFLOPs: 31.99 | 7: iteration 68220/ 115203 | consumed samples: 17464320 | consumed tokens: 35766927360 | elapsed time per iteration (s): 0.44 | learning rate: 8.542E-05 | global batch size: 256 | lm loss: 2.269302E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.561 | TFLOPs: 30.78 | 7: iteration 68230/ 115203 | consumed samples: 17466880 | consumed tokens: 35772170240 | elapsed time per iteration (s): 0.43 | learning rate: 8.540E-05 | global batch size: 256 | lm loss: 2.302645E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.404 | TFLOPs: 31.29 | 7: iteration 68240/ 115203 | consumed samples: 17469440 | consumed tokens: 35777413120 | elapsed time per iteration (s): 0.44 | learning rate: 8.537E-05 | global batch size: 256 | lm loss: 2.285628E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.534 | TFLOPs: 30.72 | 7: iteration 68250/ 115203 | consumed samples: 17472000 | consumed tokens: 35782656000 | elapsed time per iteration (s): 0.43 | learning rate: 8.535E-05 | global batch size: 256 | lm loss: 2.282313E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.681 | TFLOPs: 31.36 | 7: iteration 68260/ 115203 | consumed samples: 17474560 | consumed tokens: 35787898880 | elapsed time per iteration (s): 0.43 | learning rate: 8.533E-05 | global batch size: 256 | lm loss: 2.263781E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.097 | TFLOPs: 31.07 | 7: iteration 68270/ 115203 | consumed samples: 17477120 | consumed tokens: 35793141760 | elapsed time per iteration (s): 0.43 | learning rate: 8.530E-05 | global batch size: 256 | lm loss: 2.256743E+00 | grad norm: 
0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.165 | TFLOPs: 31.49 | 7: iteration 68280/ 115203 | consumed samples: 17479680 | consumed tokens: 35798384640 | elapsed time per iteration (s): 0.43 | learning rate: 8.528E-05 | global batch size: 256 | lm loss: 2.256382E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.823 | TFLOPs: 31.00 | 7: iteration 68290/ 115203 | consumed samples: 17482240 | consumed tokens: 35803627520 | elapsed time per iteration (s): 0.43 | learning rate: 8.525E-05 | global batch size: 256 | lm loss: 2.289462E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.131 | TFLOPs: 31.54 | 7: iteration 68300/ 115203 | consumed samples: 17484800 | consumed tokens: 35808870400 | elapsed time per iteration (s): 0.43 | learning rate: 8.523E-05 | global batch size: 256 | lm loss: 2.257025E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.574 | TFLOPs: 31.51 | 7: iteration 68310/ 115203 | consumed samples: 17487360 | consumed tokens: 35814113280 | elapsed time per iteration (s): 0.42 | learning rate: 8.521E-05 | global batch size: 256 | lm loss: 2.253771E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.561 | TFLOPs: 31.67 | 7: iteration 68320/ 115203 | consumed samples: 17489920 | consumed tokens: 35819356160 | elapsed time per iteration (s): 0.43 | learning rate: 8.518E-05 | global batch size: 256 | lm loss: 2.294380E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.635 | TFLOPs: 31.25 | 7: iteration 68330/ 115203 | consumed samples: 17492480 | consumed tokens: 35824599040 | elapsed time per 
iteration (s): 0.43 | learning rate: 8.516E-05 | global batch size: 256 | lm loss: 2.256223E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.861 | TFLOPs: 31.21 | 7: iteration 68340/ 115203 | consumed samples: 17495040 | consumed tokens: 35829841920 | elapsed time per iteration (s): 0.42 | learning rate: 8.514E-05 | global batch size: 256 | lm loss: 2.259877E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.993 | TFLOPs: 31.69 | 7: iteration 68350/ 115203 | consumed samples: 17497600 | consumed tokens: 35835084800 | elapsed time per iteration (s): 0.44 | learning rate: 8.511E-05 | global batch size: 256 | lm loss: 2.272895E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.497 | TFLOPs: 30.62 | 7: iteration 68360/ 115203 | consumed samples: 17500160 | consumed tokens: 35840327680 | elapsed time per iteration (s): 0.42 | learning rate: 8.509E-05 | global batch size: 256 | lm loss: 2.248643E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.056 | TFLOPs: 31.69 | 7: iteration 68370/ 115203 | consumed samples: 17502720 | consumed tokens: 35845570560 | elapsed time per iteration (s): 0.43 | learning rate: 8.506E-05 | global batch size: 256 | lm loss: 2.244564E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.073 | TFLOPs: 31.01 | 7: iteration 68380/ 115203 | consumed samples: 17505280 | consumed tokens: 35850813440 | elapsed time per iteration (s): 0.45 | learning rate: 8.504E-05 | global batch size: 256 | lm loss: 2.299361E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.236 | TFLOPs: 29.81 | 7: 
iteration 68390/ 115203 | consumed samples: 17507840 | consumed tokens: 35856056320 | elapsed time per iteration (s): 0.43 | learning rate: 8.502E-05 | global batch size: 256 | lm loss: 2.260452E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.265 | TFLOPs: 31.08 | 7: iteration 68400/ 115203 | consumed samples: 17510400 | consumed tokens: 35861299200 | elapsed time per iteration (s): 0.43 | learning rate: 8.499E-05 | global batch size: 256 | lm loss: 2.342376E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.079 | TFLOPs: 30.96 | 7: iteration 68410/ 115203 | consumed samples: 17512960 | consumed tokens: 35866542080 | elapsed time per iteration (s): 0.43 | learning rate: 8.497E-05 | global batch size: 256 | lm loss: 2.264672E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.873 | TFLOPs: 31.21 | 7: iteration 68420/ 115203 | consumed samples: 17515520 | consumed tokens: 35871784960 | elapsed time per iteration (s): 0.43 | learning rate: 8.494E-05 | global batch size: 256 | lm loss: 2.252550E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.657 | TFLOPs: 31.15 | 7: iteration 68430/ 115203 | consumed samples: 17518080 | consumed tokens: 35877027840 | elapsed time per iteration (s): 0.42 | learning rate: 8.492E-05 | global batch size: 256 | lm loss: 2.307185E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.936 | TFLOPs: 31.69 | 7: iteration 68440/ 115203 | consumed samples: 17520640 | consumed tokens: 35882270720 | elapsed time per iteration (s): 0.42 | learning rate: 8.490E-05 | global batch size: 256 | lm loss: 2.278552E+00 | grad norm: 0.228 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.094 | TFLOPs: 31.70 | 7: iteration 68450/ 115203 | consumed samples: 17523200 | consumed tokens: 35887513600 | elapsed time per iteration (s): 0.42 | learning rate: 8.487E-05 | global batch size: 256 | lm loss: 2.251914E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.705 | TFLOPs: 31.78 | 7: iteration 68460/ 115203 | consumed samples: 17525760 | consumed tokens: 35892756480 | elapsed time per iteration (s): 0.43 | learning rate: 8.485E-05 | global batch size: 256 | lm loss: 2.215231E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.133 | TFLOPs: 31.38 | 7: iteration 68470/ 115203 | consumed samples: 17528320 | consumed tokens: 35897999360 | elapsed time per iteration (s): 0.43 | learning rate: 8.483E-05 | global batch size: 256 | lm loss: 2.259892E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.442 | TFLOPs: 31.40 | 7: iteration 68480/ 115203 | consumed samples: 17530880 | consumed tokens: 35903242240 | elapsed time per iteration (s): 0.42 | learning rate: 8.480E-05 | global batch size: 256 | lm loss: 2.279599E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.279 | TFLOPs: 31.81 | 7: iteration 68490/ 115203 | consumed samples: 17533440 | consumed tokens: 35908485120 | elapsed time per iteration (s): 0.43 | learning rate: 8.478E-05 | global batch size: 256 | lm loss: 2.280820E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.601 | TFLOPs: 31.51 | 7: iteration 68500/ 115203 | consumed samples: 17536000 | consumed tokens: 35913728000 | elapsed time per iteration (s): 0.42 | learning rate: 
8.475E-05 | global batch size: 256 | lm loss: 2.270189E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.748 | TFLOPs: 31.68 | 7: iteration 68510/ 115203 | consumed samples: 17538560 | consumed tokens: 35918970880 | elapsed time per iteration (s): 0.42 | learning rate: 8.473E-05 | global batch size: 256 | lm loss: 2.253900E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.084 | TFLOPs: 32.06 | 7: iteration 68520/ 115203 | consumed samples: 17541120 | consumed tokens: 35924213760 | elapsed time per iteration (s): 0.43 | learning rate: 8.471E-05 | global batch size: 256 | lm loss: 2.278814E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.090 | TFLOPs: 31.22 | 7: iteration 68530/ 115203 | consumed samples: 17543680 | consumed tokens: 35929456640 | elapsed time per iteration (s): 0.43 | learning rate: 8.468E-05 | global batch size: 256 | lm loss: 2.286466E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.098 | TFLOPs: 31.22 | 7: iteration 68540/ 115203 | consumed samples: 17546240 | consumed tokens: 35934699520 | elapsed time per iteration (s): 0.42 | learning rate: 8.466E-05 | global batch size: 256 | lm loss: 2.251430E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.338 | TFLOPs: 31.87 | 7: iteration 68550/ 115203 | consumed samples: 17548800 | consumed tokens: 35939942400 | elapsed time per iteration (s): 0.42 | learning rate: 8.464E-05 | global batch size: 256 | lm loss: 2.283740E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.428 | TFLOPs: 31.87 | 7: iteration 68560/ 115203 | consumed 
samples: 17551360 | consumed tokens: 35945185280 | elapsed time per iteration (s): 0.42 | learning rate: 8.461E-05 | global batch size: 256 | lm loss: 2.255780E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.868 | TFLOPs: 31.74 | 7: iteration 68570/ 115203 | consumed samples: 17553920 | consumed tokens: 35950428160 | elapsed time per iteration (s): 0.42 | learning rate: 8.459E-05 | global batch size: 256 | lm loss: 2.270206E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.462 | TFLOPs: 31.61 | 7: iteration 68580/ 115203 | consumed samples: 17556480 | consumed tokens: 35955671040 | elapsed time per iteration (s): 0.42 | learning rate: 8.456E-05 | global batch size: 256 | lm loss: 2.292681E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.338 | TFLOPs: 31.81 | 7: iteration 68590/ 115203 | consumed samples: 17559040 | consumed tokens: 35960913920 | elapsed time per iteration (s): 0.42 | learning rate: 8.454E-05 | global batch size: 256 | lm loss: 2.288386E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.239 | TFLOPs: 31.76 | 7: iteration 68600/ 115203 | consumed samples: 17561600 | consumed tokens: 35966156800 | elapsed time per iteration (s): 0.43 | learning rate: 8.452E-05 | global batch size: 256 | lm loss: 2.261547E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.790 | TFLOPs: 31.21 | 7: iteration 68610/ 115203 | consumed samples: 17564160 | consumed tokens: 35971399680 | elapsed time per iteration (s): 0.43 | learning rate: 8.449E-05 | global batch size: 256 | lm loss: 2.315601E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 591.798 | TFLOPs: 31.05 | 7: iteration 68620/ 115203 | consumed samples: 17566720 | consumed tokens: 35976642560 | elapsed time per iteration (s): 0.43 | learning rate: 8.447E-05 | global batch size: 256 | lm loss: 2.278141E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.257 | TFLOPs: 30.97 | 7: iteration 68630/ 115203 | consumed samples: 17569280 | consumed tokens: 35981885440 | elapsed time per iteration (s): 0.43 | learning rate: 8.445E-05 | global batch size: 256 | lm loss: 2.285553E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.846 | TFLOPs: 31.21 | 7: iteration 68640/ 115203 | consumed samples: 17571840 | consumed tokens: 35987128320 | elapsed time per iteration (s): 0.43 | learning rate: 8.442E-05 | global batch size: 256 | lm loss: 2.306012E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.158 | TFLOPs: 31.54 | 7: iteration 68650/ 115203 | consumed samples: 17574400 | consumed tokens: 35992371200 | elapsed time per iteration (s): 0.43 | learning rate: 8.440E-05 | global batch size: 256 | lm loss: 2.253339E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.192 | TFLOPs: 31.39 | 7: iteration 68660/ 115203 | consumed samples: 17576960 | consumed tokens: 35997614080 | elapsed time per iteration (s): 0.44 | learning rate: 8.437E-05 | global batch size: 256 | lm loss: 2.272751E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.665 | TFLOPs: 30.62 | 7: iteration 68670/ 115203 | consumed samples: 17579520 | consumed tokens: 36002856960 | elapsed time per iteration (s): 0.44 | learning rate: 8.435E-05 | global batch size: 256 | lm 
loss: 2.287034E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.499 | TFLOPs: 30.83 | 7: iteration 68680/ 115203 | consumed samples: 17582080 | consumed tokens: 36008099840 | elapsed time per iteration (s): 0.43 | learning rate: 8.433E-05 | global batch size: 256 | lm loss: 2.268967E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.798 | TFLOPs: 31.05 | 7: iteration 68690/ 115203 | consumed samples: 17584640 | consumed tokens: 36013342720 | elapsed time per iteration (s): 0.43 | learning rate: 8.430E-05 | global batch size: 256 | lm loss: 2.280214E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.552 | TFLOPs: 31.51 | 7: iteration 68700/ 115203 | consumed samples: 17587200 | consumed tokens: 36018585600 | elapsed time per iteration (s): 0.46 | learning rate: 8.428E-05 | global batch size: 256 | lm loss: 2.253625E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.223 | TFLOPs: 29.39 | 7: iteration 68710/ 115203 | consumed samples: 17589760 | consumed tokens: 36023828480 | elapsed time per iteration (s): 0.43 | learning rate: 8.425E-05 | global batch size: 256 | lm loss: 2.271495E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.326 | TFLOPs: 31.55 | 7: iteration 68720/ 115203 | consumed samples: 17592320 | consumed tokens: 36029071360 | elapsed time per iteration (s): 0.44 | learning rate: 8.423E-05 | global batch size: 256 | lm loss: 2.254315E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.094 | TFLOPs: 30.59 | 7: iteration 68730/ 115203 | consumed samples: 17594880 | consumed tokens: 
36034314240 | elapsed time per iteration (s): 0.42 | learning rate: 8.421E-05 | global batch size: 256 | lm loss: 2.253508E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.711 | TFLOPs: 31.78 | 7: iteration 68740/ 115203 | consumed samples: 17597440 | consumed tokens: 36039557120 | elapsed time per iteration (s): 0.43 | learning rate: 8.418E-05 | global batch size: 256 | lm loss: 2.293168E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.889 | TFLOPs: 31.37 | 7: iteration 68750/ 115203 | consumed samples: 17600000 | consumed tokens: 36044800000 | elapsed time per iteration (s): 0.43 | learning rate: 8.416E-05 | global batch size: 256 | lm loss: 2.293168E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.889 | TFLOPs: 31.42 | 7: iteration 68760/ 115203 | consumed samples: 17602560 | consumed tokens: 36050042880 | elapsed time per iteration (s): 0.43 | learning rate: 8.414E-05 | global batch size: 256 | lm loss: 2.247863E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.953 | TFLOPs: 31.37 | 7: iteration 68770/ 115203 | consumed samples: 17605120 | consumed tokens: 36055285760 | elapsed time per iteration (s): 0.42 | learning rate: 8.411E-05 | global batch size: 256 | lm loss: 2.228477E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.639 | TFLOPs: 31.78 | 7: iteration 68780/ 115203 | consumed samples: 17607680 | consumed tokens: 36060528640 | elapsed time per iteration (s): 0.42 | learning rate: 8.409E-05 | global batch size: 256 | lm loss: 2.283657E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
603.541 | TFLOPs: 31.67 | 7: iteration 68790/ 115203 | consumed samples: 17610240 | consumed tokens: 36065771520 | elapsed time per iteration (s): 0.43 | learning rate: 8.406E-05 | global batch size: 256 | lm loss: 2.277237E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.532 | TFLOPs: 31.51 | 7: iteration 68800/ 115203 | consumed samples: 17612800 | consumed tokens: 36071014400 | elapsed time per iteration (s): 0.43 | learning rate: 8.404E-05 | global batch size: 256 | lm loss: 2.246583E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.515 | TFLOPs: 31.14 | 7: iteration 68810/ 115203 | consumed samples: 17615360 | consumed tokens: 36076257280 | elapsed time per iteration (s): 0.42 | learning rate: 8.402E-05 | global batch size: 256 | lm loss: 2.276307E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.372 | TFLOPs: 31.71 | 7: iteration 68820/ 115203 | consumed samples: 17617920 | consumed tokens: 36081500160 | elapsed time per iteration (s): 0.43 | learning rate: 8.399E-05 | global batch size: 256 | lm loss: 2.250731E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.885 | TFLOPs: 31.48 | 7: iteration 68830/ 115203 | consumed samples: 17620480 | consumed tokens: 36086743040 | elapsed time per iteration (s): 0.44 | learning rate: 8.397E-05 | global batch size: 256 | lm loss: 2.280751E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.089 | TFLOPs: 30.44 | 7: iteration 68840/ 115203 | consumed samples: 17623040 | consumed tokens: 36091985920 | elapsed time per iteration (s): 0.42 | learning rate: 8.395E-05 | global batch size: 256 | lm loss: 2.239411E+00 | grad norm: 0.224 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.068 | TFLOPs: 31.75 | 7: iteration 68850/ 115203 | consumed samples: 17625600 | consumed tokens: 36097228800 | elapsed time per iteration (s): 0.43 | learning rate: 8.392E-05 | global batch size: 256 | lm loss: 2.290894E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.176 | TFLOPs: 31.18 | 7: iteration 68860/ 115203 | consumed samples: 17628160 | consumed tokens: 36102471680 | elapsed time per iteration (s): 0.43 | learning rate: 8.390E-05 | global batch size: 256 | lm loss: 2.277306E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.533 | TFLOPs: 30.93 | 7: iteration 68870/ 115203 | consumed samples: 17630720 | consumed tokens: 36107714560 | elapsed time per iteration (s): 0.43 | learning rate: 8.388E-05 | global batch size: 256 | lm loss: 2.256908E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.120 | TFLOPs: 31.28 | 7: iteration 68880/ 115203 | consumed samples: 17633280 | consumed tokens: 36112957440 | elapsed time per iteration (s): 0.44 | learning rate: 8.385E-05 | global batch size: 256 | lm loss: 2.292867E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.914 | TFLOPs: 30.32 | 7: iteration 68890/ 115203 | consumed samples: 17635840 | consumed tokens: 36118200320 | elapsed time per iteration (s): 0.43 | learning rate: 8.383E-05 | global batch size: 256 | lm loss: 2.248603E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.971 | TFLOPs: 31.16 | 7: iteration 68900/ 115203 | consumed samples: 17638400 | consumed tokens: 36123443200 | elapsed time per iteration (s): 
0.42 | learning rate: 8.380E-05 | global batch size: 256 | lm loss: 2.263711E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.372 | TFLOPs: 31.66 | 7: iteration 68910/ 115203 | consumed samples: 17640960 | consumed tokens: 36128686080 | elapsed time per iteration (s): 0.43 | learning rate: 8.378E-05 | global batch size: 256 | lm loss: 2.263160E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.959 | TFLOPs: 31.53 | 7: iteration 68920/ 115203 | consumed samples: 17643520 | consumed tokens: 36133928960 | elapsed time per iteration (s): 0.43 | learning rate: 8.376E-05 | global batch size: 256 | lm loss: 2.258353E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.943 | TFLOPs: 31.48 | 7: iteration 68930/ 115203 | consumed samples: 17646080 | consumed tokens: 36139171840 | elapsed time per iteration (s): 0.43 | learning rate: 8.373E-05 | global batch size: 256 | lm loss: 2.290189E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.352 | TFLOPs: 31.55 | 7: iteration 68940/ 115203 | consumed samples: 17648640 | consumed tokens: 36144414720 | elapsed time per iteration (s): 0.43 | learning rate: 8.371E-05 | global batch size: 256 | lm loss: 2.252700E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.652 | TFLOPs: 31.20 | 7: iteration 68950/ 115203 | consumed samples: 17651200 | consumed tokens: 36149657600 | elapsed time per iteration (s): 0.43 | learning rate: 8.369E-05 | global batch size: 256 | lm loss: 2.262992E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.284 | TFLOPs: 31.18 | 7: iteration 68960/ 
115203 | consumed samples: 17653760 | consumed tokens: 36154900480 | elapsed time per iteration (s): 0.42 | learning rate: 8.366E-05 | global batch size: 256 | lm loss: 2.278867E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.525 | TFLOPs: 31.88 | 7: iteration 68970/ 115203 | consumed samples: 17656320 | consumed tokens: 36160143360 | elapsed time per iteration (s): 0.42 | learning rate: 8.364E-05 | global batch size: 256 | lm loss: 2.254785E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.655 | TFLOPs: 31.67 | 7: iteration 68980/ 115203 | consumed samples: 17658880 | consumed tokens: 36165386240 | elapsed time per iteration (s): 0.42 | learning rate: 8.361E-05 | global batch size: 256 | lm loss: 2.268219E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.242 | TFLOPs: 31.76 | 7: iteration 68990/ 115203 | consumed samples: 17661440 | consumed tokens: 36170629120 | elapsed time per iteration (s): 0.42 | learning rate: 8.359E-05 | global batch size: 256 | lm loss: 2.260874E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.516 | TFLOPs: 31.82 | 7: iteration 69000/ 115203 | consumed samples: 17664000 | consumed tokens: 36175872000 | elapsed time per iteration (s): 0.42 | learning rate: 8.357E-05 | global batch size: 256 | lm loss: 2.237003E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.152 | TFLOPs: 31.70 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 69000 | lm loss value: 2.190241E+00 | lm loss PPL: 8.937366E+00 | 7: 
------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 69000 to checkpoints_221m 0: [2022-11-28 21:15:32,393] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step69000 is begin to save! 0: [2022-11-28 21:15:32,417] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_01-model_00-model_states.pt... 0: [2022-11-28 21:15:32,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_01-model_00-model_states.pt. 0: [2022-11-28 21:15:32,518] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_03-model_00-model_states.pt... 0: [2022-11-28 21:15:32,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_03-model_00-model_states.pt. 0: [2022-11-28 21:15:32,540] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_04-model_00-model_states.pt... 0: [2022-11-28 21:15:32,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_04-model_00-model_states.pt. 0: [2022-11-28 21:15:32,563] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_05-model_00-model_states.pt... 0: [2022-11-28 21:15:32,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_05-model_00-model_states.pt. 0: [2022-11-28 21:15:32,587] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_06-model_00-model_states.pt... 0: [2022-11-28 21:15:32,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_06-model_00-model_states.pt. 
0: [2022-11-28 21:15:32,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_07-model_00-model_states.pt... 0: [2022-11-28 21:15:32,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_07-model_00-model_states.pt. 0: [2022-11-28 21:15:32,633] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_08-model_00-model_states.pt... 0: [2022-11-28 21:15:32,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_08-model_00-model_states.pt. 0: [2022-11-28 21:15:32,657] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_09-model_00-model_states.pt... 0: [2022-11-28 21:15:32,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_09-model_00-model_states.pt. 0: [2022-11-28 21:15:32,681] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_10-model_00-model_states.pt... 0: [2022-11-28 21:15:32,703] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_10-model_00-model_states.pt. 0: [2022-11-28 21:15:32,703] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_11-model_00-model_states.pt... 0: [2022-11-28 21:15:32,725] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_11-model_00-model_states.pt. 0: [2022-11-28 21:15:32,725] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_12-model_00-model_states.pt... 0: [2022-11-28 21:15:32,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_12-model_00-model_states.pt. 
0: [2022-11-28 21:15:32,750] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_13-model_00-model_states.pt... 0: [2022-11-28 21:15:32,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_13-model_00-model_states.pt. 0: [2022-11-28 21:15:32,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_14-model_00-model_states.pt... 0: [2022-11-28 21:15:32,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_14-model_00-model_states.pt. 0: [2022-11-28 21:15:32,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_15-model_00-model_states.pt... 0: [2022-11-28 21:15:32,820] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_15-model_00-model_states.pt. 0: [2022-11-28 21:15:32,821] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_16-model_00-model_states.pt... 0: [2022-11-28 21:15:32,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_16-model_00-model_states.pt. 0: [2022-11-28 21:15:32,844] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_17-model_00-model_states.pt... 0: [2022-11-28 21:15:32,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_17-model_00-model_states.pt. 0: [2022-11-28 21:15:32,866] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_18-model_00-model_states.pt... 0: [2022-11-28 21:15:32,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_18-model_00-model_states.pt. 
0: [2022-11-28 21:15:32,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_19-model_00-model_states.pt... 0: [2022-11-28 21:15:32,914] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_19-model_00-model_states.pt. 0: [2022-11-28 21:15:32,915] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_20-model_00-model_states.pt... 0: [2022-11-28 21:15:32,938] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_20-model_00-model_states.pt. 0: [2022-11-28 21:15:32,938] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/layer_22-model_00-model_states.pt... 0: [2022-11-28 21:15:32,942] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/layer_22-model_00-model_states.pt. 0: [2022-11-28 21:15:32,943] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step69000/mp_rank_00_model_states.pt 0: [2022-11-28 21:15:32,943] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/mp_rank_00_model_states.pt... 0: [2022-11-28 21:15:32,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/mp_rank_00_model_states.pt. 0: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
3: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
5: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:15:32,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step69000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:15:33,011] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:15:33,011] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 21:15:33,011] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2022-11-28 21:15:33,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:15:33,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 21:15:33,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
7: [2022-11-28 21:15:33,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:15:33,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 21:15:33,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 7: [2022-11-28 21:15:33,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:15:33,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:15:33,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:15:33,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 21:15:33,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 21:15:33,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 7: [2022-11-28 21:15:33,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 7: [2022-11-28 21:15:33,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 21:15:33,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
7: [2022-11-28 21:15:33,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:15:33,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 21:15:33,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 2: [2022-11-28 21:15:33,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:15:33,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:15:33,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 21:15:33,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 21:15:33,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 2: [2022-11-28 21:15:33,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 2: [2022-11-28 21:15:33,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:15:33,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 21:15:33,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
2: [2022-11-28 21:15:33,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:15:33,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 21:15:33,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 6: [2022-11-28 21:15:33,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:15:33,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 21:15:33,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 2: [2022-11-28 21:15:33,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:15:33,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 21:15:33,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 6: [2022-11-28 21:15:33,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:15:33,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 21:15:33,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
2: [2022-11-28 21:15:33,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:15:33,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 21:15:33,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 7: [2022-11-28 21:15:33,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:15:33,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 21:15:33,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2022-11-28 21:15:33,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:15:33,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 21:15:33,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 6: [2022-11-28 21:15:33,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:15:33,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 21:15:33,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
1: [2022-11-28 21:15:33,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:15:33,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:15:33,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 21:15:33,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 21:15:33,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2022-11-28 21:15:33,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2022-11-28 21:15:33,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:15:33,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:15:33,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 21:15:33,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 21:15:33,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2022-11-28 21:15:33,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
6: [2022-11-28 21:15:33,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:15:33,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 21:15:33,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 6: [2022-11-28 21:15:33,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:15:33,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 21:15:33,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2022-11-28 21:15:33,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:15:33,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 21:15:33,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2022-11-28 21:15:33,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:15:33,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 21:15:33,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
0: [2022-11-28 21:15:33,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:15:33,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 21:15:33,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 1: [2022-11-28 21:15:33,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:15:33,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 21:15:33,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2022-11-28 21:15:33,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:15:33,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:15:33,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 21:15:33,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2022-11-28 21:15:33,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 21:15:33,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2022-11-28 21:15:33,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2022-11-28 21:15:33,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 21:15:33,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2022-11-28 21:15:33,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:15:33,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 21:15:33,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2022-11-28 21:15:33,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:15:33,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 21:15:33,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2022-11-28 21:15:33,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:15:33,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 21:15:33,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
5: [2022-11-28 21:15:33,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:15:33,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 21:15:33,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 5: [2022-11-28 21:15:33,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:15:33,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 21:15:33,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 6: [2022-11-28 21:15:33,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:15:33,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 21:15:33,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 6: [2022-11-28 21:15:33,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:15:33,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 21:15:33,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
6: [2022-11-28 21:15:33,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:15:33,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 21:15:33,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 2: [2022-11-28 21:15:33,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:15:33,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 21:15:33,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 2: [2022-11-28 21:15:33,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:15:33,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 21:15:33,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 7: [2022-11-28 21:15:33,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:15:33,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 21:15:33,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
7: [2022-11-28 21:15:33,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:15:33,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 21:15:33,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2022-11-28 21:15:33,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:15:33,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:15:33,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 21:15:33,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2022-11-28 21:15:33,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:15:33,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 21:15:33,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2022-11-28 21:15:33,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2022-11-28 21:15:33,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 21:15:33,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2022-11-28 21:15:33,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:15:33,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 21:15:33,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2022-11-28 21:15:33,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:15:33,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:15:33,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:15:33,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:15:33,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 21:15:33,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 21:15:33,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 
3: [2022-11-28 21:15:33,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 21:15:33,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 21:15:33,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2022-11-28 21:15:33,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2022-11-28 21:15:33,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: [2022-11-28 21:15:33,075] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 21:15:33,075] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2022-11-28 21:15:33,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:15:33,079] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 21:15:33,079] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2022-11-28 21:15:33,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:15:33,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2022-11-28 21:15:33,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:15:33,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 21:15:33,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 21:15:33,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 21:15:33,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2022-11-28 21:15:33,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 3: [2022-11-28 21:15:33,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:15:33,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 21:15:33,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 21:15:33,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 21:15:33,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 21:15:33,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 21:15:33,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 21:15:33,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 21:15:33,282] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step69000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 
21:15:33,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 4: [2022-11-28 21:15:33,282] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step69000 is ready now! 0: successfully saved checkpoint at iteration 69000 to checkpoints_221m 7: time (ms) | save-checkpoint: 914.88 7: iteration 69010/ 115203 | consumed samples: 17666560 | consumed tokens: 36181114880 | elapsed time per iteration (s): 0.54 | learning rate: 8.354E-05 | global batch size: 256 | lm loss: 2.280401E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 477.646 | TFLOPs: 25.06 | 7: iteration 69020/ 115203 | consumed samples: 17669120 | consumed tokens: 36186357760 | elapsed time per iteration (s): 0.46 | learning rate: 8.352E-05 | global batch size: 256 | lm loss: 2.269489E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 552.556 | TFLOPs: 28.99 | 7: iteration 69030/ 115203 | consumed samples: 17671680 | consumed tokens: 36191600640 | elapsed time per iteration (s): 0.43 | learning rate: 8.350E-05 | global batch size: 256 | lm loss: 2.270636E+00 | 
grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.798 | TFLOPs: 31.47 | 7: iteration 69040/ 115203 | consumed samples: 17674240 | consumed tokens: 36196843520 | elapsed time per iteration (s): 0.57 | learning rate: 8.347E-05 | global batch size: 256 | lm loss: 2.276872E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 449.369 | TFLOPs: 23.58 | 7: iteration 69050/ 115203 | consumed samples: 17676800 | consumed tokens: 36202086400 | elapsed time per iteration (s): 0.43 | learning rate: 8.345E-05 | global batch size: 256 | lm loss: 2.240645E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.232 | TFLOPs: 31.39 | 7: iteration 69060/ 115203 | consumed samples: 17679360 | consumed tokens: 36207329280 | elapsed time per iteration (s): 0.43 | learning rate: 8.342E-05 | global batch size: 256 | lm loss: 2.255173E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.784 | TFLOPs: 31.26 | 7: iteration 69070/ 115203 | consumed samples: 17681920 | consumed tokens: 36212572160 | elapsed time per iteration (s): 0.43 | learning rate: 8.340E-05 | global batch size: 256 | lm loss: 2.285814E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.273 | TFLOPs: 31.18 | 7: iteration 69080/ 115203 | consumed samples: 17684480 | consumed tokens: 36217815040 | elapsed time per iteration (s): 0.43 | learning rate: 8.338E-05 | global batch size: 256 | lm loss: 2.242066E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.493 | TFLOPs: 31.19 | 7: iteration 69090/ 115203 | consumed samples: 17687040 | consumed tokens: 36223057920 | elapsed time 
per iteration (s): 0.42 | learning rate: 8.335E-05 | global batch size: 256 | lm loss: 2.254295E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.824 | TFLOPs: 31.89 | 7: iteration 69100/ 115203 | consumed samples: 17689600 | consumed tokens: 36228300800 | elapsed time per iteration (s): 0.44 | learning rate: 8.333E-05 | global batch size: 256 | lm loss: 2.250155E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.590 | TFLOPs: 30.25 | 7: iteration 69110/ 115203 | consumed samples: 17692160 | consumed tokens: 36233543680 | elapsed time per iteration (s): 0.44 | learning rate: 8.331E-05 | global batch size: 256 | lm loss: 2.257960E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.133 | TFLOPs: 30.75 | 7: iteration 69120/ 115203 | consumed samples: 17694720 | consumed tokens: 36238786560 | elapsed time per iteration (s): 0.42 | learning rate: 8.328E-05 | global batch size: 256 | lm loss: 2.301430E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.346 | TFLOPs: 32.08 | 7: iteration 69130/ 115203 | consumed samples: 17697280 | consumed tokens: 36244029440 | elapsed time per iteration (s): 0.43 | learning rate: 8.326E-05 | global batch size: 256 | lm loss: 2.251054E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.805 | TFLOPs: 31.21 | 7: iteration 69140/ 115203 | consumed samples: 17699840 | consumed tokens: 36249272320 | elapsed time per iteration (s): 0.42 | learning rate: 8.324E-05 | global batch size: 256 | lm loss: 2.255355E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.942 | TFLOPs: 32.00 | 
7: iteration 69150/ 115203 | consumed samples: 17702400 | consumed tokens: 36254515200 | elapsed time per iteration (s): 0.43 | learning rate: 8.321E-05 | global batch size: 256 | lm loss: 2.283103E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.251 | TFLOPs: 31.49 | 7: iteration 69160/ 115203 | consumed samples: 17704960 | consumed tokens: 36259758080 | elapsed time per iteration (s): 0.43 | learning rate: 8.319E-05 | global batch size: 256 | lm loss: 2.251169E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.988 | TFLOPs: 31.59 | 7: iteration 69170/ 115203 | consumed samples: 17707520 | consumed tokens: 36265000960 | elapsed time per iteration (s): 0.43 | learning rate: 8.316E-05 | global batch size: 256 | lm loss: 2.276366E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.285 | TFLOPs: 31.34 | 7: iteration 69180/ 115203 | consumed samples: 17710080 | consumed tokens: 36270243840 | elapsed time per iteration (s): 0.43 | learning rate: 8.314E-05 | global batch size: 256 | lm loss: 2.270738E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.730 | TFLOPs: 31.41 | 7: iteration 69190/ 115203 | consumed samples: 17712640 | consumed tokens: 36275486720 | elapsed time per iteration (s): 0.43 | learning rate: 8.312E-05 | global batch size: 256 | lm loss: 2.271679E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.986 | TFLOPs: 31.22 | 7: iteration 69200/ 115203 | consumed samples: 17715200 | consumed tokens: 36280729600 | elapsed time per iteration (s): 0.42 | learning rate: 8.309E-05 | global batch size: 256 | lm loss: 2.257352E+00 | grad norm: 0.231 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.336 | TFLOPs: 31.76 | 7: iteration 69210/ 115203 | consumed samples: 17717760 | consumed tokens: 36285972480 | elapsed time per iteration (s): 0.43 | learning rate: 8.307E-05 | global batch size: 256 | lm loss: 2.289798E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.643 | TFLOPs: 31.04 | 7: iteration 69220/ 115203 | consumed samples: 17720320 | consumed tokens: 36291215360 | elapsed time per iteration (s): 0.43 | learning rate: 8.305E-05 | global batch size: 256 | lm loss: 2.251367E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.195 | TFLOPs: 31.33 | 7: iteration 69230/ 115203 | consumed samples: 17722880 | consumed tokens: 36296458240 | elapsed time per iteration (s): 0.43 | learning rate: 8.302E-05 | global batch size: 256 | lm loss: 2.272066E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.224 | TFLOPs: 31.49 | 7: iteration 69240/ 115203 | consumed samples: 17725440 | consumed tokens: 36301701120 | elapsed time per iteration (s): 0.43 | learning rate: 8.300E-05 | global batch size: 256 | lm loss: 2.225661E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.179 | TFLOPs: 31.18 | 7: iteration 69250/ 115203 | consumed samples: 17728000 | consumed tokens: 36306944000 | elapsed time per iteration (s): 0.43 | learning rate: 8.298E-05 | global batch size: 256 | lm loss: 2.257551E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.047 | TFLOPs: 30.96 | 7: iteration 69260/ 115203 | consumed samples: 17730560 | consumed tokens: 36312186880 | elapsed time per iteration (s): 0.44 | learning rate: 
8.295E-05 | global batch size: 256 | lm loss: 2.281721E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.722 | TFLOPs: 30.78 | 7: iteration 69270/ 115203 | consumed samples: 17733120 | consumed tokens: 36317429760 | elapsed time per iteration (s): 0.43 | learning rate: 8.293E-05 | global batch size: 256 | lm loss: 2.250513E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.084 | TFLOPs: 31.01 | 7: iteration 69280/ 115203 | consumed samples: 17735680 | consumed tokens: 36322672640 | elapsed time per iteration (s): 0.42 | learning rate: 8.290E-05 | global batch size: 256 | lm loss: 2.221332E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.481 | TFLOPs: 31.98 | 7: iteration 69290/ 115203 | consumed samples: 17738240 | consumed tokens: 36327915520 | elapsed time per iteration (s): 0.44 | learning rate: 8.288E-05 | global batch size: 256 | lm loss: 2.287716E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.555 | TFLOPs: 30.62 | 7: iteration 69300/ 115203 | consumed samples: 17740800 | consumed tokens: 36333158400 | elapsed time per iteration (s): 0.43 | learning rate: 8.286E-05 | global batch size: 256 | lm loss: 2.266560E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.011 | TFLOPs: 31.01 | 7: iteration 69310/ 115203 | consumed samples: 17743360 | consumed tokens: 36338401280 | elapsed time per iteration (s): 0.43 | learning rate: 8.283E-05 | global batch size: 256 | lm loss: 2.233381E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.044 | TFLOPs: 31.59 | 7: iteration 69320/ 115203 | consumed 
samples: 17745920 | consumed tokens: 36343644160 | elapsed time per iteration (s): 0.44 | learning rate: 8.281E-05 | global batch size: 256 | lm loss: 2.261459E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.113 | TFLOPs: 30.49 | 7: iteration 69330/ 115203 | consumed samples: 17748480 | consumed tokens: 36348887040 | elapsed time per iteration (s): 0.43 | learning rate: 8.279E-05 | global batch size: 256 | lm loss: 2.276163E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.453 | TFLOPs: 31.50 | 7: iteration 69340/ 115203 | consumed samples: 17751040 | consumed tokens: 36354129920 | elapsed time per iteration (s): 0.44 | learning rate: 8.276E-05 | global batch size: 256 | lm loss: 2.273299E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.821 | TFLOPs: 30.68 | 7: iteration 69350/ 115203 | consumed samples: 17753600 | consumed tokens: 36359372800 | elapsed time per iteration (s): 0.45 | learning rate: 8.274E-05 | global batch size: 256 | lm loss: 2.267698E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.845 | TFLOPs: 29.79 | 7: iteration 69360/ 115203 | consumed samples: 17756160 | consumed tokens: 36364615680 | elapsed time per iteration (s): 0.43 | learning rate: 8.272E-05 | global batch size: 256 | lm loss: 2.269987E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.715 | TFLOPs: 31.57 | 7: iteration 69370/ 115203 | consumed samples: 17758720 | consumed tokens: 36369858560 | elapsed time per iteration (s): 0.43 | learning rate: 8.269E-05 | global batch size: 256 | lm loss: 2.256081E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 601.546 | TFLOPs: 31.56 | 7: iteration 69380/ 115203 | consumed samples: 17761280 | consumed tokens: 36375101440 | elapsed time per iteration (s): 0.43 | learning rate: 8.267E-05 | global batch size: 256 | lm loss: 2.230313E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.970 | TFLOPs: 31.43 | 7: iteration 69390/ 115203 | consumed samples: 17763840 | consumed tokens: 36380344320 | elapsed time per iteration (s): 0.43 | learning rate: 8.264E-05 | global batch size: 256 | lm loss: 2.262557E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.922 | TFLOPs: 31.06 | 7: iteration 69400/ 115203 | consumed samples: 17766400 | consumed tokens: 36385587200 | elapsed time per iteration (s): 0.44 | learning rate: 8.262E-05 | global batch size: 256 | lm loss: 2.280717E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.242 | TFLOPs: 30.71 | 7: iteration 69410/ 115203 | consumed samples: 17768960 | consumed tokens: 36390830080 | elapsed time per iteration (s): 0.44 | learning rate: 8.260E-05 | global batch size: 256 | lm loss: 2.252802E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.721 | TFLOPs: 30.47 | 7: iteration 69420/ 115203 | consumed samples: 17771520 | consumed tokens: 36396072960 | elapsed time per iteration (s): 0.43 | learning rate: 8.257E-05 | global batch size: 256 | lm loss: 2.250057E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.833 | TFLOPs: 31.37 | 7: iteration 69430/ 115203 | consumed samples: 17774080 | consumed tokens: 36401315840 | elapsed time per iteration (s): 0.44 | learning rate: 8.255E-05 | global batch size: 256 | lm 
loss: 2.277935E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.308 | TFLOPs: 30.66 | 7: iteration 69440/ 115203 | consumed samples: 17776640 | consumed tokens: 36406558720 | elapsed time per iteration (s): 0.44 | learning rate: 8.253E-05 | global batch size: 256 | lm loss: 2.255732E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.955 | TFLOPs: 30.69 | 7: iteration 69450/ 115203 | consumed samples: 17779200 | consumed tokens: 36411801600 | elapsed time per iteration (s): 0.43 | learning rate: 8.250E-05 | global batch size: 256 | lm loss: 2.251193E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.142 | TFLOPs: 31.07 | 7: iteration 69460/ 115203 | consumed samples: 17781760 | consumed tokens: 36417044480 | elapsed time per iteration (s): 0.43 | learning rate: 8.248E-05 | global batch size: 256 | lm loss: 2.294843E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.125 | TFLOPs: 31.02 | 7: iteration 69470/ 115203 | consumed samples: 17784320 | consumed tokens: 36422287360 | elapsed time per iteration (s): 0.45 | learning rate: 8.246E-05 | global batch size: 256 | lm loss: 2.248588E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.575 | TFLOPs: 29.94 | 7: iteration 69480/ 115203 | consumed samples: 17786880 | consumed tokens: 36427530240 | elapsed time per iteration (s): 0.43 | learning rate: 8.243E-05 | global batch size: 256 | lm loss: 2.278216E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.937 | TFLOPs: 31.43 | 7: iteration 69490/ 115203 | consumed samples: 17789440 | consumed tokens: 
36432773120 | elapsed time per iteration (s): 0.43 | learning rate: 8.241E-05 | global batch size: 256 | lm loss: 2.261306E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.115 | TFLOPs: 31.01 | 7: iteration 69500/ 115203 | consumed samples: 17792000 | consumed tokens: 36438016000 | elapsed time per iteration (s): 0.43 | learning rate: 8.238E-05 | global batch size: 256 | lm loss: 2.273214E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.203 | TFLOPs: 31.60 | 7: iteration 69510/ 115203 | consumed samples: 17794560 | consumed tokens: 36443258880 | elapsed time per iteration (s): 0.43 | learning rate: 8.236E-05 | global batch size: 256 | lm loss: 2.255768E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.175 | TFLOPs: 31.49 | 7: iteration 69520/ 115203 | consumed samples: 17797120 | consumed tokens: 36448501760 | elapsed time per iteration (s): 0.44 | learning rate: 8.234E-05 | global batch size: 256 | lm loss: 2.297063E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.318 | TFLOPs: 30.29 | 7: iteration 69530/ 115203 | consumed samples: 17799680 | consumed tokens: 36453744640 | elapsed time per iteration (s): 0.42 | learning rate: 8.231E-05 | global batch size: 256 | lm loss: 2.260038E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.961 | TFLOPs: 32.00 | 7: iteration 69540/ 115203 | consumed samples: 17802240 | consumed tokens: 36458987520 | elapsed time per iteration (s): 0.43 | learning rate: 8.229E-05 | global batch size: 256 | lm loss: 2.243533E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
599.008 | TFLOPs: 31.43 | 7: iteration 69550/ 115203 | consumed samples: 17804800 | consumed tokens: 36464230400 | elapsed time per iteration (s): 0.43 | learning rate: 8.227E-05 | global batch size: 256 | lm loss: 2.258457E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.308 | TFLOPs: 31.08 | 7: iteration 69560/ 115203 | consumed samples: 17807360 | consumed tokens: 36469473280 | elapsed time per iteration (s): 0.43 | learning rate: 8.224E-05 | global batch size: 256 | lm loss: 2.254548E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.363 | TFLOPs: 31.03 | 7: iteration 69570/ 115203 | consumed samples: 17809920 | consumed tokens: 36474716160 | elapsed time per iteration (s): 0.42 | learning rate: 8.222E-05 | global batch size: 256 | lm loss: 2.280864E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.172 | TFLOPs: 31.86 | 7: iteration 69580/ 115203 | consumed samples: 17812480 | consumed tokens: 36479959040 | elapsed time per iteration (s): 0.43 | learning rate: 8.220E-05 | global batch size: 256 | lm loss: 2.268208E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.812 | TFLOPs: 31.58 | 7: iteration 69590/ 115203 | consumed samples: 17815040 | consumed tokens: 36485201920 | elapsed time per iteration (s): 0.42 | learning rate: 8.217E-05 | global batch size: 256 | lm loss: 2.275830E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.569 | TFLOPs: 31.62 | 7: iteration 69600/ 115203 | consumed samples: 17817600 | consumed tokens: 36490444800 | elapsed time per iteration (s): 0.43 | learning rate: 8.215E-05 | global batch size: 256 | lm loss: 2.308534E+00 | grad norm: 0.220 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.942 | TFLOPs: 31.48 | 7: iteration 69610/ 115203 | consumed samples: 17820160 | consumed tokens: 36495687680 | elapsed time per iteration (s): 0.43 | learning rate: 8.213E-05 | global batch size: 256 | lm loss: 2.256526E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.163 | TFLOPs: 31.44 | 7: iteration 69620/ 115203 | consumed samples: 17822720 | consumed tokens: 36500930560 | elapsed time per iteration (s): 0.43 | learning rate: 8.210E-05 | global batch size: 256 | lm loss: 2.260657E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.811 | TFLOPs: 31.37 | 7: iteration 69630/ 115203 | consumed samples: 17825280 | consumed tokens: 36506173440 | elapsed time per iteration (s): 0.43 | learning rate: 8.208E-05 | global batch size: 256 | lm loss: 2.233261E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.694 | TFLOPs: 31.05 | 7: iteration 69640/ 115203 | consumed samples: 17827840 | consumed tokens: 36511416320 | elapsed time per iteration (s): 0.43 | learning rate: 8.205E-05 | global batch size: 256 | lm loss: 2.287797E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.256 | TFLOPs: 31.55 | 7: iteration 69650/ 115203 | consumed samples: 17830400 | consumed tokens: 36516659200 | elapsed time per iteration (s): 0.43 | learning rate: 8.203E-05 | global batch size: 256 | lm loss: 2.243642E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.107 | TFLOPs: 31.33 | 7: iteration 69660/ 115203 | consumed samples: 17832960 | consumed tokens: 36521902080 | elapsed time per iteration (s): 
0.43 | learning rate: 8.201E-05 | global batch size: 256 | lm loss: 2.266052E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.158 | TFLOPs: 30.91 | 7: iteration 69670/ 115203 | consumed samples: 17835520 | consumed tokens: 36527144960 | elapsed time per iteration (s): 0.43 | learning rate: 8.198E-05 | global batch size: 256 | lm loss: 2.258379E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.521 | TFLOPs: 31.40 | 7: iteration 69680/ 115203 | consumed samples: 17838080 | consumed tokens: 36532387840 | elapsed time per iteration (s): 0.43 | learning rate: 8.196E-05 | global batch size: 256 | lm loss: 2.263604E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.232 | TFLOPs: 31.07 | 7: iteration 69690/ 115203 | consumed samples: 17840640 | consumed tokens: 36537630720 | elapsed time per iteration (s): 0.43 | learning rate: 8.194E-05 | global batch size: 256 | lm loss: 2.225049E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.200 | TFLOPs: 31.44 | 7: iteration 69700/ 115203 | consumed samples: 17843200 | consumed tokens: 36542873600 | elapsed time per iteration (s): 0.43 | learning rate: 8.191E-05 | global batch size: 256 | lm loss: 2.251781E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.579 | TFLOPs: 31.41 | 7: iteration 69710/ 115203 | consumed samples: 17845760 | consumed tokens: 36548116480 | elapsed time per iteration (s): 0.43 | learning rate: 8.189E-05 | global batch size: 256 | lm loss: 2.310726E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.519 | TFLOPs: 31.25 | 7: iteration 69720/ 
115203 | consumed samples: 17848320 | consumed tokens: 36553359360 | elapsed time per iteration (s): 0.43 | learning rate: 8.187E-05 | global batch size: 256 | lm loss: 2.268405E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.592 | TFLOPs: 31.41 | 7: iteration 69730/ 115203 | consumed samples: 17850880 | consumed tokens: 36558602240 | elapsed time per iteration (s): 0.43 | learning rate: 8.184E-05 | global batch size: 256 | lm loss: 2.251949E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.455 | TFLOPs: 31.09 | 7: iteration 69740/ 115203 | consumed samples: 17853440 | consumed tokens: 36563845120 | elapsed time per iteration (s): 0.42 | learning rate: 8.182E-05 | global batch size: 256 | lm loss: 2.252365E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.302 | TFLOPs: 31.71 | 7: iteration 69750/ 115203 | consumed samples: 17856000 | consumed tokens: 36569088000 | elapsed time per iteration (s): 0.43 | learning rate: 8.180E-05 | global batch size: 256 | lm loss: 2.262319E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.337 | TFLOPs: 31.24 | 7: iteration 69760/ 115203 | consumed samples: 17858560 | consumed tokens: 36574330880 | elapsed time per iteration (s): 0.44 | learning rate: 8.177E-05 | global batch size: 256 | lm loss: 2.229692E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.887 | TFLOPs: 30.74 | 7: iteration 69770/ 115203 | consumed samples: 17861120 | consumed tokens: 36579573760 | elapsed time per iteration (s): 0.42 | learning rate: 8.175E-05 | global batch size: 256 | lm loss: 2.239669E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 606.777 | TFLOPs: 31.84 | 7: iteration 69780/ 115203 | consumed samples: 17863680 | consumed tokens: 36584816640 | elapsed time per iteration (s): 0.44 | learning rate: 8.172E-05 | global batch size: 256 | lm loss: 2.292783E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.065 | TFLOPs: 30.23 | 7: iteration 69790/ 115203 | consumed samples: 17866240 | consumed tokens: 36590059520 | elapsed time per iteration (s): 0.44 | learning rate: 8.170E-05 | global batch size: 256 | lm loss: 2.270964E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.349 | TFLOPs: 30.29 | 7: iteration 69800/ 115203 | consumed samples: 17868800 | consumed tokens: 36595302400 | elapsed time per iteration (s): 0.43 | learning rate: 8.168E-05 | global batch size: 256 | lm loss: 2.257090E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.822 | TFLOPs: 30.95 | 7: iteration 69810/ 115203 | consumed samples: 17871360 | consumed tokens: 36600545280 | elapsed time per iteration (s): 0.43 | learning rate: 8.165E-05 | global batch size: 256 | lm loss: 2.242984E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.166 | TFLOPs: 31.07 | 7: iteration 69820/ 115203 | consumed samples: 17873920 | consumed tokens: 36605788160 | elapsed time per iteration (s): 0.43 | learning rate: 8.163E-05 | global batch size: 256 | lm loss: 2.220054E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.687 | TFLOPs: 31.25 | 7: iteration 69830/ 115203 | consumed samples: 17876480 | consumed tokens: 36611031040 | elapsed time per iteration (s): 0.43 | learning rate: 8.161E-05 | global batch 
size: 256 | lm loss: 2.252897E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.043 | TFLOPs: 30.96 | 7: iteration 69840/ 115203 | consumed samples: 17879040 | consumed tokens: 36616273920 | elapsed time per iteration (s): 0.42 | learning rate: 8.158E-05 | global batch size: 256 | lm loss: 2.247722E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.771 | TFLOPs: 31.99 | 7: iteration 69850/ 115203 | consumed samples: 17881600 | consumed tokens: 36621516800 | elapsed time per iteration (s): 0.44 | learning rate: 8.156E-05 | global batch size: 256 | lm loss: 2.256666E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.606 | TFLOPs: 30.67 | 7: iteration 69860/ 115203 | consumed samples: 17884160 | consumed tokens: 36626759680 | elapsed time per iteration (s): 0.43 | learning rate: 8.154E-05 | global batch size: 256 | lm loss: 2.269840E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.314 | TFLOPs: 31.45 | 7: iteration 69870/ 115203 | consumed samples: 17886720 | consumed tokens: 36632002560 | elapsed time per iteration (s): 0.43 | learning rate: 8.151E-05 | global batch size: 256 | lm loss: 2.262980E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.737 | TFLOPs: 31.31 | 7: iteration 69880/ 115203 | consumed samples: 17889280 | consumed tokens: 36637245440 | elapsed time per iteration (s): 0.43 | learning rate: 8.149E-05 | global batch size: 256 | lm loss: 2.246298E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.205 | TFLOPs: 31.39 | 7: iteration 69890/ 115203 | consumed samples: 17891840 | consumed 
tokens: 36642488320 | elapsed time per iteration (s): 0.42 | learning rate: 8.147E-05 | global batch size: 256 | lm loss: 2.250167E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.875 | TFLOPs: 31.68 | 7: iteration 69900/ 115203 | consumed samples: 17894400 | consumed tokens: 36647731200 | elapsed time per iteration (s): 0.44 | learning rate: 8.144E-05 | global batch size: 256 | lm loss: 2.272661E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.891 | TFLOPs: 30.69 | 7: iteration 69910/ 115203 | consumed samples: 17896960 | consumed tokens: 36652974080 | elapsed time per iteration (s): 0.44 | learning rate: 8.142E-05 | global batch size: 256 | lm loss: 2.249640E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.837 | TFLOPs: 30.53 | 7: iteration 69920/ 115203 | consumed samples: 17899520 | consumed tokens: 36658216960 | elapsed time per iteration (s): 0.44 | learning rate: 8.140E-05 | global batch size: 256 | lm loss: 2.304393E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.501 | TFLOPs: 30.88 | 7: iteration 69930/ 115203 | consumed samples: 17902080 | consumed tokens: 36663459840 | elapsed time per iteration (s): 0.42 | learning rate: 8.137E-05 | global batch size: 256 | lm loss: 2.273400E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.884 | TFLOPs: 31.74 | 7: iteration 69940/ 115203 | consumed samples: 17904640 | consumed tokens: 36668702720 | elapsed time per iteration (s): 0.43 | learning rate: 8.135E-05 | global batch size: 256 | lm loss: 2.271737E+00 | grad norm: 0.209 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 589.020 | TFLOPs: 30.90 | 7: iteration 69950/ 115203 | consumed samples: 17907200 | consumed tokens: 36673945600 | elapsed time per iteration (s): 0.42 | learning rate: 8.132E-05 | global batch size: 256 | lm loss: 2.247617E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.854 | TFLOPs: 31.89 | 7: iteration 69960/ 115203 | consumed samples: 17909760 | consumed tokens: 36679188480 | elapsed time per iteration (s): 0.42 | learning rate: 8.130E-05 | global batch size: 256 | lm loss: 2.257051E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.981 | TFLOPs: 31.69 | 7: iteration 69970/ 115203 | consumed samples: 17912320 | consumed tokens: 36684431360 | elapsed time per iteration (s): 0.43 | learning rate: 8.128E-05 | global batch size: 256 | lm loss: 2.274773E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.029 | TFLOPs: 31.22 | 7: iteration 69980/ 115203 | consumed samples: 17914880 | consumed tokens: 36689674240 | elapsed time per iteration (s): 0.42 | learning rate: 8.125E-05 | global batch size: 256 | lm loss: 2.268594E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.703 | TFLOPs: 31.73 | 7: iteration 69990/ 115203 | consumed samples: 17917440 | consumed tokens: 36694917120 | elapsed time per iteration (s): 0.44 | learning rate: 8.123E-05 | global batch size: 256 | lm loss: 2.254684E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.366 | TFLOPs: 30.56 | 0: [2022-11-28 21:22:45,489] [INFO] [logging.py:68:log_dist] [Rank 0] step=70000, skipped=0, lr=[8.120745619091417e-05, 8.120745619091417e-05, 8.120745619091417e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 
7: iteration 70000/ 115203 | consumed samples: 17920000 | consumed tokens: 36700160000 | elapsed time per iteration (s): 0.43 | learning rate: 8.121E-05 | global batch size: 256 | lm loss: 2.272629E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.069 | TFLOPs: 31.48 | 0: steps: 70000 loss: 2.2736 iter time (s): 0.429 samples/sec: 596.413 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 70000 | lm loss value: 2.246294E+00 | lm loss PPL: 9.452644E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 70000 to checkpoints_221m 0: [2022-11-28 21:22:45,697] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step70000 is begin to save! 0: [2022-11-28 21:22:45,725] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_01-model_00-model_states.pt... 0: [2022-11-28 21:22:45,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_01-model_00-model_states.pt. 0: [2022-11-28 21:22:45,828] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_03-model_00-model_states.pt... 0: [2022-11-28 21:22:45,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_03-model_00-model_states.pt. 0: [2022-11-28 21:22:45,849] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_04-model_00-model_states.pt... 0: [2022-11-28 21:22:45,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_04-model_00-model_states.pt. 0: [2022-11-28 21:22:45,874] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_05-model_00-model_states.pt... 
0: [2022-11-28 21:22:45,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_05-model_00-model_states.pt. 0: [2022-11-28 21:22:45,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_06-model_00-model_states.pt... 0: [2022-11-28 21:22:45,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_06-model_00-model_states.pt. 0: [2022-11-28 21:22:45,920] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_07-model_00-model_states.pt... 0: [2022-11-28 21:22:45,943] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_07-model_00-model_states.pt. 0: [2022-11-28 21:22:45,943] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_08-model_00-model_states.pt... 0: [2022-11-28 21:22:45,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_08-model_00-model_states.pt. 0: [2022-11-28 21:22:45,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_09-model_00-model_states.pt... 0: [2022-11-28 21:22:45,992] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_09-model_00-model_states.pt. 0: [2022-11-28 21:22:45,992] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_10-model_00-model_states.pt... 0: [2022-11-28 21:22:46,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_10-model_00-model_states.pt. 0: [2022-11-28 21:22:46,015] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_11-model_00-model_states.pt... 
0: [2022-11-28 21:22:46,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_11-model_00-model_states.pt. 0: [2022-11-28 21:22:46,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_12-model_00-model_states.pt... 0: [2022-11-28 21:22:46,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_12-model_00-model_states.pt. 0: [2022-11-28 21:22:46,064] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_13-model_00-model_states.pt... 0: [2022-11-28 21:22:46,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_13-model_00-model_states.pt. 0: [2022-11-28 21:22:46,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_14-model_00-model_states.pt... 0: [2022-11-28 21:22:46,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_14-model_00-model_states.pt. 0: [2022-11-28 21:22:46,112] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_15-model_00-model_states.pt... 0: [2022-11-28 21:22:46,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_15-model_00-model_states.pt. 0: [2022-11-28 21:22:46,135] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_16-model_00-model_states.pt... 0: [2022-11-28 21:22:46,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_16-model_00-model_states.pt. 0: [2022-11-28 21:22:46,158] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_17-model_00-model_states.pt... 
0: [2022-11-28 21:22:46,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_17-model_00-model_states.pt. 0: [2022-11-28 21:22:46,183] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_18-model_00-model_states.pt... 0: [2022-11-28 21:22:46,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_18-model_00-model_states.pt. 0: [2022-11-28 21:22:46,206] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_19-model_00-model_states.pt... 0: [2022-11-28 21:22:46,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_19-model_00-model_states.pt. 0: [2022-11-28 21:22:46,230] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_20-model_00-model_states.pt... 0: [2022-11-28 21:22:46,253] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_20-model_00-model_states.pt. 0: [2022-11-28 21:22:46,253] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/layer_22-model_00-model_states.pt... 0: [2022-11-28 21:22:46,258] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/layer_22-model_00-model_states.pt. 0: [2022-11-28 21:22:46,258] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step70000/mp_rank_00_model_states.pt 0: [2022-11-28 21:22:46,259] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/mp_rank_00_model_states.pt... 0: [2022-11-28 21:22:46,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/mp_rank_00_model_states.pt. 
0: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
7: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
1: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 
6: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:22:46,279] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:22:46,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:22:46,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 21:22:46,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 21:22:46,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 21:22:46,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2022-11-28 21:22:46,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2022-11-28 21:22:46,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:22:46,331] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 21:22:46,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:22:46,331] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2022-11-28 21:22:46,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:22:46,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 21:22:46,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2022-11-28 21:22:46,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 21:22:46,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 21:22:46,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2022-11-28 21:22:46,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:22:46,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 21:22:46,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2022-11-28 21:22:46,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:22:46,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 21:22:46,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2022-11-28 21:22:46,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:22:46,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 21:22:46,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2022-11-28 21:22:46,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-28 21:22:46,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 21:22:46,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2022-11-28 21:22:46,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:22:46,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 21:22:46,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2022-11-28 21:22:46,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:22:46,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 21:22:46,341] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2022-11-28 21:22:46,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:22:46,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 21:22:46,341] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2022-11-28 21:22:46,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:22:46,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 21:22:46,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2022-11-28 21:22:46,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:22:46,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 21:22:46,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2022-11-28 21:22:46,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:22:46,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:22:46,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:22:46,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 21:22:46,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 21:22:46,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 21:22:46,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 21:22:46,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 21:22:46,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2022-11-28 21:22:46,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2022-11-28 21:22:46,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 7: [2022-11-28 21:22:46,347] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2022-11-28 21:22:46,347] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:22:46,347] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 21:22:46,347] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2022-11-28 21:22:46,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 21:22:46,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 21:22:46,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2022-11-28 21:22:46,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:22:46,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 21:22:46,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2022-11-28 21:22:46,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:22:46,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 21:22:46,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2022-11-28 21:22:46,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:22:46,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 21:22:46,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2022-11-28 21:22:46,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 21:22:46,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2022-11-28 21:22:46,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:22:46,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2022-11-28 21:22:46,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 21:22:46,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2022-11-28 21:22:46,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:22:46,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 21:22:46,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2022-11-28 21:22:46,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:22:46,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
6: [2022-11-28 21:22:46,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 2: [2022-11-28 21:22:46,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2022-11-28 21:22:46,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2022-11-28 21:22:46,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2022-11-28 21:22:46,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:22:46,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:22:46,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 21:22:46,332] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 21:22:46,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2022-11-28 21:22:46,332] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2022-11-28 21:22:46,333] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 21:22:46,333] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 2: [2022-11-28 21:22:46,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:22:46,333] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2022-11-28 21:22:46,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:22:46,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 21:22:46,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2022-11-28 21:22:46,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:22:46,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 3: [2022-11-28 21:22:46,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 21:22:46,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2022-11-28 21:22:46,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2022-11-28 21:22:46,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 2: [2022-11-28 21:22:46,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2022-11-28 21:22:46,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2022-11-28 21:22:46,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:22:46,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 21:22:46,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 3: [2022-11-28 21:22:46,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:22:46,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 21:22:46,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2022-11-28 21:22:46,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:22:46,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 21:22:46,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
4: [2022-11-28 21:22:46,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:22:46,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:22:46,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:22:46,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:22:46,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:22:46,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 21:22:46,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 21:22:46,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 21:22:46,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 21:22:46,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 21:22:46,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
4: [2022-11-28 21:22:46,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2022-11-28 21:22:46,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2022-11-28 21:22:46,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 4: [2022-11-28 21:22:46,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2022-11-28 21:22:46,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:22:46,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 21:22:46,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 2: [2022-11-28 21:22:46,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:22:46,354] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 21:22:46,354] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2022-11-28 21:22:46,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:22:46,356] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 21:22:46,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
1: [2022-11-28 21:22:46,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:22:46,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 21:22:46,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 1: [2022-11-28 21:22:46,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:22:46,357] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 21:22:46,357] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2022-11-28 21:22:46,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:22:46,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:22:46,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:22:46,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 21:22:46,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 21:22:46,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 21:22:46,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 21:22:46,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 21:22:46,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2022-11-28 21:22:46,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2022-11-28 21:22:46,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 6: [2022-11-28 21:22:46,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2022-11-28 21:22:46,362] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 21:22:46,362] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2022-11-28 21:22:46,365] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 21:22:46,365] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 21:22:46,365] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2022-11-28 21:22:46,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:22:46,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:22:46,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 21:22:46,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 21:22:46,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 0: [2022-11-28 21:22:46,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2022-11-28 21:22:46,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:22:46,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 21:22:46,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2022-11-28 21:22:46,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-28 21:22:46,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 21:22:46,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2022-11-28 21:22:46,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:22:46,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 21:22:46,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2022-11-28 21:22:46,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:22:46,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 21:22:46,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2022-11-28 21:22:46,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:22:46,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:22:46,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:22:46,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 21:22:46,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 21:22:46,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 21:22:46,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 21:22:46,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step70000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 21:22:46,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2022-11-28 21:22:46,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2022-11-28 21:22:46,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 5: [2022-11-28 21:22:46,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step70000 is ready now! 
0: successfully saved checkpoint at iteration 70000 to checkpoints_221m
7: time (ms) | save-checkpoint: 758.40
7: iteration 70010/ 115203 | consumed samples: 17922560 | consumed tokens: 36705402880 | elapsed time per iteration (s): 0.53 | learning rate: 8.118E-05 | global batch size: 256 | lm loss: 2.266032E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 480.057 | TFLOPs: 25.19 |
7: iteration 70020/ 115203 | consumed samples: 17925120 | consumed tokens: 36710645760 | elapsed time per iteration (s): 0.42 | learning rate: 8.116E-05 | global batch size: 256 | lm loss: 2.239446E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.414 | TFLOPs: 31.61 |
7: iteration 70030/ 115203 | consumed samples: 17927680 | consumed tokens: 36715888640 | elapsed time per iteration (s): 0.42 | learning rate: 8.114E-05 | global batch size: 256 | lm loss: 2.288623E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.156 | TFLOPs: 31.80 |
7: iteration 70040/ 115203 | consumed samples: 17930240 | consumed tokens: 36721131520 | elapsed time per iteration (s): 0.43 | learning rate: 8.111E-05 | global batch size: 256 | lm loss: 2.244939E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.916 | TFLOPs: 31.32 |
7: iteration 70050/ 115203 | consumed samples: 17932800 | consumed tokens: 36726374400 | elapsed time per iteration (s): 0.44 | learning rate: 8.109E-05 | global batch size: 256 | lm loss: 2.236610E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.290 | TFLOPs: 30.76 |
7: iteration 70060/ 115203 | consumed samples: 17935360 | consumed tokens: 36731617280 | elapsed time per iteration (s): 0.43 | learning rate: 8.107E-05 | global batch size: 256 | lm loss: 2.216254E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.184 | TFLOPs: 31.12 |
7: iteration 70070/ 115203 | consumed samples: 17937920 | consumed tokens: 36736860160 | elapsed time per iteration (s): 0.42 | learning rate: 8.104E-05 | global batch size: 256 | lm loss: 2.280735E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.246 | TFLOPs: 31.70 |
7: iteration 70080/ 115203 | consumed samples: 17940480 | consumed tokens: 36742103040 | elapsed time per iteration (s): 0.42 | learning rate: 8.102E-05 | global batch size: 256 | lm loss: 2.281441E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.797 | TFLOPs: 31.63 |
7: iteration 70090/ 115203 | consumed samples: 17943040 | consumed tokens: 36747345920 | elapsed time per iteration (s): 0.44 | learning rate: 8.100E-05 | global batch size: 256 | lm loss: 2.259178E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.449 | TFLOPs: 30.72 |
7: iteration 70100/ 115203 | consumed samples: 17945600 | consumed tokens: 36752588800 | elapsed time per iteration (s): 0.43 | learning rate: 8.097E-05 | global batch size: 256 | lm loss: 2.262763E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.543 | TFLOPs: 31.04 |
7: iteration 70110/ 115203 | consumed samples: 17948160 | consumed tokens: 36757831680 | elapsed time per iteration (s): 0.46 | learning rate: 8.095E-05 | global batch size: 256 | lm loss: 2.245545E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.668 | TFLOPs: 29.42 |
7: iteration 70120/ 115203 | consumed samples: 17950720 | consumed tokens: 36763074560 | elapsed time per iteration (s): 0.43 | learning rate: 8.093E-05 | global batch size: 256 | lm loss: 2.249335E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.181 | TFLOPs: 31.28 |
7: iteration 70130/ 115203 | consumed samples: 17953280 | consumed tokens: 36768317440 | elapsed time per iteration (s): 0.44 | learning rate: 8.090E-05 | global batch size: 256 | lm loss: 2.291208E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.359 | TFLOPs: 30.56 |
7: iteration 70140/ 115203 | consumed samples: 17955840 | consumed tokens: 36773560320 | elapsed time per iteration (s): 0.43 | learning rate: 8.088E-05 | global batch size: 256 | lm loss: 2.260113E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.542 | TFLOPs: 31.14 |
7: iteration 70150/ 115203 | consumed samples: 17958400 | consumed tokens: 36778803200 | elapsed time per iteration (s): 0.43 | learning rate: 8.086E-05 | global batch size: 256 | lm loss: 2.279980E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.201 | TFLOPs: 31.07 |
7: iteration 70160/ 115203 | consumed samples: 17960960 | consumed tokens: 36784046080 | elapsed time per iteration (s): 0.43 | learning rate: 8.083E-05 | global batch size: 256 | lm loss: 2.273057E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.243 | TFLOPs: 31.60 |
7: iteration 70170/ 115203 | consumed samples: 17963520 | consumed tokens: 36789288960 | elapsed time per iteration (s): 0.43 | learning rate: 8.081E-05 | global batch size: 256 | lm loss: 2.272175E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.998 | TFLOPs: 31.17 |
7: iteration 70180/ 115203 | consumed samples: 17966080 | consumed tokens: 36794531840 | elapsed time per iteration (s): 0.42 | learning rate: 8.079E-05 | global batch size: 256 | lm loss: 2.275129E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.094 | TFLOPs: 31.96 |
7: iteration 70190/ 115203 | consumed samples: 17968640 | consumed tokens: 36799774720 | elapsed time per iteration (s): 0.42 | learning rate: 8.076E-05 | global batch size: 256 | lm loss: 2.269015E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.053 | TFLOPs: 31.85 |
7: iteration 70200/ 115203 | consumed samples: 17971200 | consumed tokens: 36805017600 | elapsed time per iteration (s): 0.43 | learning rate: 8.074E-05 | global batch size: 256 | lm loss: 2.262969E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.256 | TFLOPs: 31.44 |
7: iteration 70210/ 115203 | consumed samples: 17973760 | consumed tokens: 36810260480 | elapsed time per iteration (s): 0.43 | learning rate: 8.071E-05 | global batch size: 256 | lm loss: 2.215055E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.047 | TFLOPs: 31.06 |
7: iteration 70220/ 115203 | consumed samples: 17976320 | consumed tokens: 36815503360 | elapsed time per iteration (s): 0.43 | learning rate: 8.069E-05 | global batch size: 256 | lm loss: 2.249865E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.311 | TFLOPs: 31.60 |
7: iteration 70230/ 115203 | consumed samples: 17978880 | consumed tokens: 36820746240 | elapsed time per iteration (s): 0.43 | learning rate: 8.067E-05 | global batch size: 
256 | lm loss: 2.257419E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.046 | TFLOPs: 30.96 | 7: iteration 70240/ 115203 | consumed samples: 17981440 | consumed tokens: 36825989120 | elapsed time per iteration (s): 0.42 | learning rate: 8.064E-05 | global batch size: 256 | lm loss: 2.244090E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.712 | TFLOPs: 31.78 | 7: iteration 70250/ 115203 | consumed samples: 17984000 | consumed tokens: 36831232000 | elapsed time per iteration (s): 0.43 | learning rate: 8.062E-05 | global batch size: 256 | lm loss: 2.263603E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.224 | TFLOPs: 31.23 | 7: iteration 70260/ 115203 | consumed samples: 17986560 | consumed tokens: 36836474880 | elapsed time per iteration (s): 0.44 | learning rate: 8.060E-05 | global batch size: 256 | lm loss: 2.264975E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.767 | TFLOPs: 30.42 | 7: iteration 70270/ 115203 | consumed samples: 17989120 | consumed tokens: 36841717760 | elapsed time per iteration (s): 0.43 | learning rate: 8.057E-05 | global batch size: 256 | lm loss: 2.261922E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.496 | TFLOPs: 31.56 | 7: iteration 70280/ 115203 | consumed samples: 17991680 | consumed tokens: 36846960640 | elapsed time per iteration (s): 0.44 | learning rate: 8.055E-05 | global batch size: 256 | lm loss: 2.282760E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.938 | TFLOPs: 30.64 | 7: iteration 70290/ 115203 | consumed samples: 17994240 | consumed 
tokens: 36852203520 | elapsed time per iteration (s): 0.42 | learning rate: 8.053E-05 | global batch size: 256 | lm loss: 2.264085E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.971 | TFLOPs: 31.64 | 7: iteration 70300/ 115203 | consumed samples: 17996800 | consumed tokens: 36857446400 | elapsed time per iteration (s): 0.42 | learning rate: 8.050E-05 | global batch size: 256 | lm loss: 2.268641E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.168 | TFLOPs: 32.22 | 7: iteration 70310/ 115203 | consumed samples: 17999360 | consumed tokens: 36862689280 | elapsed time per iteration (s): 0.43 | learning rate: 8.048E-05 | global batch size: 256 | lm loss: 2.296335E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.948 | TFLOPs: 31.53 | 7: iteration 70320/ 115203 | consumed samples: 18001920 | consumed tokens: 36867932160 | elapsed time per iteration (s): 0.43 | learning rate: 8.046E-05 | global batch size: 256 | lm loss: 2.248130E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.736 | TFLOPs: 31.15 | 7: iteration 70330/ 115203 | consumed samples: 18004480 | consumed tokens: 36873175040 | elapsed time per iteration (s): 0.43 | learning rate: 8.043E-05 | global batch size: 256 | lm loss: 2.258916E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.085 | TFLOPs: 31.43 | 7: iteration 70340/ 115203 | consumed samples: 18007040 | consumed tokens: 36878417920 | elapsed time per iteration (s): 0.43 | learning rate: 8.041E-05 | global batch size: 256 | lm loss: 2.264014E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 601.264 | TFLOPs: 31.55 | 7: iteration 70350/ 115203 | consumed samples: 18009600 | consumed tokens: 36883660800 | elapsed time per iteration (s): 0.43 | learning rate: 8.039E-05 | global batch size: 256 | lm loss: 2.278620E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.192 | TFLOPs: 30.91 | 7: iteration 70360/ 115203 | consumed samples: 18012160 | consumed tokens: 36888903680 | elapsed time per iteration (s): 0.43 | learning rate: 8.036E-05 | global batch size: 256 | lm loss: 2.270972E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.824 | TFLOPs: 31.10 | 7: iteration 70370/ 115203 | consumed samples: 18014720 | consumed tokens: 36894146560 | elapsed time per iteration (s): 0.43 | learning rate: 8.034E-05 | global batch size: 256 | lm loss: 2.240273E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.252 | TFLOPs: 31.13 | 7: iteration 70380/ 115203 | consumed samples: 18017280 | consumed tokens: 36899389440 | elapsed time per iteration (s): 0.44 | learning rate: 8.032E-05 | global batch size: 256 | lm loss: 2.262498E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.430 | TFLOPs: 30.51 | 7: iteration 70390/ 115203 | consumed samples: 18019840 | consumed tokens: 36904632320 | elapsed time per iteration (s): 0.42 | learning rate: 8.029E-05 | global batch size: 256 | lm loss: 2.275060E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.163 | TFLOPs: 31.70 | 7: iteration 70400/ 115203 | consumed samples: 18022400 | consumed tokens: 36909875200 | elapsed time per iteration (s): 0.42 | learning rate: 8.027E-05 | global batch size: 256 | lm loss: 2.272879E+00 | grad norm: 
0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.963 | TFLOPs: 31.69 | 7: iteration 70410/ 115203 | consumed samples: 18024960 | consumed tokens: 36915118080 | elapsed time per iteration (s): 0.42 | learning rate: 8.025E-05 | global batch size: 256 | lm loss: 2.271211E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.047 | TFLOPs: 31.80 | 7: iteration 70420/ 115203 | consumed samples: 18027520 | consumed tokens: 36920360960 | elapsed time per iteration (s): 0.42 | learning rate: 8.022E-05 | global batch size: 256 | lm loss: 2.275253E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.485 | TFLOPs: 31.61 | 7: iteration 70430/ 115203 | consumed samples: 18030080 | consumed tokens: 36925603840 | elapsed time per iteration (s): 0.42 | learning rate: 8.020E-05 | global batch size: 256 | lm loss: 2.291763E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.281 | TFLOPs: 31.76 | 7: iteration 70440/ 115203 | consumed samples: 18032640 | consumed tokens: 36930846720 | elapsed time per iteration (s): 0.45 | learning rate: 8.018E-05 | global batch size: 256 | lm loss: 2.256902E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.393 | TFLOPs: 29.61 | 7: iteration 70450/ 115203 | consumed samples: 18035200 | consumed tokens: 36936089600 | elapsed time per iteration (s): 0.42 | learning rate: 8.015E-05 | global batch size: 256 | lm loss: 2.254896E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.736 | TFLOPs: 32.04 | 7: iteration 70460/ 115203 | consumed samples: 18037760 | consumed tokens: 36941332480 | elapsed time per 
iteration (s): 0.43 | learning rate: 8.013E-05 | global batch size: 256 | lm loss: 2.222762E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.834 | TFLOPs: 31.26 | 7: iteration 70470/ 115203 | consumed samples: 18040320 | consumed tokens: 36946575360 | elapsed time per iteration (s): 0.43 | learning rate: 8.011E-05 | global batch size: 256 | lm loss: 2.259083E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.313 | TFLOPs: 31.03 | 7: iteration 70480/ 115203 | consumed samples: 18042880 | consumed tokens: 36951818240 | elapsed time per iteration (s): 0.43 | learning rate: 8.008E-05 | global batch size: 256 | lm loss: 2.247469E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.854 | TFLOPs: 31.05 | 7: iteration 70490/ 115203 | consumed samples: 18045440 | consumed tokens: 36957061120 | elapsed time per iteration (s): 0.43 | learning rate: 8.006E-05 | global batch size: 256 | lm loss: 2.254323E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.820 | TFLOPs: 31.37 | 7: iteration 70500/ 115203 | consumed samples: 18048000 | consumed tokens: 36962304000 | elapsed time per iteration (s): 0.43 | learning rate: 8.004E-05 | global batch size: 256 | lm loss: 2.279174E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.597 | TFLOPs: 31.51 | 7: iteration 70510/ 115203 | consumed samples: 18050560 | consumed tokens: 36967546880 | elapsed time per iteration (s): 0.42 | learning rate: 8.001E-05 | global batch size: 256 | lm loss: 2.252020E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.427 | TFLOPs: 31.71 | 7: 
iteration 70520/ 115203 | consumed samples: 18053120 | consumed tokens: 36972789760 | elapsed time per iteration (s): 0.43 | learning rate: 7.999E-05 | global batch size: 256 | lm loss: 2.247479E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.661 | TFLOPs: 31.15 | 7: iteration 70530/ 115203 | consumed samples: 18055680 | consumed tokens: 36978032640 | elapsed time per iteration (s): 0.42 | learning rate: 7.997E-05 | global batch size: 256 | lm loss: 2.269375E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.777 | TFLOPs: 32.15 | 7: iteration 70540/ 115203 | consumed samples: 18058240 | consumed tokens: 36983275520 | elapsed time per iteration (s): 0.43 | learning rate: 7.994E-05 | global batch size: 256 | lm loss: 2.251523E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.273 | TFLOPs: 31.60 | 7: iteration 70550/ 115203 | consumed samples: 18060800 | consumed tokens: 36988518400 | elapsed time per iteration (s): 0.43 | learning rate: 7.992E-05 | global batch size: 256 | lm loss: 2.249701E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.327 | TFLOPs: 31.03 | 7: iteration 70560/ 115203 | consumed samples: 18063360 | consumed tokens: 36993761280 | elapsed time per iteration (s): 0.43 | learning rate: 7.990E-05 | global batch size: 256 | lm loss: 2.247353E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.736 | TFLOPs: 31.10 | 7: iteration 70570/ 115203 | consumed samples: 18065920 | consumed tokens: 36999004160 | elapsed time per iteration (s): 0.44 | learning rate: 7.987E-05 | global batch size: 256 | lm loss: 2.251822E+00 | grad norm: 0.240 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.230 | TFLOPs: 30.34 | 7: iteration 70580/ 115203 | consumed samples: 18068480 | consumed tokens: 37004247040 | elapsed time per iteration (s): 0.42 | learning rate: 7.985E-05 | global batch size: 256 | lm loss: 2.278574E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.249 | TFLOPs: 31.65 | 7: iteration 70590/ 115203 | consumed samples: 18071040 | consumed tokens: 37009489920 | elapsed time per iteration (s): 0.43 | learning rate: 7.983E-05 | global batch size: 256 | lm loss: 2.240366E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.056 | TFLOPs: 31.38 | 7: iteration 70600/ 115203 | consumed samples: 18073600 | consumed tokens: 37014732800 | elapsed time per iteration (s): 0.43 | learning rate: 7.980E-05 | global batch size: 256 | lm loss: 2.278135E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.814 | TFLOPs: 31.21 | 7: iteration 70610/ 115203 | consumed samples: 18076160 | consumed tokens: 37019975680 | elapsed time per iteration (s): 0.43 | learning rate: 7.978E-05 | global batch size: 256 | lm loss: 2.268884E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.414 | TFLOPs: 31.56 | 7: iteration 70620/ 115203 | consumed samples: 18078720 | consumed tokens: 37025218560 | elapsed time per iteration (s): 0.47 | learning rate: 7.976E-05 | global batch size: 256 | lm loss: 2.256927E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 546.808 | TFLOPs: 28.69 | 7: iteration 70630/ 115203 | consumed samples: 18081280 | consumed tokens: 37030461440 | elapsed time per iteration (s): 0.42 | learning rate: 
7.973E-05 | global batch size: 256 | lm loss: 2.268891E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.009 | TFLOPs: 31.80 | 7: iteration 70640/ 115203 | consumed samples: 18083840 | consumed tokens: 37035704320 | elapsed time per iteration (s): 0.43 | learning rate: 7.971E-05 | global batch size: 256 | lm loss: 2.276796E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.520 | TFLOPs: 31.40 | 7: iteration 70650/ 115203 | consumed samples: 18086400 | consumed tokens: 37040947200 | elapsed time per iteration (s): 0.42 | learning rate: 7.969E-05 | global batch size: 256 | lm loss: 2.270492E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.665 | TFLOPs: 31.73 | 7: iteration 70660/ 115203 | consumed samples: 18088960 | consumed tokens: 37046190080 | elapsed time per iteration (s): 0.42 | learning rate: 7.966E-05 | global batch size: 256 | lm loss: 2.291176E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.326 | TFLOPs: 31.71 | 7: iteration 70670/ 115203 | consumed samples: 18091520 | consumed tokens: 37051432960 | elapsed time per iteration (s): 0.42 | learning rate: 7.964E-05 | global batch size: 256 | lm loss: 2.262910E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.154 | TFLOPs: 31.96 | 7: iteration 70680/ 115203 | consumed samples: 18094080 | consumed tokens: 37056675840 | elapsed time per iteration (s): 0.43 | learning rate: 7.962E-05 | global batch size: 256 | lm loss: 2.283018E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.534 | TFLOPs: 30.98 | 7: iteration 70690/ 115203 | consumed 
samples: 18096640 | consumed tokens: 37061918720 | elapsed time per iteration (s): 0.43 | learning rate: 7.959E-05 | global batch size: 256 | lm loss: 2.250572E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.386 | TFLOPs: 31.50 | 7: iteration 70700/ 115203 | consumed samples: 18099200 | consumed tokens: 37067161600 | elapsed time per iteration (s): 0.43 | learning rate: 7.957E-05 | global batch size: 256 | lm loss: 2.292812E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.458 | TFLOPs: 31.14 | 7: iteration 70710/ 115203 | consumed samples: 18101760 | consumed tokens: 37072404480 | elapsed time per iteration (s): 0.43 | learning rate: 7.955E-05 | global batch size: 256 | lm loss: 2.260735E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.166 | TFLOPs: 31.54 | 7: iteration 70720/ 115203 | consumed samples: 18104320 | consumed tokens: 37077647360 | elapsed time per iteration (s): 0.42 | learning rate: 7.952E-05 | global batch size: 256 | lm loss: 2.254082E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.175 | TFLOPs: 31.70 | 7: iteration 70730/ 115203 | consumed samples: 18106880 | consumed tokens: 37082890240 | elapsed time per iteration (s): 0.43 | learning rate: 7.950E-05 | global batch size: 256 | lm loss: 2.256449E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.642 | TFLOPs: 31.36 | 7: iteration 70740/ 115203 | consumed samples: 18109440 | consumed tokens: 37088133120 | elapsed time per iteration (s): 0.42 | learning rate: 7.948E-05 | global batch size: 256 | lm loss: 2.250027E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 603.241 | TFLOPs: 31.65 | 7: iteration 70750/ 115203 | consumed samples: 18112000 | consumed tokens: 37093376000 | elapsed time per iteration (s): 0.42 | learning rate: 7.945E-05 | global batch size: 256 | lm loss: 2.228215E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.495 | TFLOPs: 31.61 | 7: iteration 70760/ 115203 | consumed samples: 18114560 | consumed tokens: 37098618880 | elapsed time per iteration (s): 0.43 | learning rate: 7.943E-05 | global batch size: 256 | lm loss: 2.242595E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.246 | TFLOPs: 31.07 | 7: iteration 70770/ 115203 | consumed samples: 18117120 | consumed tokens: 37103861760 | elapsed time per iteration (s): 0.42 | learning rate: 7.941E-05 | global batch size: 256 | lm loss: 2.250963E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.696 | TFLOPs: 31.78 | 7: iteration 70780/ 115203 | consumed samples: 18119680 | consumed tokens: 37109104640 | elapsed time per iteration (s): 0.44 | learning rate: 7.938E-05 | global batch size: 256 | lm loss: 2.268888E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.356 | TFLOPs: 30.77 | 7: iteration 70790/ 115203 | consumed samples: 18122240 | consumed tokens: 37114347520 | elapsed time per iteration (s): 0.43 | learning rate: 7.936E-05 | global batch size: 256 | lm loss: 2.260316E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.832 | TFLOPs: 31.10 | 7: iteration 70800/ 115203 | consumed samples: 18124800 | consumed tokens: 37119590400 | elapsed time per iteration (s): 0.43 | learning rate: 7.934E-05 | global batch size: 256 | lm 
loss: 2.253141E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.806 | TFLOPs: 31.16 | 7: iteration 70810/ 115203 | consumed samples: 18127360 | consumed tokens: 37124833280 | elapsed time per iteration (s): 0.43 | learning rate: 7.931E-05 | global batch size: 256 | lm loss: 2.275533E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.931 | TFLOPs: 31.22 | 7: iteration 70820/ 115203 | consumed samples: 18129920 | consumed tokens: 37130076160 | elapsed time per iteration (s): 0.43 | learning rate: 7.929E-05 | global batch size: 256 | lm loss: 2.298310E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.819 | TFLOPs: 31.31 | 7: iteration 70830/ 115203 | consumed samples: 18132480 | consumed tokens: 37135319040 | elapsed time per iteration (s): 0.43 | learning rate: 7.927E-05 | global batch size: 256 | lm loss: 2.282681E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.781 | TFLOPs: 31.15 | 7: iteration 70840/ 115203 | consumed samples: 18135040 | consumed tokens: 37140561920 | elapsed time per iteration (s): 0.43 | learning rate: 7.924E-05 | global batch size: 256 | lm loss: 2.253494E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.218 | TFLOPs: 31.44 | 7: iteration 70850/ 115203 | consumed samples: 18137600 | consumed tokens: 37145804800 | elapsed time per iteration (s): 0.43 | learning rate: 7.922E-05 | global batch size: 256 | lm loss: 2.263075E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.481 | TFLOPs: 31.30 | 7: iteration 70860/ 115203 | consumed samples: 18140160 | consumed tokens: 
37151047680 | elapsed time per iteration (s): 0.42 | learning rate: 7.920E-05 | global batch size: 256 | lm loss: 2.266820E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.829 | TFLOPs: 32.15 | 7: iteration 70870/ 115203 | consumed samples: 18142720 | consumed tokens: 37156290560 | elapsed time per iteration (s): 0.42 | learning rate: 7.917E-05 | global batch size: 256 | lm loss: 2.285120E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.214 | TFLOPs: 31.65 | 7: iteration 70880/ 115203 | consumed samples: 18145280 | consumed tokens: 37161533440 | elapsed time per iteration (s): 0.42 | learning rate: 7.915E-05 | global batch size: 256 | lm loss: 2.233291E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.912 | TFLOPs: 32.11 | 7: iteration 70890/ 115203 | consumed samples: 18147840 | consumed tokens: 37166776320 | elapsed time per iteration (s): 0.42 | learning rate: 7.913E-05 | global batch size: 256 | lm loss: 2.299025E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.875 | TFLOPs: 31.95 | 7: iteration 70900/ 115203 | consumed samples: 18150400 | consumed tokens: 37172019200 | elapsed time per iteration (s): 0.43 | learning rate: 7.910E-05 | global batch size: 256 | lm loss: 2.238273E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.637 | TFLOPs: 31.41 | 7: iteration 70910/ 115203 | consumed samples: 18152960 | consumed tokens: 37177262080 | elapsed time per iteration (s): 0.43 | learning rate: 7.908E-05 | global batch size: 256 | lm loss: 2.248608E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
589.424 | TFLOPs: 30.93 | 7: iteration 70920/ 115203 | consumed samples: 18155520 | consumed tokens: 37182504960 | elapsed time per iteration (s): 0.42 | learning rate: 7.906E-05 | global batch size: 256 | lm loss: 2.276122E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.563 | TFLOPs: 31.77 | 7: iteration 70930/ 115203 | consumed samples: 18158080 | consumed tokens: 37187747840 | elapsed time per iteration (s): 0.43 | learning rate: 7.903E-05 | global batch size: 256 | lm loss: 2.246699E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.585 | TFLOPs: 31.41 | 7: iteration 70940/ 115203 | consumed samples: 18160640 | consumed tokens: 37192990720 | elapsed time per iteration (s): 0.42 | learning rate: 7.901E-05 | global batch size: 256 | lm loss: 2.265419E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.738 | TFLOPs: 31.68 | 7: iteration 70950/ 115203 | consumed samples: 18163200 | consumed tokens: 37198233600 | elapsed time per iteration (s): 0.42 | learning rate: 7.899E-05 | global batch size: 256 | lm loss: 2.277301E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.721 | TFLOPs: 31.68 | 7: iteration 70960/ 115203 | consumed samples: 18165760 | consumed tokens: 37203476480 | elapsed time per iteration (s): 0.42 | learning rate: 7.896E-05 | global batch size: 256 | lm loss: 2.272078E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.339 | TFLOPs: 31.87 | 7: iteration 70970/ 115203 | consumed samples: 18168320 | consumed tokens: 37208719360 | elapsed time per iteration (s): 0.43 | learning rate: 7.894E-05 | global batch size: 256 | lm loss: 2.284159E+00 | grad norm: 0.223 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.677 | TFLOPs: 31.52 |
7: iteration 70980/ 115203 | consumed samples: 18170880 | consumed tokens: 37213962240 | elapsed time per iteration (s): 0.43 | learning rate: 7.892E-05 | global batch size: 256 | lm loss: 2.260079E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.020 | TFLOPs: 31.17 |
7: iteration 70990/ 115203 | consumed samples: 18173440 | consumed tokens: 37219205120 | elapsed time per iteration (s): 0.42 | learning rate: 7.889E-05 | global batch size: 256 | lm loss: 2.230049E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.935 | TFLOPs: 31.84 |
7: iteration 71000/ 115203 | consumed samples: 18176000 | consumed tokens: 37224448000 | elapsed time per iteration (s): 0.42 | learning rate: 7.887E-05 | global batch size: 256 | lm loss: 2.275922E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.911 | TFLOPs: 31.79 |
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 71000 | lm loss value: 2.186811E+00 | lm loss PPL: 8.906768E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 71000 to checkpoints_221m
0: [2022-11-28 21:29:55,377] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step71000 is begin to save!
0: [2022-11-28 21:29:55,381] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_01-model_00-model_states.pt...
0: [2022-11-28 21:29:55,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_01-model_00-model_states.pt.
0: [2022-11-28 21:29:55,481] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_03-model_00-model_states.pt...
0: [2022-11-28 21:29:55,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_03-model_00-model_states.pt.
0: [2022-11-28 21:29:55,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_04-model_00-model_states.pt...
0: [2022-11-28 21:29:55,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_04-model_00-model_states.pt.
0: [2022-11-28 21:29:55,525] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_05-model_00-model_states.pt...
0: [2022-11-28 21:29:55,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_05-model_00-model_states.pt.
0: [2022-11-28 21:29:55,549] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_06-model_00-model_states.pt...
0: [2022-11-28 21:29:55,572] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_06-model_00-model_states.pt.
0: [2022-11-28 21:29:55,572] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_07-model_00-model_states.pt...
0: [2022-11-28 21:29:55,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_07-model_00-model_states.pt.
0: [2022-11-28 21:29:55,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_08-model_00-model_states.pt...
0: [2022-11-28 21:29:55,618] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_08-model_00-model_states.pt.
0: [2022-11-28 21:29:55,619] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_09-model_00-model_states.pt... 0: [2022-11-28 21:29:55,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_09-model_00-model_states.pt. 0: [2022-11-28 21:29:55,643] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_10-model_00-model_states.pt... 0: [2022-11-28 21:29:55,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_10-model_00-model_states.pt. 0: [2022-11-28 21:29:55,667] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_11-model_00-model_states.pt... 0: [2022-11-28 21:29:55,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_11-model_00-model_states.pt. 0: [2022-11-28 21:29:55,690] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_12-model_00-model_states.pt... 0: [2022-11-28 21:29:55,712] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_12-model_00-model_states.pt. 0: [2022-11-28 21:29:55,712] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_13-model_00-model_states.pt... 0: [2022-11-28 21:29:55,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_13-model_00-model_states.pt. 0: [2022-11-28 21:29:55,736] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_14-model_00-model_states.pt... 0: [2022-11-28 21:29:55,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 21:29:55,757] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_15-model_00-model_states.pt... 0: [2022-11-28 21:29:55,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_15-model_00-model_states.pt. 0: [2022-11-28 21:29:55,781] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_16-model_00-model_states.pt... 0: [2022-11-28 21:29:55,804] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_16-model_00-model_states.pt. 0: [2022-11-28 21:29:55,804] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_17-model_00-model_states.pt... 0: [2022-11-28 21:29:55,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_17-model_00-model_states.pt. 0: [2022-11-28 21:29:55,828] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_18-model_00-model_states.pt... 0: [2022-11-28 21:29:55,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_18-model_00-model_states.pt. 0: [2022-11-28 21:29:55,852] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_19-model_00-model_states.pt... 0: [2022-11-28 21:29:55,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_19-model_00-model_states.pt. 0: [2022-11-28 21:29:55,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_20-model_00-model_states.pt... 0: [2022-11-28 21:29:55,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 21:29:55,899] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/layer_22-model_00-model_states.pt... 0: [2022-11-28 21:29:55,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/layer_22-model_00-model_states.pt. 0: [2022-11-28 21:29:55,903] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step71000/mp_rank_00_model_states.pt 0: [2022-11-28 21:29:55,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/mp_rank_00_model_states.pt... 0: [2022-11-28 21:29:55,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/mp_rank_00_model_states.pt. 0: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
5: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
1: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
2: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:29:55,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step71000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:29:55,968] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:29:55,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:29:55,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 21:29:55,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2022-11-28 21:29:55,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:29:55,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:29:55,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 21:29:55,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 21:29:55,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2022-11-28 21:29:55,974] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
4: [2022-11-28 21:29:55,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:29:55,974] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 21:29:55,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2022-11-28 21:29:55,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:29:55,975] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 21:29:55,975] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 0: [2022-11-28 21:29:55,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:29:55,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:29:55,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 21:29:55,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 0: [2022-11-28 21:29:55,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 21:29:55,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
6: [2022-11-28 21:29:55,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:29:55,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:29:55,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:29:55,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 21:29:55,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2022-11-28 21:29:55,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 21:29:55,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:29:55,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2022-11-28 21:29:55,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 21:29:55,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2022-11-28 21:29:55,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-28 21:29:55,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 21:29:55,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2022-11-28 21:29:55,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 21:29:55,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2022-11-28 21:29:55,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:29:55,973] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 21:29:55,973] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2022-11-28 21:29:55,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:29:55,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 21:29:55,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2022-11-28 21:29:55,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2022-11-28 21:29:55,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 21:29:55,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:29:55,977] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:29:55,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2022-11-28 21:29:55,977] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 21:29:55,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 21:29:55,977] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2022-11-28 21:29:55,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2022-11-28 21:29:55,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:29:55,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 21:29:55,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2022-11-28 21:29:55,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 21:29:55,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 21:29:55,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2022-11-28 21:29:55,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:29:55,978] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 21:29:55,978] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 0: [2022-11-28 21:29:55,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:29:55,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:29:55,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 21:29:55,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 21:29:55,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 0: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2022-11-28 21:29:55,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2022-11-28 21:29:55,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 1: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 0: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 5: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 2: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 2: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 2: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:29:55,980] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 21:29:55,980] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
3: [2022-11-28 21:29:55,981] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:29:55,981] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 21:29:55,981] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 1: [2022-11-28 21:29:55,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:29:55,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 21:29:55,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2022-11-28 21:29:55,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:29:55,984] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 21:29:55,984] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 4: [2022-11-28 21:29:55,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:29:55,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 21:29:55,985] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
1: [2022-11-28 21:29:55,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:29:55,985] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 21:29:55,986] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2022-11-28 21:29:55,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:29:55,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:29:55,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:29:55,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 21:29:55,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 21:29:55,988] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 21:29:55,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2022-11-28 21:29:55,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2022-11-28 21:29:55,988] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
2: [2022-11-28 21:29:55,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:29:55,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 21:29:55,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 2: [2022-11-28 21:29:55,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:29:55,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 21:29:55,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:29:55,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 2: [2022-11-28 21:29:55,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 21:29:55,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 2: [2022-11-28 21:29:55,991] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:29:55,991] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 21:29:55,991] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
3: [2022-11-28 21:29:55,989] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:29:55,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:29:55,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 21:29:55,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2022-11-28 21:29:55,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:29:55,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 21:29:55,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 3: [2022-11-28 21:29:55,989] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 21:29:55,989] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 6: [2022-11-28 21:29:55,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:29:55,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 21:29:55,996] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
7: [2022-11-28 21:29:56,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:29:56,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 21:29:56,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2022-11-28 21:29:56,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:29:56,003] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 21:29:56,003] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2022-11-28 21:29:56,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:29:56,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 21:29:56,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2022-11-28 21:29:56,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:29:56,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 21:29:56,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
7: [2022-11-28 21:29:56,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:29:56,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 21:29:56,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2022-11-28 21:29:56,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:29:56,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 21:29:56,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2022-11-28 21:29:56,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:29:56,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:29:56,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 21:29:56,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 21:29:56,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 7: [2022-11-28 21:29:56,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
0: [2022-11-28 21:29:56,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:29:56,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:29:56,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 21:29:56,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 21:29:56,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 0: [2022-11-28 21:29:56,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 0: [2022-11-28 21:29:56,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step71000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 21:29:56,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step71000 is ready now! 
0: successfully saved checkpoint at iteration 71000 to checkpoints_221m 7: time (ms) | save-checkpoint: 657.98 7: iteration 71010/ 115203 | consumed samples: 18178560 | consumed tokens: 37229690880 | elapsed time per iteration (s): 0.51 | learning rate: 7.885E-05 | global batch size: 256 | lm loss: 2.242340E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 500.764 | TFLOPs: 26.27 | 7: iteration 71020/ 115203 | consumed samples: 18181120 | consumed tokens: 37234933760 | elapsed time per iteration (s): 0.42 | learning rate: 7.882E-05 | global batch size: 256 | lm loss: 2.258456E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.062 | TFLOPs: 31.85 | 7: iteration 71030/ 115203 | consumed samples: 18183680 | consumed tokens: 37240176640 | elapsed time per iteration (s): 0.43 | learning rate: 7.880E-05 | global batch size: 256 | lm loss: 2.266682E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.345 | TFLOPs: 31.24 | 7: iteration 71040/ 115203 | consumed samples: 18186240 | consumed tokens: 37245419520 | elapsed time per iteration (s): 0.44 | learning rate: 7.878E-05 | global batch size: 256 | lm loss: 2.278401E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.981 | TFLOPs: 30.27 | 7: iteration 71050/ 115203 | consumed samples: 18188800 | consumed tokens: 37250662400 | elapsed time per iteration (s): 0.64 | learning rate: 7.875E-05 | global batch size: 256 | lm loss: 2.249086E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 399.546 | TFLOPs: 20.96 | 7: iteration 71060/ 115203 | consumed samples: 18191360 | consumed tokens: 37255905280 | elapsed time per iteration (s): 0.42 | learning 
rate: 7.873E-05 | global batch size: 256 | lm loss: 2.228959E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.951 | TFLOPs: 32.00 | 7: iteration 71070/ 115203 | consumed samples: 18193920 | consumed tokens: 37261148160 | elapsed time per iteration (s): 0.45 | learning rate: 7.871E-05 | global batch size: 256 | lm loss: 2.270459E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.098 | TFLOPs: 29.65 | 7: iteration 71080/ 115203 | consumed samples: 18196480 | consumed tokens: 37266391040 | elapsed time per iteration (s): 0.44 | learning rate: 7.868E-05 | global batch size: 256 | lm loss: 2.271022E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.483 | TFLOPs: 30.61 | 7: iteration 71090/ 115203 | consumed samples: 18199040 | consumed tokens: 37271633920 | elapsed time per iteration (s): 0.42 | learning rate: 7.866E-05 | global batch size: 256 | lm loss: 2.253995E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.273 | TFLOPs: 31.76 | 7: iteration 71100/ 115203 | consumed samples: 18201600 | consumed tokens: 37276876800 | elapsed time per iteration (s): 0.43 | learning rate: 7.864E-05 | global batch size: 256 | lm loss: 2.261687E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.815 | TFLOPs: 31.05 | 7: iteration 71110/ 115203 | consumed samples: 18204160 | consumed tokens: 37282119680 | elapsed time per iteration (s): 0.42 | learning rate: 7.861E-05 | global batch size: 256 | lm loss: 2.269110E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.281 | TFLOPs: 31.65 | 7: iteration 71120/ 115203 | 
consumed samples: 18206720 | consumed tokens: 37287362560 | elapsed time per iteration (s): 0.42 | learning rate: 7.859E-05 | global batch size: 256 | lm loss: 2.258270E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.381 | TFLOPs: 32.03 | 7: iteration 71130/ 115203 | consumed samples: 18209280 | consumed tokens: 37292605440 | elapsed time per iteration (s): 0.43 | learning rate: 7.857E-05 | global batch size: 256 | lm loss: 2.248467E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.345 | TFLOPs: 31.60 | 7: iteration 71140/ 115203 | consumed samples: 18211840 | consumed tokens: 37297848320 | elapsed time per iteration (s): 0.44 | learning rate: 7.854E-05 | global batch size: 256 | lm loss: 2.249112E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.511 | TFLOPs: 30.56 | 7: iteration 71150/ 115203 | consumed samples: 18214400 | consumed tokens: 37303091200 | elapsed time per iteration (s): 0.43 | learning rate: 7.852E-05 | global batch size: 256 | lm loss: 2.261218E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.085 | TFLOPs: 31.59 | 7: iteration 71160/ 115203 | consumed samples: 18216960 | consumed tokens: 37308334080 | elapsed time per iteration (s): 0.44 | learning rate: 7.850E-05 | global batch size: 256 | lm loss: 2.275261E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.743 | TFLOPs: 30.79 | 7: iteration 71170/ 115203 | consumed samples: 18219520 | consumed tokens: 37313576960 | elapsed time per iteration (s): 0.43 | learning rate: 7.847E-05 | global batch size: 256 | lm loss: 2.244229E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 600.151 | TFLOPs: 31.49 | 7: iteration 71180/ 115203 | consumed samples: 18222080 | consumed tokens: 37318819840 | elapsed time per iteration (s): 0.43 | learning rate: 7.845E-05 | global batch size: 256 | lm loss: 2.302082E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.021 | TFLOPs: 31.22 | 7: iteration 71190/ 115203 | consumed samples: 18224640 | consumed tokens: 37324062720 | elapsed time per iteration (s): 0.42 | learning rate: 7.843E-05 | global batch size: 256 | lm loss: 2.262210E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.450 | TFLOPs: 31.61 | 7: iteration 71200/ 115203 | consumed samples: 18227200 | consumed tokens: 37329305600 | elapsed time per iteration (s): 0.42 | learning rate: 7.841E-05 | global batch size: 256 | lm loss: 2.260151E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.182 | TFLOPs: 31.75 | 7: iteration 71210/ 115203 | consumed samples: 18229760 | consumed tokens: 37334548480 | elapsed time per iteration (s): 0.43 | learning rate: 7.838E-05 | global batch size: 256 | lm loss: 2.268304E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.698 | TFLOPs: 31.15 | 7: iteration 71220/ 115203 | consumed samples: 18232320 | consumed tokens: 37339791360 | elapsed time per iteration (s): 0.42 | learning rate: 7.836E-05 | global batch size: 256 | lm loss: 2.254249E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.739 | TFLOPs: 31.83 | 7: iteration 71230/ 115203 | consumed samples: 18234880 | consumed tokens: 37345034240 | elapsed time per iteration (s): 0.43 | learning rate: 7.834E-05 | global batch size: 
256 | lm loss: 2.282352E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.259 | TFLOPs: 31.49 | 7: iteration 71240/ 115203 | consumed samples: 18237440 | consumed tokens: 37350277120 | elapsed time per iteration (s): 0.43 | learning rate: 7.831E-05 | global batch size: 256 | lm loss: 2.248641E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.847 | TFLOPs: 31.47 | 7: iteration 71250/ 115203 | consumed samples: 18240000 | consumed tokens: 37355520000 | elapsed time per iteration (s): 0.43 | learning rate: 7.829E-05 | global batch size: 256 | lm loss: 2.265998E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.986 | TFLOPs: 31.48 | 7: iteration 71260/ 115203 | consumed samples: 18242560 | consumed tokens: 37360762880 | elapsed time per iteration (s): 0.43 | learning rate: 7.827E-05 | global batch size: 256 | lm loss: 2.249541E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.974 | TFLOPs: 31.27 | 7: iteration 71270/ 115203 | consumed samples: 18245120 | consumed tokens: 37366005760 | elapsed time per iteration (s): 0.42 | learning rate: 7.824E-05 | global batch size: 256 | lm loss: 2.250630E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.587 | TFLOPs: 31.67 | 7: iteration 71280/ 115203 | consumed samples: 18247680 | consumed tokens: 37371248640 | elapsed time per iteration (s): 0.42 | learning rate: 7.822E-05 | global batch size: 256 | lm loss: 2.225836E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.625 | TFLOPs: 31.62 | 7: iteration 71290/ 115203 | consumed samples: 18250240 | consumed 
tokens: 37376491520 | elapsed time per iteration (s): 0.43 | learning rate: 7.820E-05 | global batch size: 256 | lm loss: 2.255216E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.758 | TFLOPs: 31.26 | 7: iteration 71300/ 115203 | consumed samples: 18252800 | consumed tokens: 37381734400 | elapsed time per iteration (s): 0.42 | learning rate: 7.817E-05 | global batch size: 256 | lm loss: 2.275250E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.686 | TFLOPs: 31.62 | 7: iteration 71310/ 115203 | consumed samples: 18255360 | consumed tokens: 37386977280 | elapsed time per iteration (s): 0.42 | learning rate: 7.815E-05 | global batch size: 256 | lm loss: 2.264300E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.969 | TFLOPs: 32.21 | 7: iteration 71320/ 115203 | consumed samples: 18257920 | consumed tokens: 37392220160 | elapsed time per iteration (s): 0.43 | learning rate: 7.813E-05 | global batch size: 256 | lm loss: 2.261710E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.404 | TFLOPs: 31.08 | 7: iteration 71330/ 115203 | consumed samples: 18260480 | consumed tokens: 37397463040 | elapsed time per iteration (s): 0.42 | learning rate: 7.810E-05 | global batch size: 256 | lm loss: 2.282236E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.810 | TFLOPs: 31.63 | 7: iteration 71340/ 115203 | consumed samples: 18263040 | consumed tokens: 37402705920 | elapsed time per iteration (s): 0.43 | learning rate: 7.808E-05 | global batch size: 256 | lm loss: 2.267875E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 599.285 | TFLOPs: 31.44 | 7: iteration 71350/ 115203 | consumed samples: 18265600 | consumed tokens: 37407948800 | elapsed time per iteration (s): 0.43 | learning rate: 7.806E-05 | global batch size: 256 | lm loss: 2.264585E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.933 | TFLOPs: 31.11 | 7: iteration 71360/ 115203 | consumed samples: 18268160 | consumed tokens: 37413191680 | elapsed time per iteration (s): 0.42 | learning rate: 7.803E-05 | global batch size: 256 | lm loss: 2.253677E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.223 | TFLOPs: 31.76 | 7: iteration 71370/ 115203 | consumed samples: 18270720 | consumed tokens: 37418434560 | elapsed time per iteration (s): 0.42 | learning rate: 7.801E-05 | global batch size: 256 | lm loss: 2.255353E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.071 | TFLOPs: 31.75 | 7: iteration 71380/ 115203 | consumed samples: 18273280 | consumed tokens: 37423677440 | elapsed time per iteration (s): 0.42 | learning rate: 7.799E-05 | global batch size: 256 | lm loss: 2.248967E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.014 | TFLOPs: 31.95 | 7: iteration 71390/ 115203 | consumed samples: 18275840 | consumed tokens: 37428920320 | elapsed time per iteration (s): 0.43 | learning rate: 7.796E-05 | global batch size: 256 | lm loss: 2.248734E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.694 | TFLOPs: 31.46 | 7: iteration 71400/ 115203 | consumed samples: 18278400 | consumed tokens: 37434163200 | elapsed time per iteration (s): 0.42 | learning rate: 7.794E-05 | global batch size: 256 | lm loss: 2.276128E+00 | grad norm: 
0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.789 | TFLOPs: 31.94 | 7: iteration 71410/ 115203 | consumed samples: 18280960 | consumed tokens: 37439406080 | elapsed time per iteration (s): 0.43 | learning rate: 7.792E-05 | global batch size: 256 | lm loss: 2.223103E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.108 | TFLOPs: 31.54 | 7: iteration 71420/ 115203 | consumed samples: 18283520 | consumed tokens: 37444648960 | elapsed time per iteration (s): 0.42 | learning rate: 7.790E-05 | global batch size: 256 | lm loss: 2.251626E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.348 | TFLOPs: 32.08 | 7: iteration 71430/ 115203 | consumed samples: 18286080 | consumed tokens: 37449891840 | elapsed time per iteration (s): 0.43 | learning rate: 7.787E-05 | global batch size: 256 | lm loss: 2.264827E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.202 | TFLOPs: 31.44 | 7: iteration 71440/ 115203 | consumed samples: 18288640 | consumed tokens: 37455134720 | elapsed time per iteration (s): 0.44 | learning rate: 7.785E-05 | global batch size: 256 | lm loss: 2.234164E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.263 | TFLOPs: 30.81 | 7: iteration 71450/ 115203 | consumed samples: 18291200 | consumed tokens: 37460377600 | elapsed time per iteration (s): 0.42 | learning rate: 7.783E-05 | global batch size: 256 | lm loss: 2.248861E+00 | grad norm: 0.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.800 | TFLOPs: 31.63 | 7: iteration 71460/ 115203 | consumed samples: 18293760 | consumed tokens: 37465620480 | elapsed time per 
iteration (s): 0.43 | learning rate: 7.780E-05 | global batch size: 256 | lm loss: 2.253549E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.791 | TFLOPs: 31.47 | 7: iteration 71470/ 115203 | consumed samples: 18296320 | consumed tokens: 37470863360 | elapsed time per iteration (s): 0.43 | learning rate: 7.778E-05 | global batch size: 256 | lm loss: 2.270766E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.662 | TFLOPs: 31.57 | 7: iteration 71480/ 115203 | consumed samples: 18298880 | consumed tokens: 37476106240 | elapsed time per iteration (s): 0.42 | learning rate: 7.776E-05 | global batch size: 256 | lm loss: 2.259441E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.034 | TFLOPs: 31.80 | 7: iteration 71490/ 115203 | consumed samples: 18301440 | consumed tokens: 37481349120 | elapsed time per iteration (s): 0.44 | learning rate: 7.773E-05 | global batch size: 256 | lm loss: 2.265195E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.268 | TFLOPs: 30.24 | 7: iteration 71500/ 115203 | consumed samples: 18304000 | consumed tokens: 37486592000 | elapsed time per iteration (s): 0.43 | learning rate: 7.771E-05 | global batch size: 256 | lm loss: 2.265150E+00 | grad norm: 0.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.266 | TFLOPs: 31.18 | 7: iteration 71510/ 115203 | consumed samples: 18306560 | consumed tokens: 37491834880 | elapsed time per iteration (s): 0.43 | learning rate: 7.769E-05 | global batch size: 256 | lm loss: 2.245857E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.326 | TFLOPs: 31.18 | 7: 
iteration 71520/ 115203 | consumed samples: 18309120 | consumed tokens: 37497077760 | elapsed time per iteration (s): 0.42 | learning rate: 7.766E-05 | global batch size: 256 | lm loss: 2.229203E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.159 | TFLOPs: 31.70 | 7: iteration 71530/ 115203 | consumed samples: 18311680 | consumed tokens: 37502320640 | elapsed time per iteration (s): 0.43 | learning rate: 7.764E-05 | global batch size: 256 | lm loss: 2.239350E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.724 | TFLOPs: 31.20 | 7: iteration 71540/ 115203 | consumed samples: 18314240 | consumed tokens: 37507563520 | elapsed time per iteration (s): 0.43 | learning rate: 7.762E-05 | global batch size: 256 | lm loss: 2.251757E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.506 | TFLOPs: 31.56 | 7: iteration 71550/ 115203 | consumed samples: 18316800 | consumed tokens: 37512806400 | elapsed time per iteration (s): 0.43 | learning rate: 7.759E-05 | global batch size: 256 | lm loss: 2.236796E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.337 | TFLOPs: 31.55 | 7: iteration 71560/ 115203 | consumed samples: 18319360 | consumed tokens: 37518049280 | elapsed time per iteration (s): 0.42 | learning rate: 7.757E-05 | global batch size: 256 | lm loss: 2.262892E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.347 | TFLOPs: 31.81 | 7: iteration 71570/ 115203 | consumed samples: 18321920 | consumed tokens: 37523292160 | elapsed time per iteration (s): 0.94 | learning rate: 7.755E-05 | global batch size: 256 | lm loss: 2.261541E+00 | grad norm: 0.217 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 271.120 | TFLOPs: 14.23 | 7: iteration 71580/ 115203 | consumed samples: 18324480 | consumed tokens: 37528535040 | elapsed time per iteration (s): 0.62 | learning rate: 7.752E-05 | global batch size: 256 | lm loss: 2.266165E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 412.977 | TFLOPs: 21.67 | 7: iteration 71590/ 115203 | consumed samples: 18327040 | consumed tokens: 37533777920 | elapsed time per iteration (s): 0.86 | learning rate: 7.750E-05 | global batch size: 256 | lm loss: 2.263295E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 298.719 | TFLOPs: 15.67 | 7: iteration 71600/ 115203 | consumed samples: 18329600 | consumed tokens: 37539020800 | elapsed time per iteration (s): 0.43 | learning rate: 7.748E-05 | global batch size: 256 | lm loss: 2.267727E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.603 | TFLOPs: 30.99 | 7: iteration 71610/ 115203 | consumed samples: 18332160 | consumed tokens: 37544263680 | elapsed time per iteration (s): 0.45 | learning rate: 7.746E-05 | global batch size: 256 | lm loss: 2.240828E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.915 | TFLOPs: 30.06 | 7: iteration 71620/ 115203 | consumed samples: 18334720 | consumed tokens: 37549506560 | elapsed time per iteration (s): 0.43 | learning rate: 7.743E-05 | global batch size: 256 | lm loss: 2.253945E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.128 | TFLOPs: 31.59 | 7: iteration 71630/ 115203 | consumed samples: 18337280 | consumed tokens: 37554749440 | elapsed time per iteration (s): 0.44 | learning rate: 
7.741E-05 | global batch size: 256 | lm loss: 2.300147E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.248 | TFLOPs: 30.50 | 7: iteration 71640/ 115203 | consumed samples: 18339840 | consumed tokens: 37559992320 | elapsed time per iteration (s): 0.43 | learning rate: 7.739E-05 | global batch size: 256 | lm loss: 2.257972E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.848 | TFLOPs: 31.47 | 7: iteration 71650/ 115203 | consumed samples: 18342400 | consumed tokens: 37565235200 | elapsed time per iteration (s): 0.42 | learning rate: 7.736E-05 | global batch size: 256 | lm loss: 2.280268E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.842 | TFLOPs: 31.63 | 7: iteration 71660/ 115203 | consumed samples: 18344960 | consumed tokens: 37570478080 | elapsed time per iteration (s): 0.43 | learning rate: 7.734E-05 | global batch size: 256 | lm loss: 2.270852E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.272 | TFLOPs: 31.44 | 7: iteration 71670/ 115203 | consumed samples: 18347520 | consumed tokens: 37575720960 | elapsed time per iteration (s): 0.43 | learning rate: 7.732E-05 | global batch size: 256 | lm loss: 2.263975E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.624 | TFLOPs: 31.41 | 7: iteration 71680/ 115203 | consumed samples: 18350080 | consumed tokens: 37580963840 | elapsed time per iteration (s): 0.43 | learning rate: 7.729E-05 | global batch size: 256 | lm loss: 2.268195E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.184 | TFLOPs: 30.97 | 7: iteration 71690/ 115203 | consumed 
samples: 18352640 | consumed tokens: 37586206720 | elapsed time per iteration (s): 0.45 | learning rate: 7.727E-05 | global batch size: 256 | lm loss: 2.263978E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.854 | TFLOPs: 30.06 | 7: iteration 71700/ 115203 | consumed samples: 18355200 | consumed tokens: 37591449600 | elapsed time per iteration (s): 0.44 | learning rate: 7.725E-05 | global batch size: 256 | lm loss: 2.249276E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.028 | TFLOPs: 30.70 | 7: iteration 71710/ 115203 | consumed samples: 18357760 | consumed tokens: 37596692480 | elapsed time per iteration (s): 0.43 | learning rate: 7.722E-05 | global batch size: 256 | lm loss: 2.275599E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.314 | TFLOPs: 31.45 | 7: iteration 71720/ 115203 | consumed samples: 18360320 | consumed tokens: 37601935360 | elapsed time per iteration (s): 0.44 | learning rate: 7.720E-05 | global batch size: 256 | lm loss: 2.257988E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.234 | TFLOPs: 30.86 | 7: iteration 71730/ 115203 | consumed samples: 18362880 | consumed tokens: 37607178240 | elapsed time per iteration (s): 0.43 | learning rate: 7.718E-05 | global batch size: 256 | lm loss: 2.288280E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.043 | TFLOPs: 31.06 | 7: iteration 71740/ 115203 | consumed samples: 18365440 | consumed tokens: 37612421120 | elapsed time per iteration (s): 0.44 | learning rate: 7.716E-05 | global batch size: 256 | lm loss: 2.246093E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 583.117 | TFLOPs: 30.60 | 7: iteration 71750/ 115203 | consumed samples: 18368000 | consumed tokens: 37617664000 | elapsed time per iteration (s): 0.43 | learning rate: 7.713E-05 | global batch size: 256 | lm loss: 2.237279E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.679 | TFLOPs: 31.15 | 7: iteration 71760/ 115203 | consumed samples: 18370560 | consumed tokens: 37622906880 | elapsed time per iteration (s): 0.44 | learning rate: 7.711E-05 | global batch size: 256 | lm loss: 2.238079E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.140 | TFLOPs: 30.60 | 7: iteration 71770/ 115203 | consumed samples: 18373120 | consumed tokens: 37628149760 | elapsed time per iteration (s): 0.44 | learning rate: 7.709E-05 | global batch size: 256 | lm loss: 2.265861E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.825 | TFLOPs: 30.53 | 7: iteration 71780/ 115203 | consumed samples: 18375680 | consumed tokens: 37633392640 | elapsed time per iteration (s): 0.44 | learning rate: 7.706E-05 | global batch size: 256 | lm loss: 2.279510E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.483 | TFLOPs: 30.25 | 7: iteration 71790/ 115203 | consumed samples: 18378240 | consumed tokens: 37638635520 | elapsed time per iteration (s): 0.44 | learning rate: 7.704E-05 | global batch size: 256 | lm loss: 2.239789E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.684 | TFLOPs: 30.68 | 7: iteration 71800/ 115203 | consumed samples: 18380800 | consumed tokens: 37643878400 | elapsed time per iteration (s): 0.43 | learning rate: 7.702E-05 | global batch size: 256 | lm 
loss: 2.253634E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.863 | TFLOPs: 30.95 | 7: iteration 71810/ 115203 | consumed samples: 18383360 | consumed tokens: 37649121280 | elapsed time per iteration (s): 0.43 | learning rate: 7.699E-05 | global batch size: 256 | lm loss: 2.234377E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.228 | TFLOPs: 31.18 | 7: iteration 71820/ 115203 | consumed samples: 18385920 | consumed tokens: 37654364160 | elapsed time per iteration (s): 0.44 | learning rate: 7.697E-05 | global batch size: 256 | lm loss: 2.233022E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.432 | TFLOPs: 30.87 | 7: iteration 71830/ 115203 | consumed samples: 18388480 | consumed tokens: 37659607040 | elapsed time per iteration (s): 0.44 | learning rate: 7.695E-05 | global batch size: 256 | lm loss: 2.252250E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.999 | TFLOPs: 30.43 | 7: iteration 71840/ 115203 | consumed samples: 18391040 | consumed tokens: 37664849920 | elapsed time per iteration (s): 0.43 | learning rate: 7.692E-05 | global batch size: 256 | lm loss: 2.273361E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.889 | TFLOPs: 31.32 | 7: iteration 71850/ 115203 | consumed samples: 18393600 | consumed tokens: 37670092800 | elapsed time per iteration (s): 0.43 | learning rate: 7.690E-05 | global batch size: 256 | lm loss: 2.273900E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.845 | TFLOPs: 30.95 | 7: iteration 71860/ 115203 | consumed samples: 18396160 | consumed tokens: 
37675335680 | elapsed time per iteration (s): 0.44 | learning rate: 7.688E-05 | global batch size: 256 | lm loss: 2.254532E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.384 | TFLOPs: 30.82 | 7: iteration 71870/ 115203 | consumed samples: 18398720 | consumed tokens: 37680578560 | elapsed time per iteration (s): 0.44 | learning rate: 7.686E-05 | global batch size: 256 | lm loss: 2.282906E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.044 | TFLOPs: 30.85 | 7: iteration 71880/ 115203 | consumed samples: 18401280 | consumed tokens: 37685821440 | elapsed time per iteration (s): 0.43 | learning rate: 7.683E-05 | global batch size: 256 | lm loss: 2.237950E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.444 | TFLOPs: 31.08 | 7: iteration 71890/ 115203 | consumed samples: 18403840 | consumed tokens: 37691064320 | elapsed time per iteration (s): 0.43 | learning rate: 7.681E-05 | global batch size: 256 | lm loss: 2.266477E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.793 | TFLOPs: 31.47 | 7: iteration 71900/ 115203 | consumed samples: 18406400 | consumed tokens: 37696307200 | elapsed time per iteration (s): 0.44 | learning rate: 7.679E-05 | global batch size: 256 | lm loss: 2.251752E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.380 | TFLOPs: 30.71 | 7: iteration 71910/ 115203 | consumed samples: 18408960 | consumed tokens: 37701550080 | elapsed time per iteration (s): 0.44 | learning rate: 7.676E-05 | global batch size: 256 | lm loss: 2.291219E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
587.562 | TFLOPs: 30.83 | 7: iteration 71920/ 115203 | consumed samples: 18411520 | consumed tokens: 37706792960 | elapsed time per iteration (s): 0.45 | learning rate: 7.674E-05 | global batch size: 256 | lm loss: 2.272908E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.865 | TFLOPs: 30.06 | 7: iteration 71930/ 115203 | consumed samples: 18414080 | consumed tokens: 37712035840 | elapsed time per iteration (s): 0.43 | learning rate: 7.672E-05 | global batch size: 256 | lm loss: 2.240406E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.974 | TFLOPs: 31.06 | 7: iteration 71940/ 115203 | consumed samples: 18416640 | consumed tokens: 37717278720 | elapsed time per iteration (s): 0.44 | learning rate: 7.669E-05 | global batch size: 256 | lm loss: 2.271280E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.647 | TFLOPs: 30.36 | 7: iteration 71950/ 115203 | consumed samples: 18419200 | consumed tokens: 37722521600 | elapsed time per iteration (s): 0.49 | learning rate: 7.667E-05 | global batch size: 256 | lm loss: 2.260834E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 517.681 | TFLOPs: 27.16 | 7: iteration 71960/ 115203 | consumed samples: 18421760 | consumed tokens: 37727764480 | elapsed time per iteration (s): 0.42 | learning rate: 7.665E-05 | global batch size: 256 | lm loss: 2.268046E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.404 | TFLOPs: 31.66 | 7: iteration 71970/ 115203 | consumed samples: 18424320 | consumed tokens: 37733007360 | elapsed time per iteration (s): 0.43 | learning rate: 7.662E-05 | global batch size: 256 | lm loss: 2.303550E+00 | grad norm: 0.217 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.550 | TFLOPs: 31.09 |
7: iteration 71980/ 115203 | consumed samples: 18426880 | consumed tokens: 37738250240 | elapsed time per iteration (s): 0.44 | learning rate: 7.660E-05 | global batch size: 256 | lm loss: 2.269859E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.121 | TFLOPs: 30.44 |
7: iteration 71990/ 115203 | consumed samples: 18429440 | consumed tokens: 37743493120 | elapsed time per iteration (s): 0.43 | learning rate: 7.658E-05 | global batch size: 256 | lm loss: 2.251069E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.097 | TFLOPs: 31.01 |
0: [2022-11-28 21:37:20,842] [INFO] [logging.py:68:log_dist] [Rank 0] step=72000, skipped=0, lr=[7.655593093399763e-05, 7.655593093399763e-05, 7.655593093399763e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)]
7: iteration 72000/ 115203 | consumed samples: 18432000 | consumed tokens: 37748736000 | elapsed time per iteration (s): 0.45 | learning rate: 7.656E-05 | global batch size: 256 | lm loss: 2.249335E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.333 | TFLOPs: 29.82 |
0: steps: 72000 loss: 2.1930 iter time (s): 0.435 samples/sec: 588.375
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 72000 | lm loss value: 2.186875E+00 | lm loss PPL: 8.907335E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 72000 to checkpoints_221m
0: [2022-11-28 21:37:21,045] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step72000 is begin to save!
0: [2022-11-28 21:37:21,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_01-model_00-model_states.pt... 0: [2022-11-28 21:37:21,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_01-model_00-model_states.pt. 0: [2022-11-28 21:37:21,198] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_03-model_00-model_states.pt... 0: [2022-11-28 21:37:21,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_03-model_00-model_states.pt. 0: [2022-11-28 21:37:21,222] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_04-model_00-model_states.pt... 0: [2022-11-28 21:37:21,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_04-model_00-model_states.pt. 0: [2022-11-28 21:37:21,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_05-model_00-model_states.pt... 0: [2022-11-28 21:37:21,271] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_05-model_00-model_states.pt. 0: [2022-11-28 21:37:21,271] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_06-model_00-model_states.pt... 0: [2022-11-28 21:37:21,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_06-model_00-model_states.pt. 0: [2022-11-28 21:37:21,296] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_07-model_00-model_states.pt... 0: [2022-11-28 21:37:21,321] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 21:37:21,322] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_08-model_00-model_states.pt... 0: [2022-11-28 21:37:21,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_08-model_00-model_states.pt. 0: [2022-11-28 21:37:21,346] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_09-model_00-model_states.pt... 0: [2022-11-28 21:37:21,371] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_09-model_00-model_states.pt. 0: [2022-11-28 21:37:21,371] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_10-model_00-model_states.pt... 0: [2022-11-28 21:37:21,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_10-model_00-model_states.pt. 0: [2022-11-28 21:37:21,396] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_11-model_00-model_states.pt... 0: [2022-11-28 21:37:21,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_11-model_00-model_states.pt. 0: [2022-11-28 21:37:21,422] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_12-model_00-model_states.pt... 0: [2022-11-28 21:37:21,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_12-model_00-model_states.pt. 0: [2022-11-28 21:37:21,448] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_13-model_00-model_states.pt... 0: [2022-11-28 21:37:21,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 21:37:21,472] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_14-model_00-model_states.pt... 0: [2022-11-28 21:37:21,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_14-model_00-model_states.pt. 0: [2022-11-28 21:37:21,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_15-model_00-model_states.pt... 0: [2022-11-28 21:37:21,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_15-model_00-model_states.pt. 0: [2022-11-28 21:37:21,522] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_16-model_00-model_states.pt... 0: [2022-11-28 21:37:21,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_16-model_00-model_states.pt. 0: [2022-11-28 21:37:21,546] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_17-model_00-model_states.pt... 0: [2022-11-28 21:37:21,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_17-model_00-model_states.pt. 0: [2022-11-28 21:37:21,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_18-model_00-model_states.pt... 0: [2022-11-28 21:37:21,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_18-model_00-model_states.pt. 0: [2022-11-28 21:37:21,596] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_19-model_00-model_states.pt... 0: [2022-11-28 21:37:21,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 21:37:21,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_20-model_00-model_states.pt... 0: [2022-11-28 21:37:21,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_20-model_00-model_states.pt. 0: [2022-11-28 21:37:21,646] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/layer_22-model_00-model_states.pt... 0: [2022-11-28 21:37:21,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/layer_22-model_00-model_states.pt. 0: [2022-11-28 21:37:21,651] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step72000/mp_rank_00_model_states.pt 0: [2022-11-28 21:37:21,651] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/mp_rank_00_model_states.pt... 0: [2022-11-28 21:37:21,653] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/mp_rank_00_model_states.pt. 0: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
4: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
0: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
1: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:37:21,955] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
5: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
3: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
5: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:37:21,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step72000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:37:22,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:37:22,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:37:22,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 21:37:22,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2022-11-28 21:37:22,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 21:37:22,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2022-11-28 21:37:22,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:37:22,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 21:37:22,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2022-11-28 21:37:22,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:37:22,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 21:37:22,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2022-11-28 21:37:22,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:37:22,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 21:37:22,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2022-11-28 21:37:22,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 21:37:22,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2022-11-28 21:37:22,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:37:22,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2022-11-28 21:37:22,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 21:37:22,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2022-11-28 21:37:22,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:37:22,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:37:22,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:37:22,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 21:37:22,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 21:37:22,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 21:37:22,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 21:37:22,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 21:37:22,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2022-11-28 21:37:22,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2022-11-28 21:37:22,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 6: [2022-11-28 21:37:22,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2022-11-28 21:37:22,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:37:22,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 21:37:22,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2022-11-28 21:37:22,012] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
3: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:37:22,012] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 21:37:22,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:37:22,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 21:37:22,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
3: [2022-11-28 21:37:22,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:37:22,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 21:37:22,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2022-11-28 21:37:22,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:37:22,014] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:37:22,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 21:37:22,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2022-11-28 21:37:22,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:37:22,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
5: [2022-11-28 21:37:22,013] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 21:37:22,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:37:22,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:37:22,013] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2022-11-28 21:37:22,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 21:37:22,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:37:22,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2022-11-28 21:37:22,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 5: [2022-11-28 21:37:22,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 21:37:22,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
0: [2022-11-28 21:37:22,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 1: [2022-11-28 21:37:22,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 5: [2022-11-28 21:37:22,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2022-11-28 21:37:22,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 21:37:22,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:37:22,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2022-11-28 21:37:22,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2022-11-28 21:37:22,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 21:37:22,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:37:22,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2022-11-28 21:37:22,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2022-11-28 21:37:22,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 
3: [2022-11-28 21:37:22,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 21:37:22,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 3: [2022-11-28 21:37:22,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:37:22,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:37:22,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:37:22,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 21:37:22,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2022-11-28 21:37:22,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 21:37:22,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 21:37:22,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 1: [2022-11-28 21:37:22,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2022-11-28 21:37:22,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-28 21:37:22,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:37:22,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 21:37:22,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 21:37:22,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2022-11-28 21:37:22,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2022-11-28 21:37:22,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:37:22,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 21:37:22,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2022-11-28 21:37:22,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:37:22,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 21:37:22,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2022-11-28 21:37:22,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:37:22,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 21:37:22,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2022-11-28 21:37:22,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:37:22,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 21:37:22,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2022-11-28 21:37:22,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:37:22,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 21:37:22,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2022-11-28 21:37:22,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:37:22,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 21:37:22,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2022-11-28 21:37:22,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:37:22,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 21:37:22,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2022-11-28 21:37:22,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:37:22,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 21:37:22,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 4: [2022-11-28 21:37:22,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:37:22,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 21:37:22,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:37:22,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 21:37:22,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-28 21:37:22,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 21:37:22,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 21:37:22,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 21:37:22,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2022-11-28 21:37:22,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 21:37:22,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 21:37:22,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 21:37:22,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 21:37:22,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 
2: [2022-11-28 21:37:22,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2022-11-28 21:37:22,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2022-11-28 21:37:22,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 2: [2022-11-28 21:37:22,055] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2022-11-28 21:37:22,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:37:22,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 21:37:22,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 5: [2022-11-28 21:37:22,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:37:22,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 21:37:22,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2022-11-28 21:37:22,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:37:22,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:37:22,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2022-11-28 21:37:22,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 21:37:22,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:37:22,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:37:22,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:37:22,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:37:22,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 21:37:22,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:37:22,076] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 21:37:22,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2022-11-28 21:37:22,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2022-11-28 21:37:22,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 
7: [2022-11-28 21:37:22,077] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 21:37:22,077] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 21:37:22,077] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 21:37:22,077] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 21:37:22,077] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 21:37:22,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2022-11-28 21:37:22,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2022-11-28 21:37:22,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2022-11-28 21:37:22,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 7: [2022-11-28 21:37:22,077] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 0: [2022-11-28 21:37:22,080] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step72000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 21:37:22,080] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step72000 is ready now! 
0: successfully saved checkpoint at iteration 72000 to checkpoints_221m
7: time (ms) | save-checkpoint: 1089.43
7: iteration 72010/ 115203 | consumed samples: 18434560 | consumed tokens: 37753978880 | elapsed time per iteration (s): 0.55 | learning rate: 7.653E-05 | global batch size: 256 | lm loss: 2.244870E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 466.280 | TFLOPs: 24.46 |
7: iteration 72020/ 115203 | consumed samples: 18437120 | consumed tokens: 37759221760 | elapsed time per iteration (s): 0.43 | learning rate: 7.651E-05 | global batch size: 256 | lm loss: 2.280758E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.899 | TFLOPs: 31.00 |
7: iteration 72030/ 115203 | consumed samples: 18439680 | consumed tokens: 37764464640 | elapsed time per iteration (s): 0.45 | learning rate: 7.649E-05 | global batch size: 256 | lm loss: 2.250215E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.965 | TFLOPs: 30.06 |
7: iteration 72040/ 115203 | consumed samples: 18442240 | consumed tokens: 37769707520 | elapsed time per iteration (s): 0.43 | learning rate: 7.646E-05 | global batch size: 256 | lm loss: 2.281224E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.156 | TFLOPs: 30.91 |
7: iteration 72050/ 115203 | consumed samples: 18444800 | consumed tokens: 37774950400 | elapsed time per iteration (s): 0.43 | learning rate: 7.644E-05 | global batch size: 256 | lm loss: 2.232433E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.321 | TFLOPs: 30.92 |
7: iteration 72060/ 115203 | consumed samples: 18447360 | consumed tokens: 37780193280 | elapsed time per iteration (s): 0.63 | learning rate: 7.642E-05 | global batch size: 256 | lm loss: 2.270767E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 409.113 | TFLOPs: 21.47 |
7: iteration 72070/ 115203 | consumed samples: 18449920 | consumed tokens: 37785436160 | elapsed time per iteration (s): 0.44 | learning rate: 7.639E-05 | global batch size: 256 | lm loss: 2.251420E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.186 | TFLOPs: 30.34 |
7: iteration 72080/ 115203 | consumed samples: 18452480 | consumed tokens: 37790679040 | elapsed time per iteration (s): 0.43 | learning rate: 7.637E-05 | global batch size: 256 | lm loss: 2.270511E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.304 | TFLOPs: 31.39 |
7: iteration 72090/ 115203 | consumed samples: 18455040 | consumed tokens: 37795921920 | elapsed time per iteration (s): 0.43 | learning rate: 7.635E-05 | global batch size: 256 | lm loss: 2.272776E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.178 | TFLOPs: 31.02 |
7: iteration 72100/ 115203 | consumed samples: 18457600 | consumed tokens: 37801164800 | elapsed time per iteration (s): 0.44 | learning rate: 7.633E-05 | global batch size: 256 | lm loss: 2.274252E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.398 | TFLOPs: 30.87 |
7: iteration 72110/ 115203 | consumed samples: 18460160 | consumed tokens: 37806407680 | elapsed time per iteration (s): 0.43 | learning rate: 7.630E-05 | global batch size: 256 | lm loss: 2.242520E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.466 | TFLOPs: 31.35 |
7: iteration 72120/ 115203 | consumed samples: 18462720 | consumed tokens: 37811650560 | elapsed time per iteration (s): 0.44 | learning rate: 7.628E-05 | global batch size: 256 | lm loss: 2.236268E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.165 | TFLOPs: 30.81 |
7: iteration 72130/ 115203 | consumed samples: 18465280 | consumed tokens: 37816893440 | elapsed time per iteration (s): 0.43 | learning rate: 7.626E-05 | global batch size: 256 | lm loss: 2.248688E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.673 | TFLOPs: 31.15 |
7: iteration 72140/ 115203 | consumed samples: 18467840 | consumed tokens: 37822136320 | elapsed time per iteration (s): 0.43 | learning rate: 7.623E-05 | global batch size: 256 | lm loss: 2.256784E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.468 | TFLOPs: 31.09 |
7: iteration 72150/ 115203 | consumed samples: 18470400 | consumed tokens: 37827379200 | elapsed time per iteration (s): 0.43 | learning rate: 7.621E-05 | global batch size: 256 | lm loss: 2.224733E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.562 | TFLOPs: 31.41 |
7: iteration 72160/ 115203 | consumed samples: 18472960 | consumed tokens: 37832622080 | elapsed time per iteration (s): 0.43 | learning rate: 7.619E-05 | global batch size: 256 | lm loss: 2.272751E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.546 | TFLOPs: 31.30 |
7: iteration 72170/ 115203 | consumed samples: 18475520 | consumed tokens: 37837864960 | elapsed time per iteration (s): 0.43 | learning rate: 7.617E-05 | global batch size: 256 | lm loss: 2.262752E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 |
number of nan iterations: 0 | samples per second: 588.879 | TFLOPs: 30.90 | 7: iteration 72180/ 115203 | consumed samples: 18478080 | consumed tokens: 37843107840 | elapsed time per iteration (s): 0.43 | learning rate: 7.614E-05 | global batch size: 256 | lm loss: 2.262365E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.383 | TFLOPs: 31.29 | 7: iteration 72190/ 115203 | consumed samples: 18480640 | consumed tokens: 37848350720 | elapsed time per iteration (s): 0.45 | learning rate: 7.612E-05 | global batch size: 256 | lm loss: 2.253437E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.980 | TFLOPs: 29.85 | 7: iteration 72200/ 115203 | consumed samples: 18483200 | consumed tokens: 37853593600 | elapsed time per iteration (s): 0.44 | learning rate: 7.610E-05 | global batch size: 256 | lm loss: 2.240495E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.982 | TFLOPs: 30.59 | 7: iteration 72210/ 115203 | consumed samples: 18485760 | consumed tokens: 37858836480 | elapsed time per iteration (s): 0.44 | learning rate: 7.607E-05 | global batch size: 256 | lm loss: 2.223443E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.649 | TFLOPs: 30.68 | 7: iteration 72220/ 115203 | consumed samples: 18488320 | consumed tokens: 37864079360 | elapsed time per iteration (s): 0.43 | learning rate: 7.605E-05 | global batch size: 256 | lm loss: 2.254319E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.468 | TFLOPs: 30.98 | 7: iteration 72230/ 115203 | consumed samples: 18490880 | consumed tokens: 37869322240 | elapsed time per iteration (s): 0.42 | learning rate: 7.603E-05 | global batch size: 
256 | lm loss: 2.261709E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.644 | TFLOPs: 31.62 | 7: iteration 72240/ 115203 | consumed samples: 18493440 | consumed tokens: 37874565120 | elapsed time per iteration (s): 0.44 | learning rate: 7.600E-05 | global batch size: 256 | lm loss: 2.297286E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.911 | TFLOPs: 30.79 | 7: iteration 72250/ 115203 | consumed samples: 18496000 | consumed tokens: 37879808000 | elapsed time per iteration (s): 0.43 | learning rate: 7.598E-05 | global batch size: 256 | lm loss: 2.239969E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.759 | TFLOPs: 31.05 | 7: iteration 72260/ 115203 | consumed samples: 18498560 | consumed tokens: 37885050880 | elapsed time per iteration (s): 0.43 | learning rate: 7.596E-05 | global batch size: 256 | lm loss: 2.240720E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.595 | TFLOPs: 30.94 | 7: iteration 72270/ 115203 | consumed samples: 18501120 | consumed tokens: 37890293760 | elapsed time per iteration (s): 0.44 | learning rate: 7.594E-05 | global batch size: 256 | lm loss: 2.272863E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.728 | TFLOPs: 30.36 | 7: iteration 72280/ 115203 | consumed samples: 18503680 | consumed tokens: 37895536640 | elapsed time per iteration (s): 0.44 | learning rate: 7.591E-05 | global batch size: 256 | lm loss: 2.266425E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.726 | TFLOPs: 30.26 | 7: iteration 72290/ 115203 | consumed samples: 18506240 | consumed 
tokens: 37900779520 | elapsed time per iteration (s): 0.43 | learning rate: 7.589E-05 | global batch size: 256 | lm loss: 2.222531E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.016 | TFLOPs: 31.43 | 7: iteration 72300/ 115203 | consumed samples: 18508800 | consumed tokens: 37906022400 | elapsed time per iteration (s): 0.43 | learning rate: 7.587E-05 | global batch size: 256 | lm loss: 2.283411E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.349 | TFLOPs: 31.08 | 7: iteration 72310/ 115203 | consumed samples: 18511360 | consumed tokens: 37911265280 | elapsed time per iteration (s): 0.44 | learning rate: 7.584E-05 | global batch size: 256 | lm loss: 2.260814E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.548 | TFLOPs: 30.72 | 7: iteration 72320/ 115203 | consumed samples: 18513920 | consumed tokens: 37916508160 | elapsed time per iteration (s): 0.43 | learning rate: 7.582E-05 | global batch size: 256 | lm loss: 2.231971E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.236 | TFLOPs: 30.92 | 7: iteration 72330/ 115203 | consumed samples: 18516480 | consumed tokens: 37921751040 | elapsed time per iteration (s): 0.43 | learning rate: 7.580E-05 | global batch size: 256 | lm loss: 2.232974E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.293 | TFLOPs: 31.02 | 7: iteration 72340/ 115203 | consumed samples: 18519040 | consumed tokens: 37926993920 | elapsed time per iteration (s): 0.43 | learning rate: 7.577E-05 | global batch size: 256 | lm loss: 2.251063E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 593.914 | TFLOPs: 31.16 | 7: iteration 72350/ 115203 | consumed samples: 18521600 | consumed tokens: 37932236800 | elapsed time per iteration (s): 0.43 | learning rate: 7.575E-05 | global batch size: 256 | lm loss: 2.270950E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.870 | TFLOPs: 30.90 | 7: iteration 72360/ 115203 | consumed samples: 18524160 | consumed tokens: 37937479680 | elapsed time per iteration (s): 0.44 | learning rate: 7.573E-05 | global batch size: 256 | lm loss: 2.279781E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.070 | TFLOPs: 30.65 | 7: iteration 72370/ 115203 | consumed samples: 18526720 | consumed tokens: 37942722560 | elapsed time per iteration (s): 0.43 | learning rate: 7.571E-05 | global batch size: 256 | lm loss: 2.271211E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.504 | TFLOPs: 31.51 | 7: iteration 72380/ 115203 | consumed samples: 18529280 | consumed tokens: 37947965440 | elapsed time per iteration (s): 0.43 | learning rate: 7.568E-05 | global batch size: 256 | lm loss: 2.245964E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.375 | TFLOPs: 31.13 | 7: iteration 72390/ 115203 | consumed samples: 18531840 | consumed tokens: 37953208320 | elapsed time per iteration (s): 0.43 | learning rate: 7.566E-05 | global batch size: 256 | lm loss: 2.259974E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.983 | TFLOPs: 31.01 | 7: iteration 72400/ 115203 | consumed samples: 18534400 | consumed tokens: 37958451200 | elapsed time per iteration (s): 0.43 | learning rate: 7.564E-05 | global batch size: 256 | lm loss: 2.280057E+00 | grad norm: 
0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.400 | TFLOPs: 30.98 | 7: iteration 72410/ 115203 | consumed samples: 18536960 | consumed tokens: 37963694080 | elapsed time per iteration (s): 0.43 | learning rate: 7.561E-05 | global batch size: 256 | lm loss: 2.271787E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.542 | TFLOPs: 31.56 | 7: iteration 72420/ 115203 | consumed samples: 18539520 | consumed tokens: 37968936960 | elapsed time per iteration (s): 0.44 | learning rate: 7.559E-05 | global batch size: 256 | lm loss: 2.254364E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.594 | TFLOPs: 30.83 | 7: iteration 72430/ 115203 | consumed samples: 18542080 | consumed tokens: 37974179840 | elapsed time per iteration (s): 0.44 | learning rate: 7.557E-05 | global batch size: 256 | lm loss: 2.262326E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.110 | TFLOPs: 30.75 | 7: iteration 72440/ 115203 | consumed samples: 18544640 | consumed tokens: 37979422720 | elapsed time per iteration (s): 0.43 | learning rate: 7.555E-05 | global batch size: 256 | lm loss: 2.248963E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.157 | TFLOPs: 31.02 | 7: iteration 72450/ 115203 | consumed samples: 18547200 | consumed tokens: 37984665600 | elapsed time per iteration (s): 0.43 | learning rate: 7.552E-05 | global batch size: 256 | lm loss: 2.233365E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.907 | TFLOPs: 31.16 | 7: iteration 72460/ 115203 | consumed samples: 18549760 | consumed tokens: 37989908480 | elapsed time per 
iteration (s): 0.44 | learning rate: 7.550E-05 | global batch size: 256 | lm loss: 2.286503E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.373 | TFLOPs: 30.71 | 7: iteration 72470/ 115203 | consumed samples: 18552320 | consumed tokens: 37995151360 | elapsed time per iteration (s): 0.43 | learning rate: 7.548E-05 | global batch size: 256 | lm loss: 2.258867E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.173 | TFLOPs: 31.33 | 7: iteration 72480/ 115203 | consumed samples: 18554880 | consumed tokens: 38000394240 | elapsed time per iteration (s): 0.43 | learning rate: 7.545E-05 | global batch size: 256 | lm loss: 2.276221E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.509 | TFLOPs: 30.98 | 7: iteration 72490/ 115203 | consumed samples: 18557440 | consumed tokens: 38005637120 | elapsed time per iteration (s): 0.43 | learning rate: 7.543E-05 | global batch size: 256 | lm loss: 2.235498E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.952 | TFLOPs: 31.06 | 7: iteration 72500/ 115203 | consumed samples: 18560000 | consumed tokens: 38010880000 | elapsed time per iteration (s): 0.43 | learning rate: 7.541E-05 | global batch size: 256 | lm loss: 2.229151E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.326 | TFLOPs: 31.18 | 7: iteration 72510/ 115203 | consumed samples: 18562560 | consumed tokens: 38016122880 | elapsed time per iteration (s): 0.44 | learning rate: 7.539E-05 | global batch size: 256 | lm loss: 2.251814E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.309 | TFLOPs: 30.76 | 7: 
iteration 72520/ 115203 | consumed samples: 18565120 | consumed tokens: 38021365760 | elapsed time per iteration (s): 0.43 | learning rate: 7.536E-05 | global batch size: 256 | lm loss: 2.239125E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.653 | TFLOPs: 31.10 | 7: iteration 72530/ 115203 | consumed samples: 18567680 | consumed tokens: 38026608640 | elapsed time per iteration (s): 0.43 | learning rate: 7.534E-05 | global batch size: 256 | lm loss: 2.269877E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.600 | TFLOPs: 31.56 | 7: iteration 72540/ 115203 | consumed samples: 18570240 | consumed tokens: 38031851520 | elapsed time per iteration (s): 0.45 | learning rate: 7.532E-05 | global batch size: 256 | lm loss: 2.252668E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.418 | TFLOPs: 29.77 | 7: iteration 72550/ 115203 | consumed samples: 18572800 | consumed tokens: 38037094400 | elapsed time per iteration (s): 0.42 | learning rate: 7.529E-05 | global batch size: 256 | lm loss: 2.233505E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.990 | TFLOPs: 31.64 | 7: iteration 72560/ 115203 | consumed samples: 18575360 | consumed tokens: 38042337280 | elapsed time per iteration (s): 0.43 | learning rate: 7.527E-05 | global batch size: 256 | lm loss: 2.270177E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.381 | TFLOPs: 31.19 | 7: iteration 72570/ 115203 | consumed samples: 18577920 | consumed tokens: 38047580160 | elapsed time per iteration (s): 0.42 | learning rate: 7.525E-05 | global batch size: 256 | lm loss: 2.273404E+00 | grad norm: 0.225 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.636 | TFLOPs: 31.62 | 7: iteration 72580/ 115203 | consumed samples: 18580480 | consumed tokens: 38052823040 | elapsed time per iteration (s): 0.43 | learning rate: 7.523E-05 | global batch size: 256 | lm loss: 2.261978E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.348 | TFLOPs: 31.24 | 7: iteration 72590/ 115203 | consumed samples: 18583040 | consumed tokens: 38058065920 | elapsed time per iteration (s): 0.43 | learning rate: 7.520E-05 | global batch size: 256 | lm loss: 2.253871E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.518 | TFLOPs: 31.04 | 7: iteration 72600/ 115203 | consumed samples: 18585600 | consumed tokens: 38063308800 | elapsed time per iteration (s): 0.44 | learning rate: 7.518E-05 | global batch size: 256 | lm loss: 2.254274E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.259 | TFLOPs: 30.76 | 7: iteration 72610/ 115203 | consumed samples: 18588160 | consumed tokens: 38068551680 | elapsed time per iteration (s): 0.42 | learning rate: 7.516E-05 | global batch size: 256 | lm loss: 2.230628E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.543 | TFLOPs: 31.67 | 7: iteration 72620/ 115203 | consumed samples: 18590720 | consumed tokens: 38073794560 | elapsed time per iteration (s): 0.43 | learning rate: 7.513E-05 | global batch size: 256 | lm loss: 2.264406E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.068 | TFLOPs: 31.27 | 7: iteration 72630/ 115203 | consumed samples: 18593280 | consumed tokens: 38079037440 | elapsed time per iteration (s): 0.43 | learning rate: 
7.511E-05 | global batch size: 256 | lm loss: 2.242925E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.930 | TFLOPs: 31.01 | 7: iteration 72640/ 115203 | consumed samples: 18595840 | consumed tokens: 38084280320 | elapsed time per iteration (s): 0.43 | learning rate: 7.509E-05 | global batch size: 256 | lm loss: 2.262600E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.877 | TFLOPs: 31.00 | 7: iteration 72650/ 115203 | consumed samples: 18598400 | consumed tokens: 38089523200 | elapsed time per iteration (s): 0.43 | learning rate: 7.507E-05 | global batch size: 256 | lm loss: 2.263487E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.727 | TFLOPs: 30.89 | 7: iteration 72660/ 115203 | consumed samples: 18600960 | consumed tokens: 38094766080 | elapsed time per iteration (s): 0.44 | learning rate: 7.504E-05 | global batch size: 256 | lm loss: 2.245894E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.909 | TFLOPs: 30.32 | 7: iteration 72670/ 115203 | consumed samples: 18603520 | consumed tokens: 38100008960 | elapsed time per iteration (s): 0.43 | learning rate: 7.502E-05 | global batch size: 256 | lm loss: 2.244042E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.693 | TFLOPs: 31.26 | 7: iteration 72680/ 115203 | consumed samples: 18606080 | consumed tokens: 38105251840 | elapsed time per iteration (s): 0.43 | learning rate: 7.500E-05 | global batch size: 256 | lm loss: 2.269920E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.278 | TFLOPs: 31.23 | 7: iteration 72690/ 115203 | consumed 
samples: 18608640 | consumed tokens: 38110494720 | elapsed time per iteration (s): 0.43 | learning rate: 7.497E-05 | global batch size: 256 | lm loss: 2.264046E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.296 | TFLOPs: 31.08 | 7: iteration 72700/ 115203 | consumed samples: 18611200 | consumed tokens: 38115737600 | elapsed time per iteration (s): 0.43 | learning rate: 7.495E-05 | global batch size: 256 | lm loss: 2.242230E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.413 | TFLOPs: 31.24 | 7: iteration 72710/ 115203 | consumed samples: 18613760 | consumed tokens: 38120980480 | elapsed time per iteration (s): 0.42 | learning rate: 7.493E-05 | global batch size: 256 | lm loss: 2.221827E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.926 | TFLOPs: 31.69 | 7: iteration 72720/ 115203 | consumed samples: 18616320 | consumed tokens: 38126223360 | elapsed time per iteration (s): 0.43 | learning rate: 7.491E-05 | global batch size: 256 | lm loss: 2.266062E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.500 | TFLOPs: 31.30 | 7: iteration 72730/ 115203 | consumed samples: 18618880 | consumed tokens: 38131466240 | elapsed time per iteration (s): 0.43 | learning rate: 7.488E-05 | global batch size: 256 | lm loss: 2.294685E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.341 | TFLOPs: 31.39 | 7: iteration 72740/ 115203 | consumed samples: 18621440 | consumed tokens: 38136709120 | elapsed time per iteration (s): 0.45 | learning rate: 7.486E-05 | global batch size: 256 | lm loss: 2.288021E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 570.093 | TFLOPs: 29.91 | 7: iteration 72750/ 115203 | consumed samples: 18624000 | consumed tokens: 38141952000 | elapsed time per iteration (s): 0.43 | learning rate: 7.484E-05 | global batch size: 256 | lm loss: 2.240263E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.928 | TFLOPs: 31.16 | 7: iteration 72760/ 115203 | consumed samples: 18626560 | consumed tokens: 38147194880 | elapsed time per iteration (s): 0.43 | learning rate: 7.481E-05 | global batch size: 256 | lm loss: 2.277531E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.524 | TFLOPs: 30.98 | 7: iteration 72770/ 115203 | consumed samples: 18629120 | consumed tokens: 38152437760 | elapsed time per iteration (s): 0.43 | learning rate: 7.479E-05 | global batch size: 256 | lm loss: 2.243953E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.738 | TFLOPs: 31.15 | 7: iteration 72780/ 115203 | consumed samples: 18631680 | consumed tokens: 38157680640 | elapsed time per iteration (s): 0.43 | learning rate: 7.477E-05 | global batch size: 256 | lm loss: 2.289880E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.068 | TFLOPs: 31.38 | 7: iteration 72790/ 115203 | consumed samples: 18634240 | consumed tokens: 38162923520 | elapsed time per iteration (s): 0.43 | learning rate: 7.475E-05 | global batch size: 256 | lm loss: 2.270348E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.206 | TFLOPs: 31.33 | 7: iteration 72800/ 115203 | consumed samples: 18636800 | consumed tokens: 38168166400 | elapsed time per iteration (s): 0.43 | learning rate: 7.472E-05 | global batch size: 256 | lm 
loss: 2.247676E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.754 | TFLOPs: 31.52 | 7: iteration 72810/ 115203 | consumed samples: 18639360 | consumed tokens: 38173409280 | elapsed time per iteration (s): 0.43 | learning rate: 7.470E-05 | global batch size: 256 | lm loss: 2.287165E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.217 | TFLOPs: 31.13 | 7: iteration 72820/ 115203 | consumed samples: 18641920 | consumed tokens: 38178652160 | elapsed time per iteration (s): 0.43 | learning rate: 7.468E-05 | global batch size: 256 | lm loss: 2.253701E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.120 | TFLOPs: 31.38 | 7: iteration 72830/ 115203 | consumed samples: 18644480 | consumed tokens: 38183895040 | elapsed time per iteration (s): 0.43 | learning rate: 7.465E-05 | global batch size: 256 | lm loss: 2.246525E+00 | grad norm: 0.208 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.539 | TFLOPs: 31.09 | 7: iteration 72840/ 115203 | consumed samples: 18647040 | consumed tokens: 38189137920 | elapsed time per iteration (s): 0.45 | learning rate: 7.463E-05 | global batch size: 256 | lm loss: 2.265918E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.409 | TFLOPs: 30.03 | 7: iteration 72850/ 115203 | consumed samples: 18649600 | consumed tokens: 38194380800 | elapsed time per iteration (s): 0.42 | learning rate: 7.461E-05 | global batch size: 256 | lm loss: 2.274568E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.204 | TFLOPs: 31.65 | 7: iteration 72860/ 115203 | consumed samples: 18652160 | consumed tokens: 
38199623680 | elapsed time per iteration (s): 0.46 | learning rate: 7.459E-05 | global batch size: 256 | lm loss: 2.258441E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.258 | TFLOPs: 29.40 | 7: iteration 72870/ 115203 | consumed samples: 18654720 | consumed tokens: 38204866560 | elapsed time per iteration (s): 0.43 | learning rate: 7.456E-05 | global batch size: 256 | lm loss: 2.255869E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.000 | TFLOPs: 30.96 | 7: iteration 72880/ 115203 | consumed samples: 18657280 | consumed tokens: 38210109440 | elapsed time per iteration (s): 0.45 | learning rate: 7.454E-05 | global batch size: 256 | lm loss: 2.266751E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.586 | TFLOPs: 30.15 | 7: iteration 72890/ 115203 | consumed samples: 18659840 | consumed tokens: 38215352320 | elapsed time per iteration (s): 0.43 | learning rate: 7.452E-05 | global batch size: 256 | lm loss: 2.264530E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.167 | TFLOPs: 31.07 | 7: iteration 72900/ 115203 | consumed samples: 18662400 | consumed tokens: 38220595200 | elapsed time per iteration (s): 0.43 | learning rate: 7.450E-05 | global batch size: 256 | lm loss: 2.225446E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.275 | TFLOPs: 30.92 | 7: iteration 72910/ 115203 | consumed samples: 18664960 | consumed tokens: 38225838080 | elapsed time per iteration (s): 0.44 | learning rate: 7.447E-05 | global batch size: 256 | lm loss: 2.238908E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
582.026 | TFLOPs: 30.54 | 7: iteration 72920/ 115203 | consumed samples: 18667520 | consumed tokens: 38231080960 | elapsed time per iteration (s): 0.43 | learning rate: 7.445E-05 | global batch size: 256 | lm loss: 2.248878E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.372 | TFLOPs: 31.50 | 7: iteration 72930/ 115203 | consumed samples: 18670080 | consumed tokens: 38236323840 | elapsed time per iteration (s): 0.43 | learning rate: 7.443E-05 | global batch size: 256 | lm loss: 2.248182E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.131 | TFLOPs: 31.17 | 7: iteration 72940/ 115203 | consumed samples: 18672640 | consumed tokens: 38241566720 | elapsed time per iteration (s): 0.42 | learning rate: 7.440E-05 | global batch size: 256 | lm loss: 2.221302E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.605 | TFLOPs: 31.88 | 7: iteration 72950/ 115203 | consumed samples: 18675200 | consumed tokens: 38246809600 | elapsed time per iteration (s): 0.44 | learning rate: 7.438E-05 | global batch size: 256 | lm loss: 2.289935E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.397 | TFLOPs: 30.71 | 7: iteration 72960/ 115203 | consumed samples: 18677760 | consumed tokens: 38252052480 | elapsed time per iteration (s): 0.43 | learning rate: 7.436E-05 | global batch size: 256 | lm loss: 2.265464E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.190 | TFLOPs: 31.33 | 7: iteration 72970/ 115203 | consumed samples: 18680320 | consumed tokens: 38257295360 | elapsed time per iteration (s): 0.43 | learning rate: 7.434E-05 | global batch size: 256 | lm loss: 2.243654E+00 | grad norm: 0.227 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.429 | TFLOPs: 31.45 | 7: iteration 72980/ 115203 | consumed samples: 18682880 | consumed tokens: 38262538240 | elapsed time per iteration (s): 0.43 | learning rate: 7.431E-05 | global batch size: 256 | lm loss: 2.271692E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.966 | TFLOPs: 31.37 | 7: iteration 72990/ 115203 | consumed samples: 18685440 | consumed tokens: 38267781120 | elapsed time per iteration (s): 0.43 | learning rate: 7.429E-05 | global batch size: 256 | lm loss: 2.288695E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.903 | TFLOPs: 31.27 | 7: iteration 73000/ 115203 | consumed samples: 18688000 | consumed tokens: 38273024000 | elapsed time per iteration (s): 0.43 | learning rate: 7.427E-05 | global batch size: 256 | lm loss: 2.256467E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.535 | TFLOPs: 30.98 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 73000 | lm loss value: 2.132576E+00 | lm loss PPL: 8.436571E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 73000 to checkpoints_221m 0: [2022-11-28 21:44:37,310] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step73000 is begin to save! 0: [2022-11-28 21:44:37,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_01-model_00-model_states.pt... 0: [2022-11-28 21:44:37,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 21:44:37,470] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_03-model_00-model_states.pt... 0: [2022-11-28 21:44:37,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_03-model_00-model_states.pt. 0: [2022-11-28 21:44:37,494] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_04-model_00-model_states.pt... 0: [2022-11-28 21:44:37,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_04-model_00-model_states.pt. 0: [2022-11-28 21:44:37,518] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_05-model_00-model_states.pt... 0: [2022-11-28 21:44:37,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_05-model_00-model_states.pt. 0: [2022-11-28 21:44:37,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_06-model_00-model_states.pt... 0: [2022-11-28 21:44:37,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_06-model_00-model_states.pt. 0: [2022-11-28 21:44:37,569] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_07-model_00-model_states.pt... 0: [2022-11-28 21:44:37,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_07-model_00-model_states.pt. 0: [2022-11-28 21:44:37,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_08-model_00-model_states.pt... 0: [2022-11-28 21:44:37,620] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 21:44:37,620] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_09-model_00-model_states.pt... 0: [2022-11-28 21:44:37,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_09-model_00-model_states.pt. 0: [2022-11-28 21:44:37,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_10-model_00-model_states.pt... 0: [2022-11-28 21:44:37,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_10-model_00-model_states.pt. 0: [2022-11-28 21:44:37,671] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_11-model_00-model_states.pt... 0: [2022-11-28 21:44:37,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_11-model_00-model_states.pt. 0: [2022-11-28 21:44:37,696] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_12-model_00-model_states.pt... 0: [2022-11-28 21:44:37,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_12-model_00-model_states.pt. 0: [2022-11-28 21:44:37,721] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_13-model_00-model_states.pt... 0: [2022-11-28 21:44:37,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_13-model_00-model_states.pt. 0: [2022-11-28 21:44:37,745] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_14-model_00-model_states.pt... 0: [2022-11-28 21:44:37,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 21:44:37,771] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_15-model_00-model_states.pt... 0: [2022-11-28 21:44:37,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_15-model_00-model_states.pt. 0: [2022-11-28 21:44:37,796] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_16-model_00-model_states.pt... 0: [2022-11-28 21:44:37,821] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_16-model_00-model_states.pt. 0: [2022-11-28 21:44:37,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_17-model_00-model_states.pt... 0: [2022-11-28 21:44:37,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_17-model_00-model_states.pt. 0: [2022-11-28 21:44:37,846] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_18-model_00-model_states.pt... 0: [2022-11-28 21:44:37,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_18-model_00-model_states.pt. 0: [2022-11-28 21:44:37,871] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_19-model_00-model_states.pt... 0: [2022-11-28 21:44:37,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_19-model_00-model_states.pt. 0: [2022-11-28 21:44:37,896] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_20-model_00-model_states.pt... 0: [2022-11-28 21:44:37,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 21:44:37,921] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/layer_22-model_00-model_states.pt... 0: [2022-11-28 21:44:37,925] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/layer_22-model_00-model_states.pt. 0: [2022-11-28 21:44:37,926] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step73000/mp_rank_00_model_states.pt 0: [2022-11-28 21:44:37,926] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/mp_rank_00_model_states.pt... 0: [2022-11-28 21:44:37,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/mp_rank_00_model_states.pt. 0: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
0: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
3: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
1: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
5: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:44:37,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step73000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:44:37,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:44:37,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 21:44:37,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2022-11-28 21:44:37,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:44:37,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 21:44:37,997] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2022-11-28 21:44:37,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:44:37,997] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 21:44:37,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2022-11-28 21:44:37,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:44:37,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 21:44:37,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:44:37,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2022-11-28 21:44:37,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 21:44:37,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2022-11-28 21:44:37,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:44:37,998] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 21:44:37,998] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2022-11-28 21:44:37,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:44:37,999] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 21:44:37,999] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2022-11-28 21:44:38,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 21:44:38,000] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 21:44:38,000] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2022-11-28 21:44:38,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:44:38,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:44:38,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:44:38,001] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 21:44:38,001] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 21:44:38,001] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 21:44:38,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2022-11-28 21:44:38,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2022-11-28 21:44:38,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2022-11-28 21:44:38,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 21:44:38,001] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 21:44:38,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2022-11-28 21:44:38,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:44:38,001] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 21:44:38,001] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2022-11-28 21:44:38,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:44:38,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:44:38,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 21:44:38,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2022-11-28 21:44:38,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:44:38,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 21:44:38,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
0: [2022-11-28 21:44:38,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:44:38,002] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 21:44:38,002] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2022-11-28 21:44:38,004] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:44:38,004] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 21:44:38,004] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2022-11-28 21:44:38,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:44:38,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 21:44:38,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2022-11-28 21:44:38,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:44:38,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 21:44:38,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
1: [2022-11-28 21:44:38,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:44:38,006] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 21:44:38,006] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2022-11-28 21:44:38,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:44:38,007] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 21:44:38,007] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2022-11-28 21:44:38,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:44:38,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 21:44:38,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2022-11-28 21:44:38,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:44:38,008] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 21:44:38,008] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
1: [2022-11-28 21:44:38,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:44:38,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 21:44:38,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2022-11-28 21:44:38,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:44:38,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2022-11-28 21:44:38,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:44:38,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2022-11-28 21:44:38,009] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 21:44:38,009] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2022-11-28 21:44:38,009] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:44:38,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 21:44:38,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
4: [2022-11-28 21:44:38,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:44:38,010] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 21:44:38,010] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2022-11-28 21:44:38,013] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:44:38,014] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 21:44:38,014] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 2: [2022-11-28 21:44:38,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:44:38,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 1: [2022-11-28 21:44:38,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:44:38,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2022-11-28 21:44:38,015] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
1: [2022-11-28 21:44:38,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2022-11-28 21:44:38,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 1: [2022-11-28 21:44:38,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 7: [2022-11-28 21:44:38,015] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 1: [2022-11-28 21:44:38,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:44:38,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 21:44:38,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2022-11-28 21:44:38,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:44:38,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 21:44:38,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2022-11-28 21:44:38,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 21:44:38,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 21:44:38,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2022-11-28 21:44:38,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:44:38,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 21:44:38,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2022-11-28 21:44:38,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:44:38,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 21:44:38,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2022-11-28 21:44:38,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:44:38,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 21:44:38,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2022-11-28 21:44:38,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:44:38,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 21:44:38,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 4: [2022-11-28 21:44:38,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:44:38,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:44:38,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:44:38,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:44:38,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 5: [2022-11-28 21:44:38,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 21:44:38,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 4: [2022-11-28 21:44:38,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2022-11-28 21:44:38,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 21:44:38,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
5: [2022-11-28 21:44:38,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2022-11-28 21:44:38,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2022-11-28 21:44:38,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:44:38,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 21:44:38,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2022-11-28 21:44:38,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:44:38,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 21:44:38,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2022-11-28 21:44:38,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:44:38,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 21:44:38,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2022-11-28 21:44:38,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 21:44:38,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 21:44:38,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2022-11-28 21:44:38,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:44:38,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:44:38,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:44:38,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:44:38,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 21:44:38,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 21:44:38,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 21:44:38,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 21:44:38,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2022-11-28 21:44:38,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
6: [2022-11-28 21:44:38,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 6: [2022-11-28 21:44:38,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2022-11-28 21:44:38,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:44:38,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 21:44:38,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2022-11-28 21:44:38,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:44:38,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 21:44:38,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2022-11-28 21:44:38,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:44:38,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 21:44:38,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 5: [2022-11-28 21:44:38,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 21:44:38,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 21:44:38,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 0: [2022-11-28 21:44:38,070] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 21:44:38,070] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2022-11-28 21:44:38,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:44:38,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:44:38,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:44:38,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 21:44:38,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 21:44:38,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 21:44:38,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2022-11-28 21:44:38,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
3: [2022-11-28 21:44:38,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2022-11-28 21:44:38,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:44:38,202] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 21:44:38,202] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2022-11-28 21:44:38,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:44:38,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:44:38,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:44:38,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2022-11-28 21:44:38,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 21:44:38,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 21:44:38,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 21:44:38,204] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step73000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 21:44:38,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2022-11-28 21:44:38,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2022-11-28 21:44:38,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 3: [2022-11-28 21:44:38,204] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step73000 is ready now! 
0: successfully saved checkpoint at iteration 73000 to checkpoints_221m 7: time (ms) | save-checkpoint: 921.28 7: iteration 73010/ 115203 | consumed samples: 18690560 | consumed tokens: 38278266880 | elapsed time per iteration (s): 0.53 | learning rate: 7.424E-05 | global batch size: 256 | lm loss: 2.266063E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 483.037 | TFLOPs: 25.34 | 7: iteration 73020/ 115203 | consumed samples: 18693120 | consumed tokens: 38283509760 | elapsed time per iteration (s): 0.56 | learning rate: 7.422E-05 | global batch size: 256 | lm loss: 2.271627E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 453.658 | TFLOPs: 23.80 | 7: iteration 73030/ 115203 | consumed samples: 18695680 | consumed tokens: 38288752640 | elapsed time per iteration (s): 0.43 | learning rate: 7.420E-05 | global batch size: 256 | lm loss: 2.245538E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.052 | TFLOPs: 31.38 | 7: iteration 73040/ 115203 | consumed samples: 18698240 | consumed tokens: 38293995520 | elapsed time per iteration (s): 0.43 | learning rate: 7.418E-05 | global batch size: 256 | lm loss: 2.267308E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.567 | TFLOPs: 31.56 | 7: iteration 73050/ 115203 | consumed samples: 18700800 | consumed tokens: 38299238400 | elapsed time per iteration (s): 0.44 | learning rate: 7.415E-05 | global batch size: 256 | lm loss: 2.266054E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.470 | TFLOPs: 30.77 | 7: iteration 73060/ 115203 | consumed samples: 18703360 | consumed tokens: 38304481280 | elapsed time per iteration (s): 0.42 | learning 
rate: 7.413E-05 | global batch size: 256 | lm loss: 2.260911E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.777 | TFLOPs: 32.05 | 7: iteration 73070/ 115203 | consumed samples: 18705920 | consumed tokens: 38309724160 | elapsed time per iteration (s): 0.44 | learning rate: 7.411E-05 | global batch size: 256 | lm loss: 2.250863E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.738 | TFLOPs: 30.79 | 7: iteration 73080/ 115203 | consumed samples: 18708480 | consumed tokens: 38314967040 | elapsed time per iteration (s): 0.42 | learning rate: 7.409E-05 | global batch size: 256 | lm loss: 2.244246E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.190 | TFLOPs: 31.81 | 7: iteration 73090/ 115203 | consumed samples: 18711040 | consumed tokens: 38320209920 | elapsed time per iteration (s): 0.43 | learning rate: 7.406E-05 | global batch size: 256 | lm loss: 2.259782E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.174 | TFLOPs: 30.91 | 7: iteration 73100/ 115203 | consumed samples: 18713600 | consumed tokens: 38325452800 | elapsed time per iteration (s): 0.43 | learning rate: 7.404E-05 | global batch size: 256 | lm loss: 2.281993E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.131 | TFLOPs: 31.17 | 7: iteration 73110/ 115203 | consumed samples: 18716160 | consumed tokens: 38330695680 | elapsed time per iteration (s): 0.44 | learning rate: 7.402E-05 | global batch size: 256 | lm loss: 2.241940E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.975 | TFLOPs: 30.80 | 7: iteration 73120/ 115203 | 
consumed samples: 18718720 | consumed tokens: 38335938560 | elapsed time per iteration (s): 0.43 | learning rate: 7.399E-05 | global batch size: 256 | lm loss: 2.281596E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.386 | TFLOPs: 31.29 | 7: iteration 73130/ 115203 | consumed samples: 18721280 | consumed tokens: 38341181440 | elapsed time per iteration (s): 0.45 | learning rate: 7.397E-05 | global batch size: 256 | lm loss: 2.270656E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.480 | TFLOPs: 30.14 | 7: iteration 73140/ 115203 | consumed samples: 18723840 | consumed tokens: 38346424320 | elapsed time per iteration (s): 0.44 | learning rate: 7.395E-05 | global batch size: 256 | lm loss: 2.248273E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.975 | TFLOPs: 30.48 | 7: iteration 73150/ 115203 | consumed samples: 18726400 | consumed tokens: 38351667200 | elapsed time per iteration (s): 0.43 | learning rate: 7.393E-05 | global batch size: 256 | lm loss: 2.249425E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.646 | TFLOPs: 31.20 | 7: iteration 73160/ 115203 | consumed samples: 18728960 | consumed tokens: 38356910080 | elapsed time per iteration (s): 0.43 | learning rate: 7.390E-05 | global batch size: 256 | lm loss: 2.234062E+00 | grad norm: 0.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.656 | TFLOPs: 31.15 | 7: iteration 73170/ 115203 | consumed samples: 18731520 | consumed tokens: 38362152960 | elapsed time per iteration (s): 0.43 | learning rate: 7.388E-05 | global batch size: 256 | lm loss: 2.242236E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 600.835 | TFLOPs: 31.52 | 7: iteration 73180/ 115203 | consumed samples: 18734080 | consumed tokens: 38367395840 | elapsed time per iteration (s): 0.44 | learning rate: 7.386E-05 | global batch size: 256 | lm loss: 2.263356E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.754 | TFLOPs: 30.73 | 7: iteration 73190/ 115203 | consumed samples: 18736640 | consumed tokens: 38372638720 | elapsed time per iteration (s): 0.43 | learning rate: 7.384E-05 | global batch size: 256 | lm loss: 2.236103E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.348 | TFLOPs: 31.08 | 7: iteration 73200/ 115203 | consumed samples: 18739200 | consumed tokens: 38377881600 | elapsed time per iteration (s): 0.43 | learning rate: 7.381E-05 | global batch size: 256 | lm loss: 2.275507E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.768 | TFLOPs: 31.15 | 7: iteration 73210/ 115203 | consumed samples: 18741760 | consumed tokens: 38383124480 | elapsed time per iteration (s): 0.43 | learning rate: 7.379E-05 | global batch size: 256 | lm loss: 2.283606E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.769 | TFLOPs: 31.00 | 7: iteration 73220/ 115203 | consumed samples: 18744320 | consumed tokens: 38388367360 | elapsed time per iteration (s): 0.43 | learning rate: 7.377E-05 | global batch size: 256 | lm loss: 2.225156E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.968 | TFLOPs: 31.22 | 7: iteration 73230/ 115203 | consumed samples: 18746880 | consumed tokens: 38393610240 | elapsed time per iteration (s): 0.43 | learning rate: 7.374E-05 | global batch size: 
256 | lm loss: 2.254094E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.412 | TFLOPs: 31.24 | 7: iteration 73240/ 115203 | consumed samples: 18749440 | consumed tokens: 38398853120 | elapsed time per iteration (s): 0.44 | learning rate: 7.372E-05 | global batch size: 256 | lm loss: 2.255991E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.741 | TFLOPs: 30.63 | 7: iteration 73250/ 115203 | consumed samples: 18752000 | consumed tokens: 38404096000 | elapsed time per iteration (s): 0.43 | learning rate: 7.370E-05 | global batch size: 256 | lm loss: 2.290449E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.117 | TFLOPs: 31.38 | 7: iteration 73260/ 115203 | consumed samples: 18754560 | consumed tokens: 38409338880 | elapsed time per iteration (s): 0.43 | learning rate: 7.368E-05 | global batch size: 256 | lm loss: 2.268176E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.103 | TFLOPs: 31.22 | 7: iteration 73270/ 115203 | consumed samples: 18757120 | consumed tokens: 38414581760 | elapsed time per iteration (s): 0.42 | learning rate: 7.365E-05 | global batch size: 256 | lm loss: 2.245038E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.409 | TFLOPs: 31.76 | 7: iteration 73280/ 115203 | consumed samples: 18759680 | consumed tokens: 38419824640 | elapsed time per iteration (s): 0.43 | learning rate: 7.363E-05 | global batch size: 256 | lm loss: 2.256587E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.326 | TFLOPs: 31.39 | 7: iteration 73290/ 115203 | consumed samples: 18762240 | consumed 
tokens: 38425067520 | elapsed time per iteration (s): 0.42 | learning rate: 7.361E-05 | global batch size: 256 | lm loss: 2.266941E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.274 | TFLOPs: 31.97 | 7: iteration 73300/ 115203 | consumed samples: 18764800 | consumed tokens: 38430310400 | elapsed time per iteration (s): 0.43 | learning rate: 7.359E-05 | global batch size: 256 | lm loss: 2.253777E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.623 | TFLOPs: 30.88 | 7: iteration 73310/ 115203 | consumed samples: 18767360 | consumed tokens: 38435553280 | elapsed time per iteration (s): 0.42 | learning rate: 7.356E-05 | global batch size: 256 | lm loss: 2.226981E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.302 | TFLOPs: 32.02 | 7: iteration 73320/ 115203 | consumed samples: 18769920 | consumed tokens: 38440796160 | elapsed time per iteration (s): 0.43 | learning rate: 7.354E-05 | global batch size: 256 | lm loss: 2.258087E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.591 | TFLOPs: 31.14 | 7: iteration 73330/ 115203 | consumed samples: 18772480 | consumed tokens: 38446039040 | elapsed time per iteration (s): 0.43 | learning rate: 7.352E-05 | global batch size: 256 | lm loss: 2.257913E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.651 | TFLOPs: 31.41 | 7: iteration 73340/ 115203 | consumed samples: 18775040 | consumed tokens: 38451281920 | elapsed time per iteration (s): 0.43 | learning rate: 7.350E-05 | global batch size: 256 | lm loss: 2.262865E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 589.310 | TFLOPs: 30.92 | 7: iteration 73350/ 115203 | consumed samples: 18777600 | consumed tokens: 38456524800 | elapsed time per iteration (s): 0.43 | learning rate: 7.347E-05 | global batch size: 256 | lm loss: 2.266505E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.759 | TFLOPs: 31.00 | 7: iteration 73360/ 115203 | consumed samples: 18780160 | consumed tokens: 38461767680 | elapsed time per iteration (s): 0.45 | learning rate: 7.345E-05 | global batch size: 256 | lm loss: 2.241911E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.998 | TFLOPs: 30.17 | 7: iteration 73370/ 115203 | consumed samples: 18782720 | consumed tokens: 38467010560 | elapsed time per iteration (s): 0.43 | learning rate: 7.343E-05 | global batch size: 256 | lm loss: 2.241898E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.151 | TFLOPs: 31.23 | 7: iteration 73380/ 115203 | consumed samples: 18785280 | consumed tokens: 38472253440 | elapsed time per iteration (s): 0.43 | learning rate: 7.340E-05 | global batch size: 256 | lm loss: 2.250988E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.865 | TFLOPs: 31.26 | 7: iteration 73390/ 115203 | consumed samples: 18787840 | consumed tokens: 38477496320 | elapsed time per iteration (s): 0.43 | learning rate: 7.338E-05 | global batch size: 256 | lm loss: 2.241722E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.816 | TFLOPs: 31.05 | 7: iteration 73400/ 115203 | consumed samples: 18790400 | consumed tokens: 38482739200 | elapsed time per iteration (s): 0.43 | learning rate: 7.336E-05 | global batch size: 256 | lm loss: 2.256739E+00 | grad norm: 
0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.188 | TFLOPs: 31.12 | 7: iteration 73410/ 115203 | consumed samples: 18792960 | consumed tokens: 38487982080 | elapsed time per iteration (s): 0.43 | learning rate: 7.334E-05 | global batch size: 256 | lm loss: 2.252172E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.686 | TFLOPs: 31.20 | 7: iteration 73420/ 115203 | consumed samples: 18795520 | consumed tokens: 38493224960 | elapsed time per iteration (s): 0.43 | learning rate: 7.331E-05 | global batch size: 256 | lm loss: 2.261368E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.098 | TFLOPs: 30.96 | 7: iteration 73430/ 115203 | consumed samples: 18798080 | consumed tokens: 38498467840 | elapsed time per iteration (s): 0.43 | learning rate: 7.329E-05 | global batch size: 256 | lm loss: 2.276435E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.553 | TFLOPs: 31.35 | 7: iteration 73440/ 115203 | consumed samples: 18800640 | consumed tokens: 38503710720 | elapsed time per iteration (s): 0.43 | learning rate: 7.327E-05 | global batch size: 256 | lm loss: 2.251892E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.848 | TFLOPs: 31.32 | 7: iteration 73450/ 115203 | consumed samples: 18803200 | consumed tokens: 38508953600 | elapsed time per iteration (s): 0.42 | learning rate: 7.325E-05 | global batch size: 256 | lm loss: 2.232653E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.116 | TFLOPs: 31.70 | 7: iteration 73460/ 115203 | consumed samples: 18805760 | consumed tokens: 38514196480 | elapsed time per 
iteration (s): 0.44 | learning rate: 7.322E-05 | global batch size: 256 | lm loss: 2.224687E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.544 | TFLOPs: 30.30 | 7: iteration 73470/ 115203 | consumed samples: 18808320 | consumed tokens: 38519439360 | elapsed time per iteration (s): 0.43 | learning rate: 7.320E-05 | global batch size: 256 | lm loss: 2.249684E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.497 | TFLOPs: 31.40 | 7: iteration 73480/ 115203 | consumed samples: 18810880 | consumed tokens: 38524682240 | elapsed time per iteration (s): 0.43 | learning rate: 7.318E-05 | global batch size: 256 | lm loss: 2.241661E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.003 | TFLOPs: 31.53 | 7: iteration 73490/ 115203 | consumed samples: 18813440 | consumed tokens: 38529925120 | elapsed time per iteration (s): 0.43 | learning rate: 7.316E-05 | global batch size: 256 | lm loss: 2.249553E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.087 | TFLOPs: 31.28 | 7: iteration 73500/ 115203 | consumed samples: 18816000 | consumed tokens: 38535168000 | elapsed time per iteration (s): 0.43 | learning rate: 7.313E-05 | global batch size: 256 | lm loss: 2.268935E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.026 | TFLOPs: 31.43 | 7: iteration 73510/ 115203 | consumed samples: 18818560 | consumed tokens: 38540410880 | elapsed time per iteration (s): 0.43 | learning rate: 7.311E-05 | global batch size: 256 | lm loss: 2.264472E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.957 | TFLOPs: 31.27 | 7: 
iteration 73520/ 115203 | consumed samples: 18821120 | consumed tokens: 38545653760 | elapsed time per iteration (s): 0.43 | learning rate: 7.309E-05 | global batch size: 256 | lm loss: 2.280127E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.919 | TFLOPs: 31.16 | 7: iteration 73530/ 115203 | consumed samples: 18823680 | consumed tokens: 38550896640 | elapsed time per iteration (s): 0.44 | learning rate: 7.307E-05 | global batch size: 256 | lm loss: 2.257851E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.065 | TFLOPs: 30.44 | 7: iteration 73540/ 115203 | consumed samples: 18826240 | consumed tokens: 38556139520 | elapsed time per iteration (s): 0.43 | learning rate: 7.304E-05 | global batch size: 256 | lm loss: 2.263651E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.522 | TFLOPs: 31.04 | 7: iteration 73550/ 115203 | consumed samples: 18828800 | consumed tokens: 38561382400 | elapsed time per iteration (s): 0.44 | learning rate: 7.302E-05 | global batch size: 256 | lm loss: 2.303712E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.876 | TFLOPs: 30.58 | 7: iteration 73560/ 115203 | consumed samples: 18831360 | consumed tokens: 38566625280 | elapsed time per iteration (s): 0.43 | learning rate: 7.300E-05 | global batch size: 256 | lm loss: 2.265479E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.289 | TFLOPs: 31.34 | 7: iteration 73570/ 115203 | consumed samples: 18833920 | consumed tokens: 38571868160 | elapsed time per iteration (s): 0.44 | learning rate: 7.297E-05 | global batch size: 256 | lm loss: 2.218557E+00 | grad norm: 0.227 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.135 | TFLOPs: 30.28 | 7: iteration 73580/ 115203 | consumed samples: 18836480 | consumed tokens: 38577111040 | elapsed time per iteration (s): 0.43 | learning rate: 7.295E-05 | global batch size: 256 | lm loss: 2.238968E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.235 | TFLOPs: 30.92 | 7: iteration 73590/ 115203 | consumed samples: 18839040 | consumed tokens: 38582353920 | elapsed time per iteration (s): 0.43 | learning rate: 7.293E-05 | global batch size: 256 | lm loss: 2.293103E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.888 | TFLOPs: 30.90 | 7: iteration 73600/ 115203 | consumed samples: 18841600 | consumed tokens: 38587596800 | elapsed time per iteration (s): 0.43 | learning rate: 7.291E-05 | global batch size: 256 | lm loss: 2.248296E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.428 | TFLOPs: 31.40 | 7: iteration 73610/ 115203 | consumed samples: 18844160 | consumed tokens: 38592839680 | elapsed time per iteration (s): 0.44 | learning rate: 7.288E-05 | global batch size: 256 | lm loss: 2.262901E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.226 | TFLOPs: 30.86 | 7: iteration 73620/ 115203 | consumed samples: 18846720 | consumed tokens: 38598082560 | elapsed time per iteration (s): 0.43 | learning rate: 7.286E-05 | global batch size: 256 | lm loss: 2.230338E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.802 | TFLOPs: 31.21 | 7: iteration 73630/ 115203 | consumed samples: 18849280 | consumed tokens: 38603325440 | elapsed time per iteration (s): 0.43 | learning rate: 
7.284E-05 | global batch size: 256 | lm loss: 2.265802E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.987 | TFLOPs: 31.01 | 7: iteration 73640/ 115203 | consumed samples: 18851840 | consumed tokens: 38608568320 | elapsed time per iteration (s): 0.43 | learning rate: 7.282E-05 | global batch size: 256 | lm loss: 2.262552E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.375 | TFLOPs: 31.45 | 7: iteration 73650/ 115203 | consumed samples: 18854400 | consumed tokens: 38613811200 | elapsed time per iteration (s): 0.43 | learning rate: 7.279E-05 | global batch size: 256 | lm loss: 2.240695E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.137 | TFLOPs: 31.17 | 7: iteration 73660/ 115203 | consumed samples: 18856960 | consumed tokens: 38619054080 | elapsed time per iteration (s): 0.43 | learning rate: 7.277E-05 | global batch size: 256 | lm loss: 2.299544E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.667 | TFLOPs: 31.52 | 7: iteration 73670/ 115203 | consumed samples: 18859520 | consumed tokens: 38624296960 | elapsed time per iteration (s): 0.43 | learning rate: 7.275E-05 | global batch size: 256 | lm loss: 2.245122E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.167 | TFLOPs: 31.38 | 7: iteration 73680/ 115203 | consumed samples: 18862080 | consumed tokens: 38629539840 | elapsed time per iteration (s): 0.43 | learning rate: 7.273E-05 | global batch size: 256 | lm loss: 2.265343E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.340 | TFLOPs: 31.34 | 7: iteration 73690/ 115203 | consumed 
samples: 18864640 | consumed tokens: 38634782720 | elapsed time per iteration (s): 0.43 | learning rate: 7.270E-05 | global batch size: 256 | lm loss: 2.257816E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.085 | TFLOPs: 30.91 | 7: iteration 73700/ 115203 | consumed samples: 18867200 | consumed tokens: 38640025600 | elapsed time per iteration (s): 0.43 | learning rate: 7.268E-05 | global batch size: 256 | lm loss: 2.251695E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.976 | TFLOPs: 31.37 | 7: iteration 73710/ 115203 | consumed samples: 18869760 | consumed tokens: 38645268480 | elapsed time per iteration (s): 0.43 | learning rate: 7.266E-05 | global batch size: 256 | lm loss: 2.261648E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.296 | TFLOPs: 30.92 | 7: iteration 73720/ 115203 | consumed samples: 18872320 | consumed tokens: 38650511360 | elapsed time per iteration (s): 0.42 | learning rate: 7.264E-05 | global batch size: 256 | lm loss: 2.270829E+00 | grad norm: 0.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.575 | TFLOPs: 31.77 | 7: iteration 73730/ 115203 | consumed samples: 18874880 | consumed tokens: 38655754240 | elapsed time per iteration (s): 0.43 | learning rate: 7.261E-05 | global batch size: 256 | lm loss: 2.256429E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.970 | TFLOPs: 31.06 | 7: iteration 73740/ 115203 | consumed samples: 18877440 | consumed tokens: 38660997120 | elapsed time per iteration (s): 0.43 | learning rate: 7.259E-05 | global batch size: 256 | lm loss: 2.285182E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 595.768 | TFLOPs: 31.26 | 7: iteration 73750/ 115203 | consumed samples: 18880000 | consumed tokens: 38666240000 | elapsed time per iteration (s): 0.43 | learning rate: 7.257E-05 | global batch size: 256 | lm loss: 2.262738E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.022 | TFLOPs: 31.11 | 7: iteration 73760/ 115203 | consumed samples: 18882560 | consumed tokens: 38671482880 | elapsed time per iteration (s): 0.43 | learning rate: 7.255E-05 | global batch size: 256 | lm loss: 2.239662E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.967 | TFLOPs: 31.01 | 7: iteration 73770/ 115203 | consumed samples: 18885120 | consumed tokens: 38676725760 | elapsed time per iteration (s): 0.42 | learning rate: 7.252E-05 | global batch size: 256 | lm loss: 2.252404E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.786 | TFLOPs: 31.63 | 7: iteration 73780/ 115203 | consumed samples: 18887680 | consumed tokens: 38681968640 | elapsed time per iteration (s): 0.44 | learning rate: 7.250E-05 | global batch size: 256 | lm loss: 2.242694E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.926 | TFLOPs: 30.48 | 7: iteration 73790/ 115203 | consumed samples: 18890240 | consumed tokens: 38687211520 | elapsed time per iteration (s): 0.43 | learning rate: 7.248E-05 | global batch size: 256 | lm loss: 2.279407E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.966 | TFLOPs: 31.48 | 7: iteration 73800/ 115203 | consumed samples: 18892800 | consumed tokens: 38692454400 | elapsed time per iteration (s): 0.43 | learning rate: 7.246E-05 | global batch size: 256 | lm 
loss: 2.222480E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.789 | TFLOPs: 31.21 | 7: iteration 73810/ 115203 | consumed samples: 18895360 | consumed tokens: 38697697280 | elapsed time per iteration (s): 0.43 | learning rate: 7.243E-05 | global batch size: 256 | lm loss: 2.247013E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.281 | TFLOPs: 31.44 | 7: iteration 73820/ 115203 | consumed samples: 18897920 | consumed tokens: 38702940160 | elapsed time per iteration (s): 0.44 | learning rate: 7.241E-05 | global batch size: 256 | lm loss: 2.246761E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.106 | TFLOPs: 30.75 | 7: iteration 73830/ 115203 | consumed samples: 18900480 | consumed tokens: 38708183040 | elapsed time per iteration (s): 0.42 | learning rate: 7.239E-05 | global batch size: 256 | lm loss: 2.275252E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.580 | TFLOPs: 31.62 | 7: iteration 73840/ 115203 | consumed samples: 18903040 | consumed tokens: 38713425920 | elapsed time per iteration (s): 0.44 | learning rate: 7.237E-05 | global batch size: 256 | lm loss: 2.276585E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.206 | TFLOPs: 30.39 | 7: iteration 73850/ 115203 | consumed samples: 18905600 | consumed tokens: 38718668800 | elapsed time per iteration (s): 0.43 | learning rate: 7.234E-05 | global batch size: 256 | lm loss: 2.255573E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.197 | TFLOPs: 31.18 | 7: iteration 73860/ 115203 | consumed samples: 18908160 | consumed tokens: 
38723911680 | elapsed time per iteration (s): 0.43 | learning rate: 7.232E-05 | global batch size: 256 | lm loss: 2.269517E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.483 | TFLOPs: 31.09 | 7: iteration 73870/ 115203 | consumed samples: 18910720 | consumed tokens: 38729154560 | elapsed time per iteration (s): 0.43 | learning rate: 7.230E-05 | global batch size: 256 | lm loss: 2.264288E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.246 | TFLOPs: 31.23 | 7: iteration 73880/ 115203 | consumed samples: 18913280 | consumed tokens: 38734397440 | elapsed time per iteration (s): 0.43 | learning rate: 7.228E-05 | global batch size: 256 | lm loss: 2.261210E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.867 | TFLOPs: 31.42 | 7: iteration 73890/ 115203 | consumed samples: 18915840 | consumed tokens: 38739640320 | elapsed time per iteration (s): 0.44 | learning rate: 7.225E-05 | global batch size: 256 | lm loss: 2.259237E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.769 | TFLOPs: 30.79 | 7: iteration 73900/ 115203 | consumed samples: 18918400 | consumed tokens: 38744883200 | elapsed time per iteration (s): 0.44 | learning rate: 7.223E-05 | global batch size: 256 | lm loss: 2.251040E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.944 | TFLOPs: 30.64 | 7: iteration 73910/ 115203 | consumed samples: 18920960 | consumed tokens: 38750126080 | elapsed time per iteration (s): 0.45 | learning rate: 7.221E-05 | global batch size: 256 | lm loss: 2.236450E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
573.087 | TFLOPs: 30.07 | 7: iteration 73920/ 115203 | consumed samples: 18923520 | consumed tokens: 38755368960 | elapsed time per iteration (s): 0.42 | learning rate: 7.219E-05 | global batch size: 256 | lm loss: 2.293171E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.292 | TFLOPs: 31.65 | 7: iteration 73930/ 115203 | consumed samples: 18926080 | consumed tokens: 38760611840 | elapsed time per iteration (s): 0.44 | learning rate: 7.216E-05 | global batch size: 256 | lm loss: 2.234517E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.222 | TFLOPs: 30.86 | 7: iteration 73940/ 115203 | consumed samples: 18928640 | consumed tokens: 38765854720 | elapsed time per iteration (s): 0.42 | learning rate: 7.214E-05 | global batch size: 256 | lm loss: 2.243101E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.934 | TFLOPs: 31.79 | 7: iteration 73950/ 115203 | consumed samples: 18931200 | consumed tokens: 38771097600 | elapsed time per iteration (s): 0.43 | learning rate: 7.212E-05 | global batch size: 256 | lm loss: 2.287126E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.831 | TFLOPs: 31.52 | 7: iteration 73960/ 115203 | consumed samples: 18933760 | consumed tokens: 38776340480 | elapsed time per iteration (s): 0.43 | learning rate: 7.210E-05 | global batch size: 256 | lm loss: 2.211985E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.641 | TFLOPs: 31.25 | 7: iteration 73970/ 115203 | consumed samples: 18936320 | consumed tokens: 38781583360 | elapsed time per iteration (s): 0.43 | learning rate: 7.207E-05 | global batch size: 256 | lm loss: 2.274445E+00 | grad norm: 0.238 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.095 | TFLOPs: 30.91 | 7: iteration 73980/ 115203 | consumed samples: 18938880 | consumed tokens: 38786826240 | elapsed time per iteration (s): 0.45 | learning rate: 7.205E-05 | global batch size: 256 | lm loss: 2.280045E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.712 | TFLOPs: 30.00 | 7: iteration 73990/ 115203 | consumed samples: 18941440 | consumed tokens: 38792069120 | elapsed time per iteration (s): 0.43 | learning rate: 7.203E-05 | global batch size: 256 | lm loss: 2.254101E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.504 | TFLOPs: 31.19 | 0: [2022-11-28 21:51:51,006] [INFO] [logging.py:68:log_dist] [Rank 0] step=74000, skipped=0, lr=[7.20058819630707e-05, 7.20058819630707e-05, 7.20058819630707e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 74000/ 115203 | consumed samples: 18944000 | consumed tokens: 38797312000 | elapsed time per iteration (s): 0.44 | learning rate: 7.201E-05 | global batch size: 256 | lm loss: 2.256394E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.052 | TFLOPs: 30.85 | 0: steps: 74000 loss: 2.2067 iter time (s): 0.432 samples/sec: 592.417 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 74000 | lm loss value: 2.111705E+00 | lm loss PPL: 8.262315E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 74000 to checkpoints_221m 0: [2022-11-28 21:51:51,191] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step74000 is begin to save! 
0: [2022-11-28 21:51:51,221] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_01-model_00-model_states.pt... 0: [2022-11-28 21:51:51,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_01-model_00-model_states.pt. 0: [2022-11-28 21:51:51,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_03-model_00-model_states.pt... 0: [2022-11-28 21:51:51,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_03-model_00-model_states.pt. 0: [2022-11-28 21:51:51,356] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_04-model_00-model_states.pt... 0: [2022-11-28 21:51:51,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_04-model_00-model_states.pt. 0: [2022-11-28 21:51:51,382] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_05-model_00-model_states.pt... 0: [2022-11-28 21:51:51,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_05-model_00-model_states.pt. 0: [2022-11-28 21:51:51,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_06-model_00-model_states.pt... 0: [2022-11-28 21:51:51,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_06-model_00-model_states.pt. 0: [2022-11-28 21:51:51,432] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_07-model_00-model_states.pt... 0: [2022-11-28 21:51:51,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 21:51:51,456] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_08-model_00-model_states.pt... 0: [2022-11-28 21:51:51,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_08-model_00-model_states.pt. 0: [2022-11-28 21:51:51,481] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_09-model_00-model_states.pt... 0: [2022-11-28 21:51:51,506] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_09-model_00-model_states.pt. 0: [2022-11-28 21:51:51,506] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_10-model_00-model_states.pt... 0: [2022-11-28 21:51:51,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_10-model_00-model_states.pt. 0: [2022-11-28 21:51:51,529] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_11-model_00-model_states.pt... 0: [2022-11-28 21:51:51,553] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_11-model_00-model_states.pt. 0: [2022-11-28 21:51:51,553] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_12-model_00-model_states.pt... 0: [2022-11-28 21:51:51,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_12-model_00-model_states.pt. 0: [2022-11-28 21:51:51,579] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_13-model_00-model_states.pt... 0: [2022-11-28 21:51:51,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 21:51:51,606] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_14-model_00-model_states.pt... 0: [2022-11-28 21:51:51,630] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_14-model_00-model_states.pt. 0: [2022-11-28 21:51:51,631] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_15-model_00-model_states.pt... 0: [2022-11-28 21:51:51,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_15-model_00-model_states.pt. 0: [2022-11-28 21:51:51,659] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_16-model_00-model_states.pt... 0: [2022-11-28 21:51:51,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_16-model_00-model_states.pt. 0: [2022-11-28 21:51:51,682] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_17-model_00-model_states.pt... 0: [2022-11-28 21:51:51,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_17-model_00-model_states.pt. 0: [2022-11-28 21:51:51,706] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_18-model_00-model_states.pt... 0: [2022-11-28 21:51:51,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_18-model_00-model_states.pt. 0: [2022-11-28 21:51:51,731] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_19-model_00-model_states.pt... 0: [2022-11-28 21:51:51,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 21:51:51,757] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_20-model_00-model_states.pt... 0: [2022-11-28 21:51:51,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_20-model_00-model_states.pt. 0: [2022-11-28 21:51:51,782] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/layer_22-model_00-model_states.pt... 0: [2022-11-28 21:51:51,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/layer_22-model_00-model_states.pt. 0: [2022-11-28 21:51:51,787] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step74000/mp_rank_00_model_states.pt 0: [2022-11-28 21:51:51,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/mp_rank_00_model_states.pt... 0: [2022-11-28 21:51:51,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/mp_rank_00_model_states.pt. 0: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 
7: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
5: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
1: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:51:51,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step74000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:51:51,857] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:51:51,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 21:51:51,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2022-11-28 21:51:51,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:51:51,858] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 21:51:51,858] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 0: [2022-11-28 21:51:51,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:51:51,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 21:51:51,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 21:51:51,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 0: [2022-11-28 21:51:51,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:51:51,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 21:51:51,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 0: [2022-11-28 21:51:51,862] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:51:51,862] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 21:51:51,862] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2022-11-28 21:51:51,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:51:51,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 21:51:51,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2022-11-28 21:51:51,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:51:51,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 21:51:51,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2022-11-28 21:51:51,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:51:51,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 21:51:51,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:51:51,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2022-11-28 21:51:51,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 21:51:51,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2022-11-28 21:51:51,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:51:51,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 21:51:51,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2022-11-28 21:51:51,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 21:51:51,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 21:51:51,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 0: [2022-11-28 21:51:51,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:51:51,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 21:51:51,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2022-11-28 21:51:51,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:51:51,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 21:51:51,866] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2022-11-28 21:51:51,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:51:51,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:51:51,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
5: [2022-11-28 21:51:51,866] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 1: [2022-11-28 21:51:51,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 21:51:51,866] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:51:51,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 5: [2022-11-28 21:51:51,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2022-11-28 21:51:51,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2022-11-28 21:51:51,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2022-11-28 21:51:51,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 21:51:51,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2022-11-28 21:51:51,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:51:51,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 21:51:51,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
5: [2022-11-28 21:51:51,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:51:51,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 21:51:51,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2022-11-28 21:51:51,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:51:51,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 21:51:51,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2022-11-28 21:51:51,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:51:51,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 21:51:51,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2022-11-28 21:51:51,869] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:51:51,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 21:51:51,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
0: [2022-11-28 21:51:51,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:51:51,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 21:51:51,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2022-11-28 21:51:51,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:51:51,870] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:51:51,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 21:51:51,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 21:51:51,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2022-11-28 21:51:51,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2022-11-28 21:51:51,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:51:51,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 21:51:51,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
2: [2022-11-28 21:51:51,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:51:51,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 21:51:51,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2022-11-28 21:51:51,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:51:51,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 21:51:51,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 2: [2022-11-28 21:51:51,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:51:51,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 21:51:51,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 1: [2022-11-28 21:51:51,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:51:51,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 21:51:51,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
1: [2022-11-28 21:51:51,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:51:51,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 21:51:51,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 7: [2022-11-28 21:51:51,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:51:51,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 21:51:51,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 7: [2022-11-28 21:51:51,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:51:51,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:51:51,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 21:51:51,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 21:51:51,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 7: [2022-11-28 21:51:51,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
7: [2022-11-28 21:51:51,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:51:51,878] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 21:51:51,878] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2022-11-28 21:51:51,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:51:51,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 21:51:51,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 4: [2022-11-28 21:51:51,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:51:51,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 21:51:51,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 7: [2022-11-28 21:51:51,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:51:51,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 21:51:51,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
7: [2022-11-28 21:51:51,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:51:51,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:51:51,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 21:51:51,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 21:51:51,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:51:51,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 7: [2022-11-28 21:51:51,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 7: [2022-11-28 21:51:51,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 21:51:51,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2022-11-28 21:51:51,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:51:51,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 21:51:51,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
5: [2022-11-28 21:51:51,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:51:51,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 21:51:51,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2022-11-28 21:51:51,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:51:51,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 21:51:51,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 5: [2022-11-28 21:51:51,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:51:51,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 21:51:51,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 0: [2022-11-28 21:51:51,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:51:51,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 21:51:51,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
0: [2022-11-28 21:51:51,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:51:51,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 21:51:51,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2022-11-28 21:51:51,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 21:51:51,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 21:51:51,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 21:51:51,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 21:51:51,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 21:51:51,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 21:51:51,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 21:51:51,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 3: [2022-11-28 21:51:51,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 0: [2022-11-28 21:51:51,940] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 21:51:51,940] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2022-11-28 21:51:52,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:51:52,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 21:51:52,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2022-11-28 21:51:52,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:51:52,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 21:51:52,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2022-11-28 21:51:52,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-28 21:51:52,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 21:51:52,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2022-11-28 21:51:52,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:51:52,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 21:51:52,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2022-11-28 21:51:52,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:51:52,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:51:52,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-28 21:51:52,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 21:51:52,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 21:51:52,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 21:51:52,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:51:52,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2022-11-28 21:51:52,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2022-11-28 21:51:52,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 6: [2022-11-28 21:51:52,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step74000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 21:51:52,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step74000 is ready now! 
0: successfully saved checkpoint at iteration 74000 to checkpoints_221m 7: time (ms) | save-checkpoint: 875.49 7: iteration 74010/ 115203 | consumed samples: 18946560 | consumed tokens: 38802554880 | elapsed time per iteration (s): 0.55 | learning rate: 7.198E-05 | global batch size: 256 | lm loss: 2.262860E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 463.543 | TFLOPs: 24.32 | 7: iteration 74020/ 115203 | consumed samples: 18949120 | consumed tokens: 38807797760 | elapsed time per iteration (s): 0.43 | learning rate: 7.196E-05 | global batch size: 256 | lm loss: 2.219770E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.679 | TFLOPs: 31.15 | 7: iteration 74030/ 115203 | consumed samples: 18951680 | consumed tokens: 38813040640 | elapsed time per iteration (s): 0.44 | learning rate: 7.194E-05 | global batch size: 256 | lm loss: 2.245367E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.406 | TFLOPs: 30.87 | 7: iteration 74040/ 115203 | consumed samples: 18954240 | consumed tokens: 38818283520 | elapsed time per iteration (s): 0.45 | learning rate: 7.192E-05 | global batch size: 256 | lm loss: 2.245692E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.870 | TFLOPs: 29.90 | 7: iteration 74050/ 115203 | consumed samples: 18956800 | consumed tokens: 38823526400 | elapsed time per iteration (s): 0.43 | learning rate: 7.189E-05 | global batch size: 256 | lm loss: 2.255281E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.845 | TFLOPs: 31.05 | 7: iteration 74060/ 115203 | consumed samples: 18959360 | consumed tokens: 38828769280 | elapsed time per iteration (s): 0.63 | learning 
rate: 7.187E-05 | global batch size: 256 | lm loss: 2.255634E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 404.808 | TFLOPs: 21.24 | 7: iteration 74070/ 115203 | consumed samples: 18961920 | consumed tokens: 38834012160 | elapsed time per iteration (s): 0.43 | learning rate: 7.185E-05 | global batch size: 256 | lm loss: 2.252899E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.478 | TFLOPs: 31.40 | 7: iteration 74080/ 115203 | consumed samples: 18964480 | consumed tokens: 38839255040 | elapsed time per iteration (s): 0.44 | learning rate: 7.183E-05 | global batch size: 256 | lm loss: 2.226667E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.883 | TFLOPs: 30.85 | 7: iteration 74090/ 115203 | consumed samples: 18967040 | consumed tokens: 38844497920 | elapsed time per iteration (s): 0.43 | learning rate: 7.180E-05 | global batch size: 256 | lm loss: 2.275555E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.769 | TFLOPs: 30.94 | 7: iteration 74100/ 115203 | consumed samples: 18969600 | consumed tokens: 38849740800 | elapsed time per iteration (s): 0.43 | learning rate: 7.178E-05 | global batch size: 256 | lm loss: 2.280663E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.910 | TFLOPs: 31.32 | 7: iteration 74110/ 115203 | consumed samples: 18972160 | consumed tokens: 38854983680 | elapsed time per iteration (s): 0.43 | learning rate: 7.176E-05 | global batch size: 256 | lm loss: 2.230558E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.718 | TFLOPs: 30.94 | 7: iteration 74120/ 115203 | 
consumed samples: 18974720 | consumed tokens: 38860226560 | elapsed time per iteration (s): 0.44 | learning rate: 7.174E-05 | global batch size: 256 | lm loss: 2.244837E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.544 | TFLOPs: 30.41 | 7: iteration 74130/ 115203 | consumed samples: 18977280 | consumed tokens: 38865469440 | elapsed time per iteration (s): 0.43 | learning rate: 7.171E-05 | global batch size: 256 | lm loss: 2.248913E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.139 | TFLOPs: 31.02 | 7: iteration 74140/ 115203 | consumed samples: 18979840 | consumed tokens: 38870712320 | elapsed time per iteration (s): 0.42 | learning rate: 7.169E-05 | global batch size: 256 | lm loss: 2.232664E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.714 | TFLOPs: 31.62 | 7: iteration 74150/ 115203 | consumed samples: 18982400 | consumed tokens: 38875955200 | elapsed time per iteration (s): 0.44 | learning rate: 7.167E-05 | global batch size: 256 | lm loss: 2.261306E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.664 | TFLOPs: 30.31 | 7: iteration 74160/ 115203 | consumed samples: 18984960 | consumed tokens: 38881198080 | elapsed time per iteration (s): 0.44 | learning rate: 7.165E-05 | global batch size: 256 | lm loss: 2.255574E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.959 | TFLOPs: 30.48 | 7: iteration 74170/ 115203 | consumed samples: 18987520 | consumed tokens: 38886440960 | elapsed time per iteration (s): 0.43 | learning rate: 7.162E-05 | global batch size: 256 | lm loss: 2.264653E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 591.821 | TFLOPs: 31.05 | 7: iteration 74180/ 115203 | consumed samples: 18990080 | consumed tokens: 38891683840 | elapsed time per iteration (s): 0.43 | learning rate: 7.160E-05 | global batch size: 256 | lm loss: 2.260537E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.527 | TFLOPs: 30.93 | 7: iteration 74190/ 115203 | consumed samples: 18992640 | consumed tokens: 38896926720 | elapsed time per iteration (s): 0.43 | learning rate: 7.158E-05 | global batch size: 256 | lm loss: 2.267198E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.399 | TFLOPs: 31.03 | 7: iteration 74200/ 115203 | consumed samples: 18995200 | consumed tokens: 38902169600 | elapsed time per iteration (s): 0.43 | learning rate: 7.156E-05 | global batch size: 256 | lm loss: 2.241863E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.013 | TFLOPs: 31.53 | 7: iteration 74210/ 115203 | consumed samples: 18997760 | consumed tokens: 38907412480 | elapsed time per iteration (s): 0.43 | learning rate: 7.153E-05 | global batch size: 256 | lm loss: 2.265274E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.595 | TFLOPs: 30.94 | 7: iteration 74220/ 115203 | consumed samples: 19000320 | consumed tokens: 38912655360 | elapsed time per iteration (s): 0.43 | learning rate: 7.151E-05 | global batch size: 256 | lm loss: 2.280068E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.348 | TFLOPs: 31.45 | 7: iteration 74230/ 115203 | consumed samples: 19002880 | consumed tokens: 38917898240 | elapsed time per iteration (s): 0.44 | learning rate: 7.149E-05 | global batch size: 
256 | lm loss: 2.248350E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.582 | TFLOPs: 30.30 | 7: iteration 74240/ 115203 | consumed samples: 19005440 | consumed tokens: 38923141120 | elapsed time per iteration (s): 0.43 | learning rate: 7.147E-05 | global batch size: 256 | lm loss: 2.266412E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.350 | TFLOPs: 30.97 | 7: iteration 74250/ 115203 | consumed samples: 19008000 | consumed tokens: 38928384000 | elapsed time per iteration (s): 0.43 | learning rate: 7.144E-05 | global batch size: 256 | lm loss: 2.220749E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.897 | TFLOPs: 31.27 | 7: iteration 74260/ 115203 | consumed samples: 19010560 | consumed tokens: 38933626880 | elapsed time per iteration (s): 0.45 | learning rate: 7.142E-05 | global batch size: 256 | lm loss: 2.278297E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.487 | TFLOPs: 30.14 | 7: iteration 74270/ 115203 | consumed samples: 19013120 | consumed tokens: 38938869760 | elapsed time per iteration (s): 0.42 | learning rate: 7.140E-05 | global batch size: 256 | lm loss: 2.286194E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.586 | TFLOPs: 31.62 | 7: iteration 74280/ 115203 | consumed samples: 19015680 | consumed tokens: 38944112640 | elapsed time per iteration (s): 0.43 | learning rate: 7.138E-05 | global batch size: 256 | lm loss: 2.247556E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.743 | TFLOPs: 31.47 | 7: iteration 74290/ 115203 | consumed samples: 19018240 | consumed 
tokens: 38949355520 | elapsed time per iteration (s): 0.43 | learning rate: 7.136E-05 | global batch size: 256 | lm loss: 2.238361E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.188 | TFLOPs: 31.54 | 7: iteration 74300/ 115203 | consumed samples: 19020800 | consumed tokens: 38954598400 | elapsed time per iteration (s): 0.43 | learning rate: 7.133E-05 | global batch size: 256 | lm loss: 2.273669E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.802 | TFLOPs: 30.89 | 7: iteration 74310/ 115203 | consumed samples: 19023360 | consumed tokens: 38959841280 | elapsed time per iteration (s): 0.44 | learning rate: 7.131E-05 | global batch size: 256 | lm loss: 2.242204E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.081 | TFLOPs: 30.38 | 7: iteration 74320/ 115203 | consumed samples: 19025920 | consumed tokens: 38965084160 | elapsed time per iteration (s): 0.44 | learning rate: 7.129E-05 | global batch size: 256 | lm loss: 2.254276E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.663 | TFLOPs: 30.31 | 7: iteration 74330/ 115203 | consumed samples: 19028480 | consumed tokens: 38970327040 | elapsed time per iteration (s): 0.44 | learning rate: 7.127E-05 | global batch size: 256 | lm loss: 2.254570E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.026 | TFLOPs: 30.85 | 7: iteration 74340/ 115203 | consumed samples: 19031040 | consumed tokens: 38975569920 | elapsed time per iteration (s): 0.45 | learning rate: 7.124E-05 | global batch size: 256 | lm loss: 2.226835E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 574.865 | TFLOPs: 30.16 | 7: iteration 74350/ 115203 | consumed samples: 19033600 | consumed tokens: 38980812800 | elapsed time per iteration (s): 0.44 | learning rate: 7.122E-05 | global batch size: 256 | lm loss: 2.282074E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.075 | TFLOPs: 30.49 | 7: iteration 74360/ 115203 | consumed samples: 19036160 | consumed tokens: 38986055680 | elapsed time per iteration (s): 0.43 | learning rate: 7.120E-05 | global batch size: 256 | lm loss: 2.252197E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.304 | TFLOPs: 30.92 | 7: iteration 74370/ 115203 | consumed samples: 19038720 | consumed tokens: 38991298560 | elapsed time per iteration (s): 0.43 | learning rate: 7.118E-05 | global batch size: 256 | lm loss: 2.269699E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.715 | TFLOPs: 31.31 | 7: iteration 74380/ 115203 | consumed samples: 19041280 | consumed tokens: 38996541440 | elapsed time per iteration (s): 0.43 | learning rate: 7.115E-05 | global batch size: 256 | lm loss: 2.215205E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.905 | TFLOPs: 31.00 | 7: iteration 74390/ 115203 | consumed samples: 19043840 | consumed tokens: 39001784320 | elapsed time per iteration (s): 0.44 | learning rate: 7.113E-05 | global batch size: 256 | lm loss: 2.252700E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.742 | TFLOPs: 30.79 | 7: iteration 74400/ 115203 | consumed samples: 19046400 | consumed tokens: 39007027200 | elapsed time per iteration (s): 0.43 | learning rate: 7.111E-05 | global batch size: 256 | lm loss: 2.272375E+00 | grad norm: 
0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.198 | TFLOPs: 31.18 | 7: iteration 74410/ 115203 | consumed samples: 19048960 | consumed tokens: 39012270080 | elapsed time per iteration (s): 0.43 | learning rate: 7.109E-05 | global batch size: 256 | lm loss: 2.253235E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.101 | TFLOPs: 30.96 | 7: iteration 74420/ 115203 | consumed samples: 19051520 | consumed tokens: 39017512960 | elapsed time per iteration (s): 0.43 | learning rate: 7.106E-05 | global batch size: 256 | lm loss: 2.246345E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.890 | TFLOPs: 31.11 | 7: iteration 74430/ 115203 | consumed samples: 19054080 | consumed tokens: 39022755840 | elapsed time per iteration (s): 0.43 | learning rate: 7.104E-05 | global batch size: 256 | lm loss: 2.261096E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.573 | TFLOPs: 30.99 | 7: iteration 74440/ 115203 | consumed samples: 19056640 | consumed tokens: 39027998720 | elapsed time per iteration (s): 0.44 | learning rate: 7.102E-05 | global batch size: 256 | lm loss: 2.254873E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.457 | TFLOPs: 30.61 | 7: iteration 74450/ 115203 | consumed samples: 19059200 | consumed tokens: 39033241600 | elapsed time per iteration (s): 0.44 | learning rate: 7.100E-05 | global batch size: 256 | lm loss: 2.259018E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.446 | TFLOPs: 30.87 | 7: iteration 74460/ 115203 | consumed samples: 19061760 | consumed tokens: 39038484480 | elapsed time per 
iteration (s): 0.44 | learning rate: 7.098E-05 | global batch size: 256 | lm loss: 2.231356E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.161 | TFLOPs: 30.75 | 7: iteration 74470/ 115203 | consumed samples: 19064320 | consumed tokens: 39043727360 | elapsed time per iteration (s): 0.43 | learning rate: 7.095E-05 | global batch size: 256 | lm loss: 2.270937E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.102 | TFLOPs: 31.07 | 7: iteration 74480/ 115203 | consumed samples: 19066880 | consumed tokens: 39048970240 | elapsed time per iteration (s): 0.43 | learning rate: 7.093E-05 | global batch size: 256 | lm loss: 2.260996E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.091 | TFLOPs: 30.96 | 7: iteration 74490/ 115203 | consumed samples: 19069440 | consumed tokens: 39054213120 | elapsed time per iteration (s): 0.43 | learning rate: 7.091E-05 | global batch size: 256 | lm loss: 2.252898E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.471 | TFLOPs: 30.98 | 7: iteration 74500/ 115203 | consumed samples: 19072000 | consumed tokens: 39059456000 | elapsed time per iteration (s): 0.43 | learning rate: 7.089E-05 | global batch size: 256 | lm loss: 2.273392E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.893 | TFLOPs: 31.00 | 7: iteration 74510/ 115203 | consumed samples: 19074560 | consumed tokens: 39064698880 | elapsed time per iteration (s): 0.43 | learning rate: 7.086E-05 | global batch size: 256 | lm loss: 2.259702E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.103 | TFLOPs: 31.38 | 7: 
iteration 74520/ 115203 | consumed samples: 19077120 | consumed tokens: 39069941760 | elapsed time per iteration (s): 0.43 | learning rate: 7.084E-05 | global batch size: 256 | lm loss: 2.282360E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.297 | TFLOPs: 31.18 | 7: iteration 74530/ 115203 | consumed samples: 19079680 | consumed tokens: 39075184640 | elapsed time per iteration (s): 0.45 | learning rate: 7.082E-05 | global batch size: 256 | lm loss: 2.257617E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.973 | TFLOPs: 29.70 | 7: iteration 74540/ 115203 | consumed samples: 19082240 | consumed tokens: 39080427520 | elapsed time per iteration (s): 0.44 | learning rate: 7.080E-05 | global batch size: 256 | lm loss: 2.244646E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.940 | TFLOPs: 30.85 | 7: iteration 74550/ 115203 | consumed samples: 19084800 | consumed tokens: 39085670400 | elapsed time per iteration (s): 0.44 | learning rate: 7.077E-05 | global batch size: 256 | lm loss: 2.245383E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.761 | TFLOPs: 30.58 | 7: iteration 74560/ 115203 | consumed samples: 19087360 | consumed tokens: 39090913280 | elapsed time per iteration (s): 0.44 | learning rate: 7.075E-05 | global batch size: 256 | lm loss: 2.258866E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.165 | TFLOPs: 30.44 | 7: iteration 74570/ 115203 | consumed samples: 19089920 | consumed tokens: 39096156160 | elapsed time per iteration (s): 0.43 | learning rate: 7.073E-05 | global batch size: 256 | lm loss: 2.241936E+00 | grad norm: 0.215 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.113 | TFLOPs: 30.96 | 7: iteration 74580/ 115203 | consumed samples: 19092480 | consumed tokens: 39101399040 | elapsed time per iteration (s): 0.43 | learning rate: 7.071E-05 | global batch size: 256 | lm loss: 2.248446E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.985 | TFLOPs: 31.48 | 7: iteration 74590/ 115203 | consumed samples: 19095040 | consumed tokens: 39106641920 | elapsed time per iteration (s): 0.44 | learning rate: 7.069E-05 | global batch size: 256 | lm loss: 2.250162E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.416 | TFLOPs: 30.77 | 7: iteration 74600/ 115203 | consumed samples: 19097600 | consumed tokens: 39111884800 | elapsed time per iteration (s): 0.43 | learning rate: 7.066E-05 | global batch size: 256 | lm loss: 2.255449E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.810 | TFLOPs: 31.42 | 7: iteration 74610/ 115203 | consumed samples: 19100160 | consumed tokens: 39117127680 | elapsed time per iteration (s): 0.45 | learning rate: 7.064E-05 | global batch size: 256 | lm loss: 2.262776E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.850 | TFLOPs: 29.58 | 7: iteration 74620/ 115203 | consumed samples: 19102720 | consumed tokens: 39122370560 | elapsed time per iteration (s): 0.43 | learning rate: 7.062E-05 | global batch size: 256 | lm loss: 2.248016E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.032 | TFLOPs: 31.38 | 7: iteration 74630/ 115203 | consumed samples: 19105280 | consumed tokens: 39127613440 | elapsed time per iteration (s): 0.43 | learning rate: 
7.060E-05 | global batch size: 256 | lm loss: 2.232491E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.566 | TFLOPs: 31.20 | 7: iteration 74640/ 115203 | consumed samples: 19107840 | consumed tokens: 39132856320 | elapsed time per iteration (s): 0.44 | learning rate: 7.057E-05 | global batch size: 256 | lm loss: 2.227228E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.269 | TFLOPs: 30.60 | 7: iteration 74650/ 115203 | consumed samples: 19110400 | consumed tokens: 39138099200 | elapsed time per iteration (s): 0.44 | learning rate: 7.055E-05 | global batch size: 256 | lm loss: 2.268689E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.516 | TFLOPs: 30.67 | 7: iteration 74660/ 115203 | consumed samples: 19112960 | consumed tokens: 39143342080 | elapsed time per iteration (s): 0.43 | learning rate: 7.053E-05 | global batch size: 256 | lm loss: 2.225615E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.352 | TFLOPs: 31.39 | 7: iteration 74670/ 115203 | consumed samples: 19115520 | consumed tokens: 39148584960 | elapsed time per iteration (s): 0.44 | learning rate: 7.051E-05 | global batch size: 256 | lm loss: 2.282271E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.472 | TFLOPs: 30.61 | 7: iteration 74680/ 115203 | consumed samples: 19118080 | consumed tokens: 39153827840 | elapsed time per iteration (s): 0.43 | learning rate: 7.048E-05 | global batch size: 256 | lm loss: 2.240652E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.989 | TFLOPs: 31.11 | 7: iteration 74690/ 115203 | consumed 
samples: 19120640 | consumed tokens: 39159070720 | elapsed time per iteration (s): 0.43 | learning rate: 7.046E-05 | global batch size: 256 | lm loss: 2.243530E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.126 | TFLOPs: 31.17 | 7: iteration 74700/ 115203 | consumed samples: 19123200 | consumed tokens: 39164313600 | elapsed time per iteration (s): 0.43 | learning rate: 7.044E-05 | global batch size: 256 | lm loss: 2.254083E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.555 | TFLOPs: 30.99 | 7: iteration 74710/ 115203 | consumed samples: 19125760 | consumed tokens: 39169556480 | elapsed time per iteration (s): 0.43 | learning rate: 7.042E-05 | global batch size: 256 | lm loss: 2.264280E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.961 | TFLOPs: 31.37 | 7: iteration 74720/ 115203 | consumed samples: 19128320 | consumed tokens: 39174799360 | elapsed time per iteration (s): 0.43 | learning rate: 7.040E-05 | global batch size: 256 | lm loss: 2.280935E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.177 | TFLOPs: 30.97 | 7: iteration 74730/ 115203 | consumed samples: 19130880 | consumed tokens: 39180042240 | elapsed time per iteration (s): 0.44 | learning rate: 7.037E-05 | global batch size: 256 | lm loss: 2.231873E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.962 | TFLOPs: 30.59 | 7: iteration 74740/ 115203 | consumed samples: 19133440 | consumed tokens: 39185285120 | elapsed time per iteration (s): 0.43 | learning rate: 7.035E-05 | global batch size: 256 | lm loss: 2.235301E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 594.949 | TFLOPs: 31.22 | 7: iteration 74750/ 115203 | consumed samples: 19136000 | consumed tokens: 39190528000 | elapsed time per iteration (s): 0.45 | learning rate: 7.033E-05 | global batch size: 256 | lm loss: 2.238597E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.469 | TFLOPs: 30.04 | 7: iteration 74760/ 115203 | consumed samples: 19138560 | consumed tokens: 39195770880 | elapsed time per iteration (s): 0.44 | learning rate: 7.031E-05 | global batch size: 256 | lm loss: 2.256377E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.055 | TFLOPs: 30.54 | 7: iteration 74770/ 115203 | consumed samples: 19141120 | consumed tokens: 39201013760 | elapsed time per iteration (s): 0.43 | learning rate: 7.028E-05 | global batch size: 256 | lm loss: 2.251549E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.360 | TFLOPs: 31.39 | 7: iteration 74780/ 115203 | consumed samples: 19143680 | consumed tokens: 39206256640 | elapsed time per iteration (s): 0.43 | learning rate: 7.026E-05 | global batch size: 256 | lm loss: 2.247256E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.822 | TFLOPs: 31.26 | 7: iteration 74790/ 115203 | consumed samples: 19146240 | consumed tokens: 39211499520 | elapsed time per iteration (s): 0.43 | learning rate: 7.024E-05 | global batch size: 256 | lm loss: 2.256718E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.635 | TFLOPs: 31.57 | 7: iteration 74800/ 115203 | consumed samples: 19148800 | consumed tokens: 39216742400 | elapsed time per iteration (s): 0.43 | learning rate: 7.022E-05 | global batch size: 256 | lm 
loss: 2.239687E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.177 | TFLOPs: 31.07 | 7: iteration 74810/ 115203 | consumed samples: 19151360 | consumed tokens: 39221985280 | elapsed time per iteration (s): 0.43 | learning rate: 7.020E-05 | global batch size: 256 | lm loss: 2.255530E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.839 | TFLOPs: 31.00 | 7: iteration 74820/ 115203 | consumed samples: 19153920 | consumed tokens: 39227228160 | elapsed time per iteration (s): 0.43 | learning rate: 7.017E-05 | global batch size: 256 | lm loss: 2.248166E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.726 | TFLOPs: 30.94 | 7: iteration 74830/ 115203 | consumed samples: 19156480 | consumed tokens: 39232471040 | elapsed time per iteration (s): 0.43 | learning rate: 7.015E-05 | global batch size: 256 | lm loss: 2.269695E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.576 | TFLOPs: 31.30 | 7: iteration 74840/ 115203 | consumed samples: 19159040 | consumed tokens: 39237713920 | elapsed time per iteration (s): 0.43 | learning rate: 7.013E-05 | global batch size: 256 | lm loss: 2.267014E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.685 | TFLOPs: 30.89 | 7: iteration 74850/ 115203 | consumed samples: 19161600 | consumed tokens: 39242956800 | elapsed time per iteration (s): 0.43 | learning rate: 7.011E-05 | global batch size: 256 | lm loss: 2.242057E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.362 | TFLOPs: 30.98 | 7: iteration 74860/ 115203 | consumed samples: 19164160 | consumed tokens: 
39248199680 | elapsed time per iteration (s): 0.43 | learning rate: 7.008E-05 | global batch size: 256 | lm loss: 2.232960E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.703 | TFLOPs: 31.15 | 7: iteration 74870/ 115203 | consumed samples: 19166720 | consumed tokens: 39253442560 | elapsed time per iteration (s): 0.44 | learning rate: 7.006E-05 | global batch size: 256 | lm loss: 2.234521E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.695 | TFLOPs: 30.21 | 7: iteration 74880/ 115203 | consumed samples: 19169280 | consumed tokens: 39258685440 | elapsed time per iteration (s): 0.43 | learning rate: 7.004E-05 | global batch size: 256 | lm loss: 2.256455E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.031 | TFLOPs: 31.54 | 7: iteration 74890/ 115203 | consumed samples: 19171840 | consumed tokens: 39263928320 | elapsed time per iteration (s): 0.43 | learning rate: 7.002E-05 | global batch size: 256 | lm loss: 2.258747E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.663 | TFLOPs: 31.52 | 7: iteration 74900/ 115203 | consumed samples: 19174400 | consumed tokens: 39269171200 | elapsed time per iteration (s): 0.45 | learning rate: 7.000E-05 | global batch size: 256 | lm loss: 2.264639E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.523 | TFLOPs: 29.67 | 7: iteration 74910/ 115203 | consumed samples: 19176960 | consumed tokens: 39274414080 | elapsed time per iteration (s): 0.43 | learning rate: 6.997E-05 | global batch size: 256 | lm loss: 2.284438E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
592.911 | TFLOPs: 31.11 |
7: iteration 74920/ 115203 | consumed samples: 19179520 | consumed tokens: 39279656960 | elapsed time per iteration (s): 0.43 | learning rate: 6.995E-05 | global batch size: 256 | lm loss: 2.308406E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.835 | TFLOPs: 31.26 |
7: iteration 74930/ 115203 | consumed samples: 19182080 | consumed tokens: 39284899840 | elapsed time per iteration (s): 0.42 | learning rate: 6.993E-05 | global batch size: 256 | lm loss: 2.252734E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.431 | TFLOPs: 31.61 |
7: iteration 74940/ 115203 | consumed samples: 19184640 | consumed tokens: 39290142720 | elapsed time per iteration (s): 0.45 | learning rate: 6.991E-05 | global batch size: 256 | lm loss: 2.283730E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.696 | TFLOPs: 30.15 |
7: iteration 74950/ 115203 | consumed samples: 19187200 | consumed tokens: 39295385600 | elapsed time per iteration (s): 0.43 | learning rate: 6.988E-05 | global batch size: 256 | lm loss: 2.225027E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.086 | TFLOPs: 31.59 |
7: iteration 74960/ 115203 | consumed samples: 19189760 | consumed tokens: 39300628480 | elapsed time per iteration (s): 0.45 | learning rate: 6.986E-05 | global batch size: 256 | lm loss: 2.261199E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.892 | TFLOPs: 30.06 |
7: iteration 74970/ 115203 | consumed samples: 19192320 | consumed tokens: 39305871360 | elapsed time per iteration (s): 0.43 | learning rate: 6.984E-05 | global batch size: 256 | lm loss: 2.292167E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.511 | TFLOPs: 31.25 |
7: iteration 74980/ 115203 | consumed samples: 19194880 | consumed tokens: 39311114240 | elapsed time per iteration (s): 0.43 | learning rate: 6.982E-05 | global batch size: 256 | lm loss: 2.297743E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.168 | TFLOPs: 31.12 |
7: iteration 74990/ 115203 | consumed samples: 19197440 | consumed tokens: 39316357120 | elapsed time per iteration (s): 0.43 | learning rate: 6.980E-05 | global batch size: 256 | lm loss: 2.281525E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.505 | TFLOPs: 31.30 |
7: iteration 75000/ 115203 | consumed samples: 19200000 | consumed tokens: 39321600000 | elapsed time per iteration (s): 0.44 | learning rate: 6.977E-05 | global batch size: 256 | lm loss: 2.274919E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.926 | TFLOPs: 30.27 |
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 75000 | lm loss value: 2.323630E+00 | lm loss PPL: 1.021268E+01 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 75000 to checkpoints_221m
0: [2022-11-28 21:59:08,782] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step75000 is begin to save!
0: [2022-11-28 21:59:08,786] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_01-model_00-model_states.pt...
0: [2022-11-28 21:59:08,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_01-model_00-model_states.pt.
0: [2022-11-28 21:59:08,905] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_03-model_00-model_states.pt... 0: [2022-11-28 21:59:08,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_03-model_00-model_states.pt. 0: [2022-11-28 21:59:08,928] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_04-model_00-model_states.pt... 0: [2022-11-28 21:59:08,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_04-model_00-model_states.pt. 0: [2022-11-28 21:59:08,952] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_05-model_00-model_states.pt... 0: [2022-11-28 21:59:08,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_05-model_00-model_states.pt. 0: [2022-11-28 21:59:08,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_06-model_00-model_states.pt... 0: [2022-11-28 21:59:09,001] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_06-model_00-model_states.pt. 0: [2022-11-28 21:59:09,001] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_07-model_00-model_states.pt... 0: [2022-11-28 21:59:09,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_07-model_00-model_states.pt. 0: [2022-11-28 21:59:09,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_08-model_00-model_states.pt... 0: [2022-11-28 21:59:09,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 21:59:09,050] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_09-model_00-model_states.pt... 0: [2022-11-28 21:59:09,074] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_09-model_00-model_states.pt. 0: [2022-11-28 21:59:09,074] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_10-model_00-model_states.pt... 0: [2022-11-28 21:59:09,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_10-model_00-model_states.pt. 0: [2022-11-28 21:59:09,099] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_11-model_00-model_states.pt... 0: [2022-11-28 21:59:09,122] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_11-model_00-model_states.pt. 0: [2022-11-28 21:59:09,122] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_12-model_00-model_states.pt... 0: [2022-11-28 21:59:09,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_12-model_00-model_states.pt. 0: [2022-11-28 21:59:09,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_13-model_00-model_states.pt... 0: [2022-11-28 21:59:09,171] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_13-model_00-model_states.pt. 0: [2022-11-28 21:59:09,172] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_14-model_00-model_states.pt... 0: [2022-11-28 21:59:09,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 21:59:09,197] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_15-model_00-model_states.pt... 0: [2022-11-28 21:59:09,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_15-model_00-model_states.pt. 0: [2022-11-28 21:59:09,221] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_16-model_00-model_states.pt... 0: [2022-11-28 21:59:09,245] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_16-model_00-model_states.pt. 0: [2022-11-28 21:59:09,245] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_17-model_00-model_states.pt... 0: [2022-11-28 21:59:09,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_17-model_00-model_states.pt. 0: [2022-11-28 21:59:09,269] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_18-model_00-model_states.pt... 0: [2022-11-28 21:59:09,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_18-model_00-model_states.pt. 0: [2022-11-28 21:59:09,292] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_19-model_00-model_states.pt... 0: [2022-11-28 21:59:09,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_19-model_00-model_states.pt. 0: [2022-11-28 21:59:09,316] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_20-model_00-model_states.pt... 0: [2022-11-28 21:59:09,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 21:59:09,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/layer_22-model_00-model_states.pt... 0: [2022-11-28 21:59:09,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/layer_22-model_00-model_states.pt. 0: [2022-11-28 21:59:09,345] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step75000/mp_rank_00_model_states.pt 0: [2022-11-28 21:59:09,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/mp_rank_00_model_states.pt... 0: [2022-11-28 21:59:09,347] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/mp_rank_00_model_states.pt. 0: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
2: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
6: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 21:59:09,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step75000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 7: [2022-11-28 21:59:09,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:59:09,412] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 21:59:09,412] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2022-11-28 21:59:09,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:59:09,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:59:09,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 4: [2022-11-28 21:59:09,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 1: [2022-11-28 21:59:09,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2022-11-28 21:59:09,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2022-11-28 21:59:09,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:59:09,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 21:59:09,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2022-11-28 21:59:09,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:59:09,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 21:59:09,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2022-11-28 21:59:09,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:59:09,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 21:59:09,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2022-11-28 21:59:09,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:59:09,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 21:59:09,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2022-11-28 21:59:09,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:59:09,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 21:59:09,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2022-11-28 21:59:09,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:59:09,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:59:09,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:59:09,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 21:59:09,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 21:59:09,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 21:59:09,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2022-11-28 21:59:09,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2022-11-28 21:59:09,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2022-11-28 21:59:09,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:59:09,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:59:09,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 4: [2022-11-28 21:59:09,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2022-11-28 21:59:09,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2022-11-28 21:59:09,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2022-11-28 21:59:09,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:59:09,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 21:59:09,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2022-11-28 21:59:09,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:59:09,429] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 21:59:09,429] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2022-11-28 21:59:09,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 21:59:09,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 21:59:09,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:59:09,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 21:59:09,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2022-11-28 21:59:09,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 21:59:09,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 21:59:09,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 1: [2022-11-28 21:59:09,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2022-11-28 21:59:09,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 21:59:09,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 21:59:09,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 4: [2022-11-28 21:59:09,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 21:59:09,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 21:59:09,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2022-11-28 21:59:09,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:59:09,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:59:09,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 21:59:09,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2022-11-28 21:59:09,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:59:09,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:59:09,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 21:59:09,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 21:59:09,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-28 21:59:09,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 21:59:09,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2022-11-28 21:59:09,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 21:59:09,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 21:59:09,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2022-11-28 21:59:09,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 7: [2022-11-28 21:59:09,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2022-11-28 21:59:09,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:59:09,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:59:09,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 21:59:09,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 21:59:09,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 
3: [2022-11-28 21:59:09,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2022-11-28 21:59:09,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:59:09,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:59:09,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 21:59:09,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 21:59:09,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:59:09,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2022-11-28 21:59:09,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2022-11-28 21:59:09,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 21:59:09,421] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:59:09,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 
5: [2022-11-28 21:59:09,421] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 21:59:09,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:59:09,421] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2022-11-28 21:59:09,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 21:59:09,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:59:09,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2022-11-28 21:59:09,424] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 21:59:09,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:59:09,424] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2022-11-28 21:59:09,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 21:59:09,425] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:59:09,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 
5: [2022-11-28 21:59:09,425] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 21:59:09,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:59:09,425] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2022-11-28 21:59:09,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 21:59:09,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:59:09,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2022-11-28 21:59:09,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 21:59:09,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:59:09,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2022-11-28 21:59:09,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 21:59:09,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
3: [2022-11-28 21:59:09,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 21:59:09,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 21:59:09,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 21:59:09,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 3: [2022-11-28 21:59:09,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2022-11-28 21:59:09,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 21:59:09,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 5: [2022-11-28 21:59:09,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 21:59:09,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2022-11-28 21:59:09,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:59:09,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 21:59:09,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 
0: [2022-11-28 21:59:09,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:59:09,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 21:59:09,445] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2022-11-28 21:59:09,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:59:09,445] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 21:59:09,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2022-11-28 21:59:09,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:59:09,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 21:59:09,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2022-11-28 21:59:09,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:59:09,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 21:59:09,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 
0: [2022-11-28 21:59:09,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 21:59:09,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 21:59:09,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2022-11-28 21:59:09,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:59:09,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:59:09,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:59:09,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:59:09,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:59:09,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:59:09,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 21:59:09,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-28 21:59:09,449] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 21:59:09,449] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 21:59:09,449] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 21:59:09,449] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 21:59:09,449] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 21:59:09,449] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 21:59:09,449] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 21:59:09,449] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 21:59:09,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2022-11-28 21:59:09,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2022-11-28 21:59:09,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2022-11-28 21:59:09,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 
6: [2022-11-28 21:59:09,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2022-11-28 21:59:09,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2022-11-28 21:59:09,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 6: [2022-11-28 21:59:09,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:59:09,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 21:59:09,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 21:59:09,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 
2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2022-11-28 21:59:09,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 21:59:09,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2022-11-28 21:59:09,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 21:59:09,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 21:59:09,451] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 
2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 2: [2022-11-28 21:59:09,451] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: [2022-11-28 21:59:09,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step75000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 21:59:09,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step75000 is ready now! 0: successfully saved checkpoint at iteration 75000 to checkpoints_221m 7: time (ms) | save-checkpoint: 719.57 7: iteration 75010/ 115203 | consumed samples: 19202560 | consumed tokens: 39326842880 | elapsed time per iteration (s): 0.52 | learning rate: 6.975E-05 | global batch size: 256 | lm loss: 2.241755E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 489.859 | TFLOPs: 25.70 | 7: iteration 75020/ 115203 | consumed samples: 19205120 | consumed tokens: 39332085760 | elapsed time per iteration (s): 0.42 | learning rate: 6.973E-05 | global batch size: 256 | lm loss: 2.200852E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.459 | TFLOPs: 31.77 | 7: iteration 75030/ 115203 | consumed samples: 19207680 | consumed tokens: 39337328640 | elapsed time per iteration (s): 0.43 | learning rate: 6.971E-05 | global batch size: 256 | lm loss: 2.249905E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.303 | TFLOPs: 31.60 | 7: iteration 75040/ 115203 | consumed samples: 19210240 | consumed tokens: 39342571520 | elapsed time per iteration (s): 0.44 | learning rate: 6.968E-05 | 
global batch size: 256 | lm loss: 2.260423E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.892 | TFLOPs: 30.58 | 7: iteration 75050/ 115203 | consumed samples: 19212800 | consumed tokens: 39347814400 | elapsed time per iteration (s): 0.44 | learning rate: 6.966E-05 | global batch size: 256 | lm loss: 2.242489E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.182 | TFLOPs: 30.70 | 7: iteration 75060/ 115203 | consumed samples: 19215360 | consumed tokens: 39353057280 | elapsed time per iteration (s): 0.64 | learning rate: 6.964E-05 | global batch size: 256 | lm loss: 2.236724E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 397.792 | TFLOPs: 20.87 | 7: iteration 75070/ 115203 | consumed samples: 19217920 | consumed tokens: 39358300160 | elapsed time per iteration (s): 0.43 | learning rate: 6.962E-05 | global batch size: 256 | lm loss: 2.260089E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.938 | TFLOPs: 31.48 | 7: iteration 75080/ 115203 | consumed samples: 19220480 | consumed tokens: 39363543040 | elapsed time per iteration (s): 0.43 | learning rate: 6.960E-05 | global batch size: 256 | lm loss: 2.231446E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.483 | TFLOPs: 31.56 | 7: iteration 75090/ 115203 | consumed samples: 19223040 | consumed tokens: 39368785920 | elapsed time per iteration (s): 0.43 | learning rate: 6.957E-05 | global batch size: 256 | lm loss: 2.230969E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.015 | TFLOPs: 31.17 | 7: iteration 75100/ 115203 | consumed samples: 
19225600 | consumed tokens: 39374028800 | elapsed time per iteration (s): 0.43 | learning rate: 6.955E-05 | global batch size: 256 | lm loss: 2.236676E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.551 | TFLOPs: 30.93 | 7: iteration 75110/ 115203 | consumed samples: 19228160 | consumed tokens: 39379271680 | elapsed time per iteration (s): 0.44 | learning rate: 6.953E-05 | global batch size: 256 | lm loss: 2.241038E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.738 | TFLOPs: 30.84 | 7: iteration 75120/ 115203 | consumed samples: 19230720 | consumed tokens: 39384514560 | elapsed time per iteration (s): 0.43 | learning rate: 6.951E-05 | global batch size: 256 | lm loss: 2.222897E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.459 | TFLOPs: 31.51 | 7: iteration 75130/ 115203 | consumed samples: 19233280 | consumed tokens: 39389757440 | elapsed time per iteration (s): 0.43 | learning rate: 6.949E-05 | global batch size: 256 | lm loss: 2.263805E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.971 | TFLOPs: 31.16 | 7: iteration 75140/ 115203 | consumed samples: 19235840 | consumed tokens: 39395000320 | elapsed time per iteration (s): 0.43 | learning rate: 6.946E-05 | global batch size: 256 | lm loss: 2.246347E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.816 | TFLOPs: 31.05 | 7: iteration 75150/ 115203 | consumed samples: 19238400 | consumed tokens: 39400243200 | elapsed time per iteration (s): 0.42 | learning rate: 6.944E-05 | global batch size: 256 | lm loss: 2.262365E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 604.220 | TFLOPs: 31.70 | 7: iteration 75160/ 115203 | consumed samples: 19240960 | consumed tokens: 39405486080 | elapsed time per iteration (s): 0.43 | learning rate: 6.942E-05 | global batch size: 256 | lm loss: 2.254256E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.200 | TFLOPs: 30.91 | 7: iteration 75170/ 115203 | consumed samples: 19243520 | consumed tokens: 39410728960 | elapsed time per iteration (s): 0.43 | learning rate: 6.940E-05 | global batch size: 256 | lm loss: 2.245797E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.293 | TFLOPs: 31.18 | 7: iteration 75180/ 115203 | consumed samples: 19246080 | consumed tokens: 39415971840 | elapsed time per iteration (s): 0.42 | learning rate: 6.937E-05 | global batch size: 256 | lm loss: 2.279423E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.208 | TFLOPs: 31.86 | 7: iteration 75190/ 115203 | consumed samples: 19248640 | consumed tokens: 39421214720 | elapsed time per iteration (s): 0.43 | learning rate: 6.935E-05 | global batch size: 256 | lm loss: 2.237968E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.480 | TFLOPs: 31.40 | 7: iteration 75200/ 115203 | consumed samples: 19251200 | consumed tokens: 39426457600 | elapsed time per iteration (s): 0.43 | learning rate: 6.933E-05 | global batch size: 256 | lm loss: 2.272563E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.071 | TFLOPs: 31.17 | 7: iteration 75210/ 115203 | consumed samples: 19253760 | consumed tokens: 39431700480 | elapsed time per iteration (s): 0.43 | learning rate: 6.931E-05 | global batch size: 256 | lm 
loss: 2.284981E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.313 | TFLOPs: 31.18 | 7: iteration 75220/ 115203 | consumed samples: 19256320 | consumed tokens: 39436943360 | elapsed time per iteration (s): 0.44 | learning rate: 6.929E-05 | global batch size: 256 | lm loss: 2.285560E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.819 | TFLOPs: 30.68 | 7: iteration 75230/ 115203 | consumed samples: 19258880 | consumed tokens: 39442186240 | elapsed time per iteration (s): 0.43 | learning rate: 6.926E-05 | global batch size: 256 | lm loss: 2.270302E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.780 | TFLOPs: 31.36 | 7: iteration 75240/ 115203 | consumed samples: 19261440 | consumed tokens: 39447429120 | elapsed time per iteration (s): 0.43 | learning rate: 6.924E-05 | global batch size: 256 | lm loss: 2.273689E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.679 | TFLOPs: 30.89 | 7: iteration 75250/ 115203 | consumed samples: 19264000 | consumed tokens: 39452672000 | elapsed time per iteration (s): 0.43 | learning rate: 6.922E-05 | global batch size: 256 | lm loss: 2.255646E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.926 | TFLOPs: 31.11 | 7: iteration 75260/ 115203 | consumed samples: 19266560 | consumed tokens: 39457914880 | elapsed time per iteration (s): 0.43 | learning rate: 6.920E-05 | global batch size: 256 | lm loss: 2.238008E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.627 | TFLOPs: 31.20 | 7: iteration 75270/ 115203 | consumed samples: 19269120 | consumed tokens: 
39463157760 | elapsed time per iteration (s): 0.44 | learning rate: 6.918E-05 | global batch size: 256 | lm loss: 2.254058E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.580 | TFLOPs: 30.67 | 7: iteration 75280/ 115203 | consumed samples: 19271680 | consumed tokens: 39468400640 | elapsed time per iteration (s): 0.44 | learning rate: 6.915E-05 | global batch size: 256 | lm loss: 2.260571E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.122 | TFLOPs: 30.70 | 7: iteration 75290/ 115203 | consumed samples: 19274240 | consumed tokens: 39473643520 | elapsed time per iteration (s): 0.43 | learning rate: 6.913E-05 | global batch size: 256 | lm loss: 2.270454E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.915 | TFLOPs: 31.16 | 7: iteration 75300/ 115203 | consumed samples: 19276800 | consumed tokens: 39478886400 | elapsed time per iteration (s): 0.44 | learning rate: 6.911E-05 | global batch size: 256 | lm loss: 2.284163E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.982 | TFLOPs: 30.64 | 7: iteration 75310/ 115203 | consumed samples: 19279360 | consumed tokens: 39484129280 | elapsed time per iteration (s): 0.44 | learning rate: 6.909E-05 | global batch size: 256 | lm loss: 2.221547E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.757 | TFLOPs: 30.84 | 7: iteration 75320/ 115203 | consumed samples: 19281920 | consumed tokens: 39489372160 | elapsed time per iteration (s): 0.43 | learning rate: 6.907E-05 | global batch size: 256 | lm loss: 2.223383E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
594.142 | TFLOPs: 31.17 | 7: iteration 75330/ 115203 | consumed samples: 19284480 | consumed tokens: 39494615040 | elapsed time per iteration (s): 0.42 | learning rate: 6.904E-05 | global batch size: 256 | lm loss: 2.241611E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.612 | TFLOPs: 31.67 | 7: iteration 75340/ 115203 | consumed samples: 19287040 | consumed tokens: 39499857920 | elapsed time per iteration (s): 0.43 | learning rate: 6.902E-05 | global batch size: 256 | lm loss: 2.225098E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.094 | TFLOPs: 31.28 | 7: iteration 75350/ 115203 | consumed samples: 19289600 | consumed tokens: 39505100800 | elapsed time per iteration (s): 0.43 | learning rate: 6.900E-05 | global batch size: 256 | lm loss: 2.252782E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.904 | TFLOPs: 31.32 | 7: iteration 75360/ 115203 | consumed samples: 19292160 | consumed tokens: 39510343680 | elapsed time per iteration (s): 0.43 | learning rate: 6.898E-05 | global batch size: 256 | lm loss: 2.211612E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.893 | TFLOPs: 31.53 | 7: iteration 75370/ 115203 | consumed samples: 19294720 | consumed tokens: 39515586560 | elapsed time per iteration (s): 0.44 | learning rate: 6.895E-05 | global batch size: 256 | lm loss: 2.246121E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.968 | TFLOPs: 30.69 | 7: iteration 75380/ 115203 | consumed samples: 19297280 | consumed tokens: 39520829440 | elapsed time per iteration (s): 0.44 | learning rate: 6.893E-05 | global batch size: 256 | lm loss: 2.241650E+00 | grad norm: 0.229 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.683 | TFLOPs: 30.57 | 7: iteration 75390/ 115203 | consumed samples: 19299840 | consumed tokens: 39526072320 | elapsed time per iteration (s): 0.43 | learning rate: 6.891E-05 | global batch size: 256 | lm loss: 2.262852E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.703 | TFLOPs: 31.47 | 7: iteration 75400/ 115203 | consumed samples: 19302400 | consumed tokens: 39531315200 | elapsed time per iteration (s): 0.43 | learning rate: 6.889E-05 | global batch size: 256 | lm loss: 2.259414E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.128 | TFLOPs: 31.44 | 7: iteration 75410/ 115203 | consumed samples: 19304960 | consumed tokens: 39536558080 | elapsed time per iteration (s): 0.43 | learning rate: 6.887E-05 | global batch size: 256 | lm loss: 2.260823E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.574 | TFLOPs: 31.14 | 7: iteration 75420/ 115203 | consumed samples: 19307520 | consumed tokens: 39541800960 | elapsed time per iteration (s): 0.43 | learning rate: 6.884E-05 | global batch size: 256 | lm loss: 2.255943E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.555 | TFLOPs: 31.46 | 7: iteration 75430/ 115203 | consumed samples: 19310080 | consumed tokens: 39547043840 | elapsed time per iteration (s): 0.43 | learning rate: 6.882E-05 | global batch size: 256 | lm loss: 2.237303E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.249 | TFLOPs: 31.28 | 7: iteration 75440/ 115203 | consumed samples: 19312640 | consumed tokens: 39552286720 | elapsed time per iteration (s): 
0.44 | learning rate: 6.880E-05 | global batch size: 256 | lm loss: 2.261361E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.103 | TFLOPs: 30.59 | 7: iteration 75450/ 115203 | consumed samples: 19315200 | consumed tokens: 39557529600 | elapsed time per iteration (s): 0.43 | learning rate: 6.878E-05 | global batch size: 256 | lm loss: 2.256083E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.437 | TFLOPs: 31.35 | 7: iteration 75460/ 115203 | consumed samples: 19317760 | consumed tokens: 39562772480 | elapsed time per iteration (s): 0.43 | learning rate: 6.876E-05 | global batch size: 256 | lm loss: 2.242478E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.696 | TFLOPs: 31.15 | 7: iteration 75470/ 115203 | consumed samples: 19320320 | consumed tokens: 39568015360 | elapsed time per iteration (s): 0.43 | learning rate: 6.873E-05 | global batch size: 256 | lm loss: 2.247145E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.200 | TFLOPs: 31.33 | 7: iteration 75480/ 115203 | consumed samples: 19322880 | consumed tokens: 39573258240 | elapsed time per iteration (s): 0.44 | learning rate: 6.871E-05 | global batch size: 256 | lm loss: 2.215958E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.887 | TFLOPs: 30.37 | 7: iteration 75490/ 115203 | consumed samples: 19325440 | consumed tokens: 39578501120 | elapsed time per iteration (s): 0.43 | learning rate: 6.869E-05 | global batch size: 256 | lm loss: 2.260281E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.617 | TFLOPs: 31.30 | 7: iteration 75500/ 
115203 | consumed samples: 19328000 | consumed tokens: 39583744000 | elapsed time per iteration (s): 0.43 | learning rate: 6.867E-05 | global batch size: 256 | lm loss: 2.260486E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.143 | TFLOPs: 31.33 | 7: iteration 75510/ 115203 | consumed samples: 19330560 | consumed tokens: 39588986880 | elapsed time per iteration (s): 0.43 | learning rate: 6.865E-05 | global batch size: 256 | lm loss: 2.248948E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.028 | TFLOPs: 31.12 | 7: iteration 75520/ 115203 | consumed samples: 19333120 | consumed tokens: 39594229760 | elapsed time per iteration (s): 0.43 | learning rate: 6.862E-05 | global batch size: 256 | lm loss: 2.248975E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.229 | TFLOPs: 31.02 | 7: iteration 75530/ 115203 | consumed samples: 19335680 | consumed tokens: 39599472640 | elapsed time per iteration (s): 0.43 | learning rate: 6.860E-05 | global batch size: 256 | lm loss: 2.240819E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.793 | TFLOPs: 31.31 | 7: iteration 75540/ 115203 | consumed samples: 19338240 | consumed tokens: 39604715520 | elapsed time per iteration (s): 0.42 | learning rate: 6.858E-05 | global batch size: 256 | lm loss: 2.233169E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.454 | TFLOPs: 31.66 | 7: iteration 75550/ 115203 | consumed samples: 19340800 | consumed tokens: 39609958400 | elapsed time per iteration (s): 0.43 | learning rate: 6.856E-05 | global batch size: 256 | lm loss: 2.245732E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 588.812 | TFLOPs: 30.89 | 7: iteration 75560/ 115203 | consumed samples: 19343360 | consumed tokens: 39615201280 | elapsed time per iteration (s): 0.46 | learning rate: 6.854E-05 | global batch size: 256 | lm loss: 2.251740E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.705 | TFLOPs: 29.47 | 7: iteration 75570/ 115203 | consumed samples: 19345920 | consumed tokens: 39620444160 | elapsed time per iteration (s): 0.43 | learning rate: 6.851E-05 | global batch size: 256 | lm loss: 2.285612E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.013 | TFLOPs: 30.96 | 7: iteration 75580/ 115203 | consumed samples: 19348480 | consumed tokens: 39625687040 | elapsed time per iteration (s): 0.42 | learning rate: 6.849E-05 | global batch size: 256 | lm loss: 2.252953E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.857 | TFLOPs: 31.68 | 7: iteration 75590/ 115203 | consumed samples: 19351040 | consumed tokens: 39630929920 | elapsed time per iteration (s): 0.43 | learning rate: 6.847E-05 | global batch size: 256 | lm loss: 2.239116E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.734 | TFLOPs: 31.15 | 7: iteration 75600/ 115203 | consumed samples: 19353600 | consumed tokens: 39636172800 | elapsed time per iteration (s): 0.43 | learning rate: 6.845E-05 | global batch size: 256 | lm loss: 2.264296E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.460 | TFLOPs: 30.98 | 7: iteration 75610/ 115203 | consumed samples: 19356160 | consumed tokens: 39641415680 | elapsed time per iteration (s): 0.44 | learning rate: 6.843E-05 | global batch 
size: 256 | lm loss: 2.232656E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.322 | TFLOPs: 30.40 | 7: iteration 75620/ 115203 | consumed samples: 19358720 | consumed tokens: 39646658560 | elapsed time per iteration (s): 0.43 | learning rate: 6.840E-05 | global batch size: 256 | lm loss: 2.244174E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.590 | TFLOPs: 30.93 | 7: iteration 75630/ 115203 | consumed samples: 19361280 | consumed tokens: 39651901440 | elapsed time per iteration (s): 0.43 | learning rate: 6.838E-05 | global batch size: 256 | lm loss: 2.285027E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.531 | TFLOPs: 31.14 | 7: iteration 75640/ 115203 | consumed samples: 19363840 | consumed tokens: 39657144320 | elapsed time per iteration (s): 0.44 | learning rate: 6.836E-05 | global batch size: 256 | lm loss: 2.259573E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.966 | TFLOPs: 30.85 | 7: iteration 75650/ 115203 | consumed samples: 19366400 | consumed tokens: 39662387200 | elapsed time per iteration (s): 0.44 | learning rate: 6.834E-05 | global batch size: 256 | lm loss: 2.255611E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.514 | TFLOPs: 30.72 | 7: iteration 75660/ 115203 | consumed samples: 19368960 | consumed tokens: 39667630080 | elapsed time per iteration (s): 0.44 | learning rate: 6.832E-05 | global batch size: 256 | lm loss: 2.251960E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.693 | TFLOPs: 30.84 | 7: iteration 75670/ 115203 | consumed samples: 19371520 | consumed 
tokens: 39672872960 | elapsed time per iteration (s): 0.44 | learning rate: 6.829E-05 | global batch size: 256 | lm loss: 2.244745E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.474 | TFLOPs: 30.77 | 7: iteration 75680/ 115203 | consumed samples: 19374080 | consumed tokens: 39678115840 | elapsed time per iteration (s): 0.43 | learning rate: 6.827E-05 | global batch size: 256 | lm loss: 2.249340E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.866 | TFLOPs: 31.05 | 7: iteration 75690/ 115203 | consumed samples: 19376640 | consumed tokens: 39683358720 | elapsed time per iteration (s): 0.43 | learning rate: 6.825E-05 | global batch size: 256 | lm loss: 2.237417E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.878 | TFLOPs: 31.26 | 7: iteration 75700/ 115203 | consumed samples: 19379200 | consumed tokens: 39688601600 | elapsed time per iteration (s): 0.43 | learning rate: 6.823E-05 | global batch size: 256 | lm loss: 2.249786E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.885 | TFLOPs: 31.53 | 7: iteration 75710/ 115203 | consumed samples: 19381760 | consumed tokens: 39693844480 | elapsed time per iteration (s): 0.42 | learning rate: 6.821E-05 | global batch size: 256 | lm loss: 2.225499E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.194 | TFLOPs: 31.65 | 7: iteration 75720/ 115203 | consumed samples: 19384320 | consumed tokens: 39699087360 | elapsed time per iteration (s): 0.43 | learning rate: 6.818E-05 | global batch size: 256 | lm loss: 2.232308E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 600.104 | TFLOPs: 31.49 | 7: iteration 75730/ 115203 | consumed samples: 19386880 | consumed tokens: 39704330240 | elapsed time per iteration (s): 0.42 | learning rate: 6.816E-05 | global batch size: 256 | lm loss: 2.256833E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.141 | TFLOPs: 31.65 | 7: iteration 75740/ 115203 | consumed samples: 19389440 | consumed tokens: 39709573120 | elapsed time per iteration (s): 0.43 | learning rate: 6.814E-05 | global batch size: 256 | lm loss: 2.220923E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.563 | TFLOPs: 31.25 | 7: iteration 75750/ 115203 | consumed samples: 19392000 | consumed tokens: 39714816000 | elapsed time per iteration (s): 0.43 | learning rate: 6.812E-05 | global batch size: 256 | lm loss: 2.259019E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.575 | TFLOPs: 30.88 | 7: iteration 75760/ 115203 | consumed samples: 19394560 | consumed tokens: 39720058880 | elapsed time per iteration (s): 0.42 | learning rate: 6.810E-05 | global batch size: 256 | lm loss: 2.240448E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.736 | TFLOPs: 31.68 | 7: iteration 75770/ 115203 | consumed samples: 19397120 | consumed tokens: 39725301760 | elapsed time per iteration (s): 0.43 | learning rate: 6.807E-05 | global batch size: 256 | lm loss: 2.281380E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.109 | TFLOPs: 31.38 | 7: iteration 75780/ 115203 | consumed samples: 19399680 | consumed tokens: 39730544640 | elapsed time per iteration (s): 0.43 | learning rate: 6.805E-05 | global batch size: 256 | lm loss: 2.251990E+00 | grad norm: 
0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.783 | TFLOPs: 31.47 | 7: iteration 75790/ 115203 | consumed samples: 19402240 | consumed tokens: 39735787520 | elapsed time per iteration (s): 0.43 | learning rate: 6.803E-05 | global batch size: 256 | lm loss: 2.223657E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.053 | TFLOPs: 31.27 | 7: iteration 75800/ 115203 | consumed samples: 19404800 | consumed tokens: 39741030400 | elapsed time per iteration (s): 0.43 | learning rate: 6.801E-05 | global batch size: 256 | lm loss: 2.238947E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.838 | TFLOPs: 31.58 | 7: iteration 75810/ 115203 | consumed samples: 19407360 | consumed tokens: 39746273280 | elapsed time per iteration (s): 0.43 | learning rate: 6.799E-05 | global batch size: 256 | lm loss: 2.234017E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.016 | TFLOPs: 31.06 | 7: iteration 75820/ 115203 | consumed samples: 19409920 | consumed tokens: 39751516160 | elapsed time per iteration (s): 0.43 | learning rate: 6.797E-05 | global batch size: 256 | lm loss: 2.261525E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.981 | TFLOPs: 31.32 | 7: iteration 75830/ 115203 | consumed samples: 19412480 | consumed tokens: 39756759040 | elapsed time per iteration (s): 0.43 | learning rate: 6.794E-05 | global batch size: 256 | lm loss: 2.269621E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.648 | TFLOPs: 31.10 | 7: iteration 75840/ 115203 | consumed samples: 19415040 | consumed tokens: 39762001920 | elapsed time per 
iteration (s): 0.43 | learning rate: 6.792E-05 | global batch size: 256 | lm loss: 2.266656E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.621 | TFLOPs: 31.15 | 7: iteration 75850/ 115203 | consumed samples: 19417600 | consumed tokens: 39767244800 | elapsed time per iteration (s): 0.43 | learning rate: 6.790E-05 | global batch size: 256 | lm loss: 2.237395E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.049 | TFLOPs: 31.38 | 7: iteration 75860/ 115203 | consumed samples: 19420160 | consumed tokens: 39772487680 | elapsed time per iteration (s): 0.42 | learning rate: 6.788E-05 | global batch size: 256 | lm loss: 2.264774E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.083 | TFLOPs: 31.91 | 7: iteration 75870/ 115203 | consumed samples: 19422720 | consumed tokens: 39777730560 | elapsed time per iteration (s): 0.43 | learning rate: 6.786E-05 | global batch size: 256 | lm loss: 2.254935E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.775 | TFLOPs: 31.47 | 7: iteration 75880/ 115203 | consumed samples: 19425280 | consumed tokens: 39782973440 | elapsed time per iteration (s): 0.43 | learning rate: 6.783E-05 | global batch size: 256 | lm loss: 2.254826E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.100 | TFLOPs: 31.28 | 7: iteration 75890/ 115203 | consumed samples: 19427840 | consumed tokens: 39788216320 | elapsed time per iteration (s): 0.43 | learning rate: 6.781E-05 | global batch size: 256 | lm loss: 2.268560E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.838 | TFLOPs: 31.52 | 7: 
iteration 75900/ 115203 | consumed samples: 19430400 | consumed tokens: 39793459200 | elapsed time per iteration (s): 0.42 | learning rate: 6.779E-05 | global batch size: 256 | lm loss: 2.224432E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.630 | TFLOPs: 31.72 | 7: iteration 75910/ 115203 | consumed samples: 19432960 | consumed tokens: 39798702080 | elapsed time per iteration (s): 0.44 | learning rate: 6.777E-05 | global batch size: 256 | lm loss: 2.260357E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.108 | TFLOPs: 30.75 | 7: iteration 75920/ 115203 | consumed samples: 19435520 | consumed tokens: 39803944960 | elapsed time per iteration (s): 0.43 | learning rate: 6.775E-05 | global batch size: 256 | lm loss: 2.244656E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.624 | TFLOPs: 31.57 | 7: iteration 75930/ 115203 | consumed samples: 19438080 | consumed tokens: 39809187840 | elapsed time per iteration (s): 0.44 | learning rate: 6.772E-05 | global batch size: 256 | lm loss: 2.232726E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.813 | TFLOPs: 30.84 | 7: iteration 75940/ 115203 | consumed samples: 19440640 | consumed tokens: 39814430720 | elapsed time per iteration (s): 0.43 | learning rate: 6.770E-05 | global batch size: 256 | lm loss: 2.215408E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.712 | TFLOPs: 31.31 | 7: iteration 75950/ 115203 | consumed samples: 19443200 | consumed tokens: 39819673600 | elapsed time per iteration (s): 0.43 | learning rate: 6.768E-05 | global batch size: 256 | lm loss: 2.240012E+00 | grad norm: 0.241 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.401 | TFLOPs: 30.98 | 7: iteration 75960/ 115203 | consumed samples: 19445760 | consumed tokens: 39824916480 | elapsed time per iteration (s): 0.44 | learning rate: 6.766E-05 | global batch size: 256 | lm loss: 2.269723E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.110 | TFLOPs: 30.44 | 7: iteration 75970/ 115203 | consumed samples: 19448320 | consumed tokens: 39830159360 | elapsed time per iteration (s): 0.43 | learning rate: 6.764E-05 | global batch size: 256 | lm loss: 2.271054E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.184 | TFLOPs: 31.28 | 7: iteration 75980/ 115203 | consumed samples: 19450880 | consumed tokens: 39835402240 | elapsed time per iteration (s): 0.44 | learning rate: 6.761E-05 | global batch size: 256 | lm loss: 2.278067E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.528 | TFLOPs: 30.83 | 7: iteration 75990/ 115203 | consumed samples: 19453440 | consumed tokens: 39840645120 | elapsed time per iteration (s): 0.43 | learning rate: 6.759E-05 | global batch size: 256 | lm loss: 2.243212E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.735 | TFLOPs: 31.52 | 0: [2022-11-28 22:06:22,685] [INFO] [logging.py:68:log_dist] [Rank 0] step=76000, skipped=0, lr=[6.757111507639708e-05, 6.757111507639708e-05, 6.757111507639708e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 76000/ 115203 | consumed samples: 19456000 | consumed tokens: 39845888000 | elapsed time per iteration (s): 0.43 | learning rate: 6.757E-05 | global batch size: 256 | lm loss: 2.253337E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of 
nan iterations: 0 | samples per second: 601.617 | TFLOPs: 31.57 | 0: steps: 76000 loss: 2.1428 iter time (s): 0.433 samples/sec: 591.058 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 76000 | lm loss value: 2.097518E+00 | lm loss PPL: 8.145926E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 76000 to checkpoints_221m 0: [2022-11-28 22:06:22,877] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step76000 is begin to save! 0: [2022-11-28 22:06:22,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_01-model_00-model_states.pt... 0: [2022-11-28 22:06:23,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_01-model_00-model_states.pt. 0: [2022-11-28 22:06:23,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_03-model_00-model_states.pt... 0: [2022-11-28 22:06:23,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_03-model_00-model_states.pt. 0: [2022-11-28 22:06:23,039] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_04-model_00-model_states.pt... 0: [2022-11-28 22:06:23,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_04-model_00-model_states.pt. 0: [2022-11-28 22:06:23,063] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_05-model_00-model_states.pt... 0: [2022-11-28 22:06:23,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_05-model_00-model_states.pt. 
0: [2022-11-28 22:06:23,088] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_06-model_00-model_states.pt... 0: [2022-11-28 22:06:23,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_06-model_00-model_states.pt. 0: [2022-11-28 22:06:23,112] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_07-model_00-model_states.pt... 0: [2022-11-28 22:06:23,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_07-model_00-model_states.pt. 0: [2022-11-28 22:06:23,135] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_08-model_00-model_states.pt... 0: [2022-11-28 22:06:23,158] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_08-model_00-model_states.pt. 0: [2022-11-28 22:06:23,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_09-model_00-model_states.pt... 0: [2022-11-28 22:06:23,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_09-model_00-model_states.pt. 0: [2022-11-28 22:06:23,182] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_10-model_00-model_states.pt... 0: [2022-11-28 22:06:23,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_10-model_00-model_states.pt. 0: [2022-11-28 22:06:23,205] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_11-model_00-model_states.pt... 0: [2022-11-28 22:06:23,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_11-model_00-model_states.pt. 
0: [2022-11-28 22:06:23,229] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_12-model_00-model_states.pt... 0: [2022-11-28 22:06:23,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_12-model_00-model_states.pt. 0: [2022-11-28 22:06:23,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_13-model_00-model_states.pt... 0: [2022-11-28 22:06:23,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_13-model_00-model_states.pt. 0: [2022-11-28 22:06:23,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_14-model_00-model_states.pt... 0: [2022-11-28 22:06:23,301] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_14-model_00-model_states.pt. 0: [2022-11-28 22:06:23,301] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_15-model_00-model_states.pt... 0: [2022-11-28 22:06:23,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_15-model_00-model_states.pt. 0: [2022-11-28 22:06:23,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_16-model_00-model_states.pt... 0: [2022-11-28 22:06:23,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_16-model_00-model_states.pt. 0: [2022-11-28 22:06:23,350] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_17-model_00-model_states.pt... 0: [2022-11-28 22:06:23,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_17-model_00-model_states.pt. 
0: [2022-11-28 22:06:23,374] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_18-model_00-model_states.pt... 0: [2022-11-28 22:06:23,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_18-model_00-model_states.pt. 0: [2022-11-28 22:06:23,397] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_19-model_00-model_states.pt... 0: [2022-11-28 22:06:23,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_19-model_00-model_states.pt. 0: [2022-11-28 22:06:23,423] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_20-model_00-model_states.pt... 0: [2022-11-28 22:06:23,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_20-model_00-model_states.pt. 0: [2022-11-28 22:06:23,446] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/layer_22-model_00-model_states.pt... 0: [2022-11-28 22:06:23,451] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/layer_22-model_00-model_states.pt. 0: [2022-11-28 22:06:23,452] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step76000/mp_rank_00_model_states.pt 0: [2022-11-28 22:06:23,452] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/mp_rank_00_model_states.pt... 0: [2022-11-28 22:06:23,454] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/mp_rank_00_model_states.pt. 0: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
0: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
6: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
2: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:06:23,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step76000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:06:23,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:06:23,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 22:06:23,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 
7: [2022-11-28 22:06:23,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:06:23,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 22:06:23,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 7: [2022-11-28 22:06:23,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:06:23,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 22:06:23,526] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2022-11-28 22:06:23,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:06:23,526] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 22:06:23,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2022-11-28 22:06:23,527] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:06:23,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 22:06:23,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 
2: [2022-11-28 22:06:23,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:06:23,529] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 22:06:23,529] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2022-11-28 22:06:23,529] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:06:23,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 22:06:23,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2022-11-28 22:06:23,530] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:06:23,530] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 22:06:23,530] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2022-11-28 22:06:23,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:06:23,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:06:23,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-28 22:06:23,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:06:23,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 22:06:23,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 22:06:23,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 22:06:23,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2022-11-28 22:06:23,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2022-11-28 22:06:23,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2022-11-28 22:06:23,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 22:06:23,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2022-11-28 22:06:23,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:06:23,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 22:06:23,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 
0: [2022-11-28 22:06:23,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:06:23,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 22:06:23,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 7: [2022-11-28 22:06:23,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:06:23,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:06:23,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 22:06:23,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 7: [2022-11-28 22:06:23,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 0: [2022-11-28 22:06:23,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:06:23,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2022-11-28 22:06:23,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 22:06:23,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 
7: [2022-11-28 22:06:23,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:06:23,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:06:23,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 22:06:23,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 22:06:23,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 7: [2022-11-28 22:06:23,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2022-11-28 22:06:23,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:06:23,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 22:06:23,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 7: [2022-11-28 22:06:23,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:06:23,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 22:06:23,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 
7: [2022-11-28 22:06:23,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:06:23,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 22:06:23,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2022-11-28 22:06:23,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:06:23,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 22:06:23,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2022-11-28 22:06:23,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:06:23,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 22:06:23,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2022-11-28 22:06:23,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:06:23,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 22:06:23,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 
2: [2022-11-28 22:06:23,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:06:23,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 22:06:23,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 2: [2022-11-28 22:06:23,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:06:23,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 22:06:23,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2022-11-28 22:06:23,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:06:23,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:06:23,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 22:06:23,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:06:23,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 
4: [2022-11-28 22:06:23,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 22:06:23,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 22:06:23,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2022-11-28 22:06:23,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2022-11-28 22:06:23,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:06:23,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:06:23,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 22:06:23,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:06:23,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:06:23,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 22:06:23,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2022-11-28 22:06:23,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 
6: [2022-11-28 22:06:23,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 22:06:23,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 22:06:23,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 6: [2022-11-28 22:06:23,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2022-11-28 22:06:23,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:06:23,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:06:23,551] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 22:06:23,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2022-11-28 22:06:23,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:06:23,551] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 22:06:23,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2022-11-28 22:06:23,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:06:23,551] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 22:06:23,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2022-11-28 22:06:23,551] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:06:23,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 22:06:23,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2022-11-28 22:06:23,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:06:23,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 22:06:23,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2022-11-28 22:06:23,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:06:23,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 22:06:23,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2022-11-28 22:06:23,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2022-11-28 22:06:23,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 4: [2022-11-28 22:06:23,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:06:23,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2022-11-28 22:06:23,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 22:06:23,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 4: [2022-11-28 22:06:23,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:06:23,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 22:06:23,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2022-11-28 22:06:23,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:06:23,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:06:23,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:06:23,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 22:06:23,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 22:06:23,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 22:06:23,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2022-11-28 22:06:23,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2022-11-28 22:06:23,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 3: [2022-11-28 22:06:23,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:06:23,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 22:06:23,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 5: [2022-11-28 22:06:23,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:06:23,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:06:23,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2022-11-28 22:06:23,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 22:06:23,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 5: [2022-11-28 22:06:23,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 22:06:23,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 22:06:23,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:06:23,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 5: [2022-11-28 22:06:23,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 5: [2022-11-28 22:06:23,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 22:06:23,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 5: [2022-11-28 22:06:23,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:06:23,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:06:23,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2022-11-28 22:06:23,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:06:23,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 22:06:23,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 22:06:23,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 22:06:23,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 22:06:23,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 5: [2022-11-28 22:06:23,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 5: [2022-11-28 22:06:23,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 5: [2022-11-28 22:06:23,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: [2022-11-28 22:06:23,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 22:06:23,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:06:23,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 22:06:23,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 22:06:23,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 22:06:23,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 22:06:23,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2022-11-28 22:06:23,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2022-11-28 22:06:23,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2022-11-28 22:06:23,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2022-11-28 22:06:23,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 1: [2022-11-28 22:06:23,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 22:06:23,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step76000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 22:06:23,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step76000 is ready now! 0: successfully saved checkpoint at iteration 76000 to checkpoints_221m 7: time (ms) | save-checkpoint: 784.01 7: iteration 76010/ 115203 | consumed samples: 19458560 | consumed tokens: 39851130880 | elapsed time per iteration (s): 0.53 | learning rate: 6.755E-05 | global batch size: 256 | lm loss: 2.230087E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 479.768 | TFLOPs: 25.17 | 7: iteration 76020/ 115203 | consumed samples: 19461120 | consumed tokens: 39856373760 | elapsed time per iteration (s): 0.43 | learning rate: 6.753E-05 | global batch size: 256 | lm loss: 2.246687E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.549 | TFLOPs: 31.19 | 7: iteration 76030/ 115203 | consumed samples: 19463680 | consumed tokens: 39861616640 | elapsed time per iteration (s): 0.42 | learning rate: 6.751E-05 | global batch size: 256 | lm loss: 2.256931E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.954 | TFLOPs: 31.69 | 7: iteration 76040/ 115203 | consumed samples: 19466240 | consumed tokens: 39866859520 | elapsed time per iteration (s): 0.43 | learning rate: 6.748E-05 | global batch size: 256 | lm loss: 2.241738E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.645 | TFLOPs: 31.20 | 7: iteration 76050/ 115203 | consumed samples: 19468800 | consumed tokens: 39872102400 | elapsed time per iteration (s): 0.43 | learning rate: 6.746E-05 | global batch size: 256 | 
lm loss: 2.273234E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.531 | TFLOPs: 31.40 | 7: iteration 76060/ 115203 | consumed samples: 19471360 | consumed tokens: 39877345280 | elapsed time per iteration (s): 0.43 | learning rate: 6.744E-05 | global batch size: 256 | lm loss: 2.258584E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.933 | TFLOPs: 31.27 | 7: iteration 76070/ 115203 | consumed samples: 19473920 | consumed tokens: 39882588160 | elapsed time per iteration (s): 0.43 | learning rate: 6.742E-05 | global batch size: 256 | lm loss: 2.236967E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.719 | TFLOPs: 31.10 | 7: iteration 76080/ 115203 | consumed samples: 19476480 | consumed tokens: 39887831040 | elapsed time per iteration (s): 0.43 | learning rate: 6.740E-05 | global batch size: 256 | lm loss: 2.265981E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.003 | TFLOPs: 31.27 | 7: iteration 76090/ 115203 | consumed samples: 19479040 | consumed tokens: 39893073920 | elapsed time per iteration (s): 0.44 | learning rate: 6.737E-05 | global batch size: 256 | lm loss: 2.244135E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.945 | TFLOPs: 30.32 | 7: iteration 76100/ 115203 | consumed samples: 19481600 | consumed tokens: 39898316800 | elapsed time per iteration (s): 0.42 | learning rate: 6.735E-05 | global batch size: 256 | lm loss: 2.275015E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.539 | TFLOPs: 31.61 | 7: iteration 76110/ 115203 | consumed samples: 19484160 | consumed tokens: 
39903559680 | elapsed time per iteration (s): 0.42 | learning rate: 6.733E-05 | global batch size: 256 | lm loss: 2.258691E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.378 | TFLOPs: 31.66 | 7: iteration 76120/ 115203 | consumed samples: 19486720 | consumed tokens: 39908802560 | elapsed time per iteration (s): 0.46 | learning rate: 6.731E-05 | global batch size: 256 | lm loss: 2.235388E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.010 | TFLOPs: 29.28 | 7: iteration 76130/ 115203 | consumed samples: 19489280 | consumed tokens: 39914045440 | elapsed time per iteration (s): 0.44 | learning rate: 6.729E-05 | global batch size: 256 | lm loss: 2.256340E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.019 | TFLOPs: 30.43 | 7: iteration 76140/ 115203 | consumed samples: 19491840 | consumed tokens: 39919288320 | elapsed time per iteration (s): 0.42 | learning rate: 6.727E-05 | global batch size: 256 | lm loss: 2.267932E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.078 | TFLOPs: 31.64 | 7: iteration 76150/ 115203 | consumed samples: 19494400 | consumed tokens: 39924531200 | elapsed time per iteration (s): 0.44 | learning rate: 6.724E-05 | global batch size: 256 | lm loss: 2.276018E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.163 | TFLOPs: 30.76 | 7: iteration 76160/ 115203 | consumed samples: 19496960 | consumed tokens: 39929774080 | elapsed time per iteration (s): 0.44 | learning rate: 6.722E-05 | global batch size: 256 | lm loss: 2.274190E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
587.908 | TFLOPs: 30.85 | 7: iteration 76170/ 115203 | consumed samples: 19499520 | consumed tokens: 39935016960 | elapsed time per iteration (s): 0.44 | learning rate: 6.720E-05 | global batch size: 256 | lm loss: 2.245735E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.417 | TFLOPs: 30.87 | 7: iteration 76180/ 115203 | consumed samples: 19502080 | consumed tokens: 39940259840 | elapsed time per iteration (s): 0.43 | learning rate: 6.718E-05 | global batch size: 256 | lm loss: 2.266329E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.982 | TFLOPs: 30.96 | 7: iteration 76190/ 115203 | consumed samples: 19504640 | consumed tokens: 39945502720 | elapsed time per iteration (s): 0.44 | learning rate: 6.716E-05 | global batch size: 256 | lm loss: 2.250117E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.943 | TFLOPs: 30.53 | 7: iteration 76200/ 115203 | consumed samples: 19507200 | consumed tokens: 39950745600 | elapsed time per iteration (s): 0.43 | learning rate: 6.713E-05 | global batch size: 256 | lm loss: 2.259405E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.883 | TFLOPs: 31.58 | 7: iteration 76210/ 115203 | consumed samples: 19509760 | consumed tokens: 39955988480 | elapsed time per iteration (s): 0.43 | learning rate: 6.711E-05 | global batch size: 256 | lm loss: 2.234269E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.494 | TFLOPs: 31.14 | 7: iteration 76220/ 115203 | consumed samples: 19512320 | consumed tokens: 39961231360 | elapsed time per iteration (s): 0.43 | learning rate: 6.709E-05 | global batch size: 256 | lm loss: 2.254044E+00 | grad norm: 0.223 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.932 | TFLOPs: 31.37 | 7: iteration 76230/ 115203 | consumed samples: 19514880 | consumed tokens: 39966474240 | elapsed time per iteration (s): 0.43 | learning rate: 6.707E-05 | global batch size: 256 | lm loss: 2.287610E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.082 | TFLOPs: 31.01 | 7: iteration 76240/ 115203 | consumed samples: 19517440 | consumed tokens: 39971717120 | elapsed time per iteration (s): 0.43 | learning rate: 6.705E-05 | global batch size: 256 | lm loss: 2.235624E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.893 | TFLOPs: 31.42 | 7: iteration 76250/ 115203 | consumed samples: 19520000 | consumed tokens: 39976960000 | elapsed time per iteration (s): 0.42 | learning rate: 6.703E-05 | global batch size: 256 | lm loss: 2.222467E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.441 | TFLOPs: 31.66 | 7: iteration 76260/ 115203 | consumed samples: 19522560 | consumed tokens: 39982202880 | elapsed time per iteration (s): 0.44 | learning rate: 6.700E-05 | global batch size: 256 | lm loss: 2.255565E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.285 | TFLOPs: 30.45 | 7: iteration 76270/ 115203 | consumed samples: 19525120 | consumed tokens: 39987445760 | elapsed time per iteration (s): 0.43 | learning rate: 6.698E-05 | global batch size: 256 | lm loss: 2.259392E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.386 | TFLOPs: 31.29 | 7: iteration 76280/ 115203 | consumed samples: 19527680 | consumed tokens: 39992688640 | elapsed time per iteration (s): 
0.44 | learning rate: 6.696E-05 | global batch size: 256 | lm loss: 2.225774E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.793 | TFLOPs: 30.68 | 7: iteration 76290/ 115203 | consumed samples: 19530240 | consumed tokens: 39997931520 | elapsed time per iteration (s): 0.43 | learning rate: 6.694E-05 | global batch size: 256 | lm loss: 2.242580E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.169 | TFLOPs: 30.91 | 7: iteration 76300/ 115203 | consumed samples: 19532800 | consumed tokens: 40003174400 | elapsed time per iteration (s): 0.44 | learning rate: 6.692E-05 | global batch size: 256 | lm loss: 2.244460E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.065 | TFLOPs: 30.70 | 7: iteration 76310/ 115203 | consumed samples: 19535360 | consumed tokens: 40008417280 | elapsed time per iteration (s): 0.42 | learning rate: 6.689E-05 | global batch size: 256 | lm loss: 2.225391E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.100 | TFLOPs: 31.70 | 7: iteration 76320/ 115203 | consumed samples: 19537920 | consumed tokens: 40013660160 | elapsed time per iteration (s): 0.44 | learning rate: 6.687E-05 | global batch size: 256 | lm loss: 2.254582E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.189 | TFLOPs: 30.60 | 7: iteration 76330/ 115203 | consumed samples: 19540480 | consumed tokens: 40018903040 | elapsed time per iteration (s): 0.43 | learning rate: 6.685E-05 | global batch size: 256 | lm loss: 2.292845E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.918 | TFLOPs: 31.06 | 7: iteration 76340/ 
115203 | consumed samples: 19543040 | consumed tokens: 40024145920 | elapsed time per iteration (s): 0.45 | learning rate: 6.683E-05 | global batch size: 256 | lm loss: 2.294477E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.325 | TFLOPs: 29.92 | 7: iteration 76350/ 115203 | consumed samples: 19545600 | consumed tokens: 40029388800 | elapsed time per iteration (s): 0.44 | learning rate: 6.681E-05 | global batch size: 256 | lm loss: 2.277668E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.120 | TFLOPs: 30.86 | 7: iteration 76360/ 115203 | consumed samples: 19548160 | consumed tokens: 40034631680 | elapsed time per iteration (s): 0.43 | learning rate: 6.679E-05 | global batch size: 256 | lm loss: 2.238460E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.177 | TFLOPs: 31.12 | 7: iteration 76370/ 115203 | consumed samples: 19550720 | consumed tokens: 40039874560 | elapsed time per iteration (s): 0.43 | learning rate: 6.676E-05 | global batch size: 256 | lm loss: 2.255088E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.027 | TFLOPs: 31.59 | 7: iteration 76380/ 115203 | consumed samples: 19553280 | consumed tokens: 40045117440 | elapsed time per iteration (s): 0.44 | learning rate: 6.674E-05 | global batch size: 256 | lm loss: 2.233622E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.780 | TFLOPs: 30.79 | 7: iteration 76390/ 115203 | consumed samples: 19555840 | consumed tokens: 40050360320 | elapsed time per iteration (s): 0.43 | learning rate: 6.672E-05 | global batch size: 256 | lm loss: 2.238018E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 597.050 | TFLOPs: 31.33 | 7: iteration 76400/ 115203 | consumed samples: 19558400 | consumed tokens: 40055603200 | elapsed time per iteration (s): 0.44 | learning rate: 6.670E-05 | global batch size: 256 | lm loss: 2.249302E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.193 | TFLOPs: 30.60 | 7: iteration 76410/ 115203 | consumed samples: 19560960 | consumed tokens: 40060846080 | elapsed time per iteration (s): 0.43 | learning rate: 6.668E-05 | global batch size: 256 | lm loss: 2.277126E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.091 | TFLOPs: 31.49 | 7: iteration 76420/ 115203 | consumed samples: 19563520 | consumed tokens: 40066088960 | elapsed time per iteration (s): 0.44 | learning rate: 6.666E-05 | global batch size: 256 | lm loss: 2.257661E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.338 | TFLOPs: 30.82 | 7: iteration 76430/ 115203 | consumed samples: 19566080 | consumed tokens: 40071331840 | elapsed time per iteration (s): 0.42 | learning rate: 6.663E-05 | global batch size: 256 | lm loss: 2.263806E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.362 | TFLOPs: 31.81 | 7: iteration 76440/ 115203 | consumed samples: 19568640 | consumed tokens: 40076574720 | elapsed time per iteration (s): 0.43 | learning rate: 6.661E-05 | global batch size: 256 | lm loss: 2.231919E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.426 | TFLOPs: 31.19 | 7: iteration 76450/ 115203 | consumed samples: 19571200 | consumed tokens: 40081817600 | elapsed time per iteration (s): 0.43 | learning rate: 6.659E-05 | global batch 
size: 256 | lm loss: 2.251884E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.998 | TFLOPs: 31.32 | 7: iteration 76460/ 115203 | consumed samples: 19573760 | consumed tokens: 40087060480 | elapsed time per iteration (s): 0.43 | learning rate: 6.657E-05 | global batch size: 256 | lm loss: 2.248046E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.168 | TFLOPs: 31.18 | 7: iteration 76470/ 115203 | consumed samples: 19576320 | consumed tokens: 40092303360 | elapsed time per iteration (s): 0.42 | learning rate: 6.655E-05 | global batch size: 256 | lm loss: 2.268577E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.085 | TFLOPs: 31.80 | 7: iteration 76480/ 115203 | consumed samples: 19578880 | consumed tokens: 40097546240 | elapsed time per iteration (s): 0.44 | learning rate: 6.653E-05 | global batch size: 256 | lm loss: 2.244559E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.732 | TFLOPs: 30.52 | 7: iteration 76490/ 115203 | consumed samples: 19581440 | consumed tokens: 40102789120 | elapsed time per iteration (s): 0.42 | learning rate: 6.650E-05 | global batch size: 256 | lm loss: 2.234230E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.585 | TFLOPs: 31.72 | 7: iteration 76500/ 115203 | consumed samples: 19584000 | consumed tokens: 40108032000 | elapsed time per iteration (s): 0.43 | learning rate: 6.648E-05 | global batch size: 256 | lm loss: 2.214296E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.477 | TFLOPs: 31.24 | 7: iteration 76510/ 115203 | consumed samples: 19586560 | consumed 
tokens: 40113274880 | elapsed time per iteration (s): 0.43 | learning rate: 6.646E-05 | global batch size: 256 | lm loss: 2.242179E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.086 | TFLOPs: 31.59 | 7: iteration 76520/ 115203 | consumed samples: 19589120 | consumed tokens: 40118517760 | elapsed time per iteration (s): 0.43 | learning rate: 6.644E-05 | global batch size: 256 | lm loss: 2.258945E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.934 | TFLOPs: 31.11 | 7: iteration 76530/ 115203 | consumed samples: 19591680 | consumed tokens: 40123760640 | elapsed time per iteration (s): 0.43 | learning rate: 6.642E-05 | global batch size: 256 | lm loss: 2.262852E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.613 | TFLOPs: 31.04 | 7: iteration 76540/ 115203 | consumed samples: 19594240 | consumed tokens: 40129003520 | elapsed time per iteration (s): 0.43 | learning rate: 6.640E-05 | global batch size: 256 | lm loss: 2.228829E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.862 | TFLOPs: 31.37 | 7: iteration 76550/ 115203 | consumed samples: 19596800 | consumed tokens: 40134246400 | elapsed time per iteration (s): 0.44 | learning rate: 6.637E-05 | global batch size: 256 | lm loss: 2.270471E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.271 | TFLOPs: 30.71 | 7: iteration 76560/ 115203 | consumed samples: 19599360 | consumed tokens: 40139489280 | elapsed time per iteration (s): 0.44 | learning rate: 6.635E-05 | global batch size: 256 | lm loss: 2.265084E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 575.345 | TFLOPs: 30.19 | 7: iteration 76570/ 115203 | consumed samples: 19601920 | consumed tokens: 40144732160 | elapsed time per iteration (s): 0.43 | learning rate: 6.633E-05 | global batch size: 256 | lm loss: 2.278557E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.556 | TFLOPs: 31.09 | 7: iteration 76580/ 115203 | consumed samples: 19604480 | consumed tokens: 40149975040 | elapsed time per iteration (s): 0.43 | learning rate: 6.631E-05 | global batch size: 256 | lm loss: 2.250428E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.398 | TFLOPs: 31.34 | 7: iteration 76590/ 115203 | consumed samples: 19607040 | consumed tokens: 40155217920 | elapsed time per iteration (s): 0.43 | learning rate: 6.629E-05 | global batch size: 256 | lm loss: 2.250267E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.736 | TFLOPs: 31.05 | 7: iteration 76600/ 115203 | consumed samples: 19609600 | consumed tokens: 40160460800 | elapsed time per iteration (s): 0.43 | learning rate: 6.627E-05 | global batch size: 256 | lm loss: 2.223116E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.628 | TFLOPs: 31.51 | 7: iteration 76610/ 115203 | consumed samples: 19612160 | consumed tokens: 40165703680 | elapsed time per iteration (s): 0.44 | learning rate: 6.624E-05 | global batch size: 256 | lm loss: 2.244026E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.077 | TFLOPs: 30.23 | 7: iteration 76620/ 115203 | consumed samples: 19614720 | consumed tokens: 40170946560 | elapsed time per iteration (s): 0.43 | learning rate: 6.622E-05 | global batch size: 256 | lm loss: 2.249491E+00 | grad norm: 
0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.158 | TFLOPs: 30.91 | 7: iteration 76630/ 115203 | consumed samples: 19617280 | consumed tokens: 40176189440 | elapsed time per iteration (s): 0.43 | learning rate: 6.620E-05 | global batch size: 256 | lm loss: 2.270824E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.512 | TFLOPs: 31.25 | 7: iteration 76640/ 115203 | consumed samples: 19619840 | consumed tokens: 40181432320 | elapsed time per iteration (s): 0.42 | learning rate: 6.618E-05 | global batch size: 256 | lm loss: 2.279741E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.600 | TFLOPs: 31.88 | 7: iteration 76650/ 115203 | consumed samples: 19622400 | consumed tokens: 40186675200 | elapsed time per iteration (s): 0.43 | learning rate: 6.616E-05 | global batch size: 256 | lm loss: 2.246791E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.817 | TFLOPs: 31.42 | 7: iteration 76660/ 115203 | consumed samples: 19624960 | consumed tokens: 40191918080 | elapsed time per iteration (s): 0.45 | learning rate: 6.614E-05 | global batch size: 256 | lm loss: 2.247661E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.129 | TFLOPs: 30.18 | 7: iteration 76670/ 115203 | consumed samples: 19627520 | consumed tokens: 40197160960 | elapsed time per iteration (s): 0.42 | learning rate: 6.611E-05 | global batch size: 256 | lm loss: 2.236839E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.048 | TFLOPs: 31.80 | 7: iteration 76680/ 115203 | consumed samples: 19630080 | consumed tokens: 40202403840 | elapsed time per 
iteration (s): 0.43 | learning rate: 6.609E-05 | global batch size: 256 | lm loss: 2.238636E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.088 | TFLOPs: 31.12 | 7: iteration 76690/ 115203 | consumed samples: 19632640 | consumed tokens: 40207646720 | elapsed time per iteration (s): 0.44 | learning rate: 6.607E-05 | global batch size: 256 | lm loss: 2.251203E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.108 | TFLOPs: 30.59 | 7: iteration 76700/ 115203 | consumed samples: 19635200 | consumed tokens: 40212889600 | elapsed time per iteration (s): 0.42 | learning rate: 6.605E-05 | global batch size: 256 | lm loss: 2.234743E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.407 | TFLOPs: 31.61 | 7: iteration 76710/ 115203 | consumed samples: 19637760 | consumed tokens: 40218132480 | elapsed time per iteration (s): 0.43 | learning rate: 6.603E-05 | global batch size: 256 | lm loss: 2.222957E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.464 | TFLOPs: 31.30 | 7: iteration 76720/ 115203 | consumed samples: 19640320 | consumed tokens: 40223375360 | elapsed time per iteration (s): 0.43 | learning rate: 6.601E-05 | global batch size: 256 | lm loss: 2.237167E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.228 | TFLOPs: 31.55 | 7: iteration 76730/ 115203 | consumed samples: 19642880 | consumed tokens: 40228618240 | elapsed time per iteration (s): 0.43 | learning rate: 6.598E-05 | global batch size: 256 | lm loss: 2.255987E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.633 | TFLOPs: 30.94 | 7: 
iteration 76740/ 115203 | consumed samples: 19645440 | consumed tokens: 40233861120 | elapsed time per iteration (s): 0.43 | learning rate: 6.596E-05 | global batch size: 256 | lm loss: 2.264207E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.993 | TFLOPs: 31.17 | 7: iteration 76750/ 115203 | consumed samples: 19648000 | consumed tokens: 40239104000 | elapsed time per iteration (s): 0.43 | learning rate: 6.594E-05 | global batch size: 256 | lm loss: 2.265361E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.236 | TFLOPs: 31.13 | 7: iteration 76760/ 115203 | consumed samples: 19650560 | consumed tokens: 40244346880 | elapsed time per iteration (s): 0.44 | learning rate: 6.592E-05 | global batch size: 256 | lm loss: 2.252379E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.667 | TFLOPs: 30.73 | 7: iteration 76770/ 115203 | consumed samples: 19653120 | consumed tokens: 40249589760 | elapsed time per iteration (s): 0.42 | learning rate: 6.590E-05 | global batch size: 256 | lm loss: 2.250872E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.260 | TFLOPs: 31.91 | 7: iteration 76780/ 115203 | consumed samples: 19655680 | consumed tokens: 40254832640 | elapsed time per iteration (s): 0.43 | learning rate: 6.588E-05 | global batch size: 256 | lm loss: 2.230480E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.900 | TFLOPs: 31.42 | 7: iteration 76790/ 115203 | consumed samples: 19658240 | consumed tokens: 40260075520 | elapsed time per iteration (s): 0.42 | learning rate: 6.585E-05 | global batch size: 256 | lm loss: 2.239667E+00 | grad norm: 0.238 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.198 | TFLOPs: 31.75 | 7: iteration 76800/ 115203 | consumed samples: 19660800 | consumed tokens: 40265318400 | elapsed time per iteration (s): 0.43 | learning rate: 6.583E-05 | global batch size: 256 | lm loss: 2.235285E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.876 | TFLOPs: 31.32 | 7: iteration 76810/ 115203 | consumed samples: 19663360 | consumed tokens: 40270561280 | elapsed time per iteration (s): 0.44 | learning rate: 6.581E-05 | global batch size: 256 | lm loss: 2.243501E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.984 | TFLOPs: 30.43 | 7: iteration 76820/ 115203 | consumed samples: 19665920 | consumed tokens: 40275804160 | elapsed time per iteration (s): 0.44 | learning rate: 6.579E-05 | global batch size: 256 | lm loss: 2.244756E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.743 | TFLOPs: 30.79 | 7: iteration 76830/ 115203 | consumed samples: 19668480 | consumed tokens: 40281047040 | elapsed time per iteration (s): 0.44 | learning rate: 6.577E-05 | global batch size: 256 | lm loss: 2.282874E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.314 | TFLOPs: 30.55 | 7: iteration 76840/ 115203 | consumed samples: 19671040 | consumed tokens: 40286289920 | elapsed time per iteration (s): 0.43 | learning rate: 6.575E-05 | global batch size: 256 | lm loss: 2.240810E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.956 | TFLOPs: 31.53 | 7: iteration 76850/ 115203 | consumed samples: 19673600 | consumed tokens: 40291532800 | elapsed time per iteration (s): 0.43 | learning rate: 
6.572E-05 | global batch size: 256 | lm loss: 2.267384E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.237 | TFLOPs: 31.60 | 7: iteration 76860/ 115203 | consumed samples: 19676160 | consumed tokens: 40296775680 | elapsed time per iteration (s): 0.43 | learning rate: 6.570E-05 | global batch size: 256 | lm loss: 2.247691E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.646 | TFLOPs: 31.15 | 7: iteration 76870/ 115203 | consumed samples: 19678720 | consumed tokens: 40302018560 | elapsed time per iteration (s): 0.42 | learning rate: 6.568E-05 | global batch size: 256 | lm loss: 2.276114E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.569 | TFLOPs: 31.67 | 7: iteration 76880/ 115203 | consumed samples: 19681280 | consumed tokens: 40307261440 | elapsed time per iteration (s): 0.43 | learning rate: 6.566E-05 | global batch size: 256 | lm loss: 2.256005E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.286 | TFLOPs: 31.08 | 7: iteration 76890/ 115203 | consumed samples: 19683840 | consumed tokens: 40312504320 | elapsed time per iteration (s): 0.43 | learning rate: 6.564E-05 | global batch size: 256 | lm loss: 2.259713E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.796 | TFLOPs: 31.58 | 7: iteration 76900/ 115203 | consumed samples: 19686400 | consumed tokens: 40317747200 | elapsed time per iteration (s): 0.43 | learning rate: 6.562E-05 | global batch size: 256 | lm loss: 2.220633E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.147 | TFLOPs: 31.12 | 7: iteration 76910/ 115203 | consumed 
samples: 19688960 | consumed tokens: 40322990080 | elapsed time per iteration (s): 0.42 | learning rate: 6.560E-05 | global batch size: 256 | lm loss: 2.264840E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.357 | TFLOPs: 32.02 | 7: iteration 76920/ 115203 | consumed samples: 19691520 | consumed tokens: 40328232960 | elapsed time per iteration (s): 0.43 | learning rate: 6.557E-05 | global batch size: 256 | lm loss: 2.275217E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.932 | TFLOPs: 30.90 | 7: iteration 76930/ 115203 | consumed samples: 19694080 | consumed tokens: 40333475840 | elapsed time per iteration (s): 0.43 | learning rate: 6.555E-05 | global batch size: 256 | lm loss: 2.288356E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.026 | TFLOPs: 31.59 | 7: iteration 76940/ 115203 | consumed samples: 19696640 | consumed tokens: 40338718720 | elapsed time per iteration (s): 0.43 | learning rate: 6.553E-05 | global batch size: 256 | lm loss: 2.245611E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.710 | TFLOPs: 31.36 | 7: iteration 76950/ 115203 | consumed samples: 19699200 | consumed tokens: 40343961600 | elapsed time per iteration (s): 0.43 | learning rate: 6.551E-05 | global batch size: 256 | lm loss: 2.256095E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.099 | TFLOPs: 31.22 | 7: iteration 76960/ 115203 | consumed samples: 19701760 | consumed tokens: 40349204480 | elapsed time per iteration (s): 0.45 | learning rate: 6.549E-05 | global batch size: 256 | lm loss: 2.247528E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 569.280 | TFLOPs: 29.87 | 7: iteration 76970/ 115203 | consumed samples: 19704320 | consumed tokens: 40354447360 | elapsed time per iteration (s): 0.43 | learning rate: 6.547E-05 | global batch size: 256 | lm loss: 2.244191E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.851 | TFLOPs: 31.32 | 7: iteration 76980/ 115203 | consumed samples: 19706880 | consumed tokens: 40359690240 | elapsed time per iteration (s): 0.42 | learning rate: 6.544E-05 | global batch size: 256 | lm loss: 2.253997E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.075 | TFLOPs: 31.64 | 7: iteration 76990/ 115203 | consumed samples: 19709440 | consumed tokens: 40364933120 | elapsed time per iteration (s): 0.43 | learning rate: 6.542E-05 | global batch size: 256 | lm loss: 2.241523E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.076 | TFLOPs: 31.54 | 7: iteration 77000/ 115203 | consumed samples: 19712000 | consumed tokens: 40370176000 | elapsed time per iteration (s): 0.43 | learning rate: 6.540E-05 | global batch size: 256 | lm loss: 2.274653E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.731 | TFLOPs: 31.31 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 77000 | lm loss value: 2.153460E+00 | lm loss PPL: 8.614610E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 77000 to checkpoints_221m 0: [2022-11-28 22:13:35,384] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step77000 is begin to save! 
0: [2022-11-28 22:13:35,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_01-model_00-model_states.pt... 0: [2022-11-28 22:13:35,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_01-model_00-model_states.pt. 0: [2022-11-28 22:13:35,533] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_03-model_00-model_states.pt... 0: [2022-11-28 22:13:35,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_03-model_00-model_states.pt. 0: [2022-11-28 22:13:35,554] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_04-model_00-model_states.pt... 0: [2022-11-28 22:13:35,577] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_04-model_00-model_states.pt. 0: [2022-11-28 22:13:35,577] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_05-model_00-model_states.pt... 0: [2022-11-28 22:13:35,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_05-model_00-model_states.pt. 0: [2022-11-28 22:13:35,602] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_06-model_00-model_states.pt... 0: [2022-11-28 22:13:35,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_06-model_00-model_states.pt. 0: [2022-11-28 22:13:35,624] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_07-model_00-model_states.pt... 0: [2022-11-28 22:13:35,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 22:13:35,647] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_08-model_00-model_states.pt... 0: [2022-11-28 22:13:35,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_08-model_00-model_states.pt. 0: [2022-11-28 22:13:35,670] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_09-model_00-model_states.pt... 0: [2022-11-28 22:13:35,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_09-model_00-model_states.pt. 0: [2022-11-28 22:13:35,694] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_10-model_00-model_states.pt... 0: [2022-11-28 22:13:35,717] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_10-model_00-model_states.pt. 0: [2022-11-28 22:13:35,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_11-model_00-model_states.pt... 0: [2022-11-28 22:13:35,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_11-model_00-model_states.pt. 0: [2022-11-28 22:13:35,741] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_12-model_00-model_states.pt... 0: [2022-11-28 22:13:35,763] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_12-model_00-model_states.pt. 0: [2022-11-28 22:13:35,764] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_13-model_00-model_states.pt... 0: [2022-11-28 22:13:35,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 22:13:35,786] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_14-model_00-model_states.pt... 0: [2022-11-28 22:13:35,809] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_14-model_00-model_states.pt. 0: [2022-11-28 22:13:35,809] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_15-model_00-model_states.pt... 0: [2022-11-28 22:13:35,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_15-model_00-model_states.pt. 0: [2022-11-28 22:13:35,832] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_16-model_00-model_states.pt... 0: [2022-11-28 22:13:35,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_16-model_00-model_states.pt. 0: [2022-11-28 22:13:35,855] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_17-model_00-model_states.pt... 0: [2022-11-28 22:13:35,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_17-model_00-model_states.pt. 0: [2022-11-28 22:13:35,879] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_18-model_00-model_states.pt... 0: [2022-11-28 22:13:35,903] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_18-model_00-model_states.pt. 0: [2022-11-28 22:13:35,903] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_19-model_00-model_states.pt... 0: [2022-11-28 22:13:35,928] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 22:13:35,928] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_20-model_00-model_states.pt... 0: [2022-11-28 22:13:35,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_20-model_00-model_states.pt. 0: [2022-11-28 22:13:35,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/layer_22-model_00-model_states.pt... 0: [2022-11-28 22:13:35,955] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/layer_22-model_00-model_states.pt. 0: [2022-11-28 22:13:35,956] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step77000/mp_rank_00_model_states.pt 0: [2022-11-28 22:13:35,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/mp_rank_00_model_states.pt... 0: [2022-11-28 22:13:35,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/mp_rank_00_model_states.pt. 0: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
7: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
4: [2022-11-28 22:13:35,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:13:35,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:13:35,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:13:35,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:13:35,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:13:35,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
3: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:13:35,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:13:35,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
6: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:13:35,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step77000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:13:36,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:13:36,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 22:13:36,027] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2022-11-28 22:13:36,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:13:36,027] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
5: [2022-11-28 22:13:36,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 22:13:36,027] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 22:13:36,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2022-11-28 22:13:36,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 4: [2022-11-28 22:13:36,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:13:36,028] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 22:13:36,028] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2022-11-28 22:13:36,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:13:36,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:13:36,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 22:13:36,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 22:13:36,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 22:13:36,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2022-11-28 22:13:36,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 4: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:13:36,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 22:13:36,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 4: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 22:13:36,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2022-11-28 22:13:36,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:13:36,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 22:13:36,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 22:13:36,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2022-11-28 22:13:36,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2022-11-28 22:13:36,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:13:36,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:13:36,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 22:13:36,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
3: [2022-11-28 22:13:36,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:13:36,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 22:13:36,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 4: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2022-11-28 22:13:36,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 22:13:36,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 22:13:36,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 4: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 4: [2022-11-28 22:13:36,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2022-11-28 22:13:36,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:13:36,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 22:13:36,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2022-11-28 22:13:36,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:13:36,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 22:13:36,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
4: [2022-11-28 22:13:36,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:13:36,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:13:36,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 22:13:36,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 4: [2022-11-28 22:13:36,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:13:36,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 22:13:36,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:13:36,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
2: [2022-11-28 22:13:36,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:13:36,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:13:36,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 22:13:36,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:13:36,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2022-11-28 22:13:36,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
2: [2022-11-28 22:13:36,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 7: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:13:36,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2022-11-28 22:13:36,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 22:13:36,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2022-11-28 22:13:36,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:13:36,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 22:13:36,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2022-11-28 22:13:36,040] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:13:36,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 22:13:36,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2022-11-28 22:13:36,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 22:13:36,041] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 22:13:36,041] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2022-11-28 22:13:36,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:13:36,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 22:13:36,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2022-11-28 22:13:36,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:13:36,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 22:13:36,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2022-11-28 22:13:36,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:13:36,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 22:13:36,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 7: [2022-11-28 22:13:36,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2022-11-28 22:13:36,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 22:13:36,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2022-11-28 22:13:36,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:13:36,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 22:13:36,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2022-11-28 22:13:36,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:13:36,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 22:13:36,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2022-11-28 22:13:36,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:13:36,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 22:13:36,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 2: [2022-11-28 22:13:36,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-28 22:13:36,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 22:13:36,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:13:36,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:13:36,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 22:13:36,037] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 22:13:36,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2022-11-28 22:13:36,037] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2022-11-28 22:13:36,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2022-11-28 22:13:36,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2022-11-28 22:13:36,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:13:36,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:13:36,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 22:13:36,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 22:13:36,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 22:13:36,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 5: [2022-11-28 22:13:36,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2022-11-28 22:13:36,045] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:13:36,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2022-11-28 22:13:36,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 22:13:36,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2022-11-28 22:13:36,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:13:36,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 22:13:36,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2022-11-28 22:13:36,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:13:36,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:13:36,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:13:36,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:13:36,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 22:13:36,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 22:13:36,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 3: [2022-11-28 22:13:36,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2022-11-28 22:13:36,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 6: [2022-11-28 22:13:36,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
6: [2022-11-28 22:13:36,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 3: [2022-11-28 22:13:36,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2022-11-28 22:13:36,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:13:36,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 22:13:36,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2022-11-28 22:13:36,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:13:36,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 22:13:36,050] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2022-11-28 22:13:36,054] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:13:36,054] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 22:13:36,054] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: [2022-11-28 22:13:36,091] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 22:13:36,091] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
1: [2022-11-28 22:13:36,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:13:36,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:13:36,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:13:36,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:13:36,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:13:36,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:13:36,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:13:36,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 22:13:36,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 22:13:36,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 22:13:36,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 22:13:36,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 22:13:36,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 22:13:36,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 1: [2022-11-28 22:13:36,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 22:13:36,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 22:13:36,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step77000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 22:13:36,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 1: [2022-11-28 22:13:36,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 1: [2022-11-28 22:13:36,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 1: [2022-11-28 22:13:36,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 
1: [2022-11-28 22:13:36,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 1: [2022-11-28 22:13:36,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 1: [2022-11-28 22:13:36,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step77000 is ready now! 0: successfully saved checkpoint at iteration 77000 to checkpoints_221m 7: time (ms) | save-checkpoint: 740.40 7: iteration 77010/ 115203 | consumed samples: 19714560 | consumed tokens: 40375418880 | elapsed time per iteration (s): 0.51 | learning rate: 6.538E-05 | global batch size: 256 | lm loss: 2.212881E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 499.703 | TFLOPs: 26.22 | 7: iteration 77020/ 115203 | consumed samples: 19717120 | consumed tokens: 40380661760 | elapsed time per iteration (s): 0.42 | learning rate: 6.536E-05 | global batch size: 256 | lm loss: 2.269616E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.057 | TFLOPs: 31.85 | 7: iteration 77030/ 115203 | consumed samples: 19719680 | consumed tokens: 40385904640 | elapsed time per iteration (s): 0.42 | learning rate: 6.534E-05 | global batch size: 256 | lm loss: 2.239911E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.500 | TFLOPs: 31.72 | 7: iteration 77040/ 115203 | consumed samples: 19722240 | consumed tokens: 40391147520 | elapsed time per iteration (s): 0.43 | learning rate: 6.532E-05 | global batch size: 256 | lm loss: 2.262107E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.363 | TFLOPs: 31.03 | 7: iteration 77050/ 115203 | consumed samples: 19724800 | consumed tokens: 40396390400 | elapsed time per 
iteration (s): 0.42 | learning rate: 6.529E-05 | global batch size: 256 | lm loss: 2.247854E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.324 | TFLOPs: 31.66 | 7: iteration 77060/ 115203 | consumed samples: 19727360 | consumed tokens: 40401633280 | elapsed time per iteration (s): 0.43 | learning rate: 6.527E-05 | global batch size: 256 | lm loss: 2.255847E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.444 | TFLOPs: 31.08 | 7: iteration 77070/ 115203 | consumed samples: 19729920 | consumed tokens: 40406876160 | elapsed time per iteration (s): 0.42 | learning rate: 6.525E-05 | global batch size: 256 | lm loss: 2.234279E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.636 | TFLOPs: 31.62 | 7: iteration 77080/ 115203 | consumed samples: 19732480 | consumed tokens: 40412119040 | elapsed time per iteration (s): 0.43 | learning rate: 6.523E-05 | global batch size: 256 | lm loss: 2.260961E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.318 | TFLOPs: 31.13 | 7: iteration 77090/ 115203 | consumed samples: 19735040 | consumed tokens: 40417361920 | elapsed time per iteration (s): 0.43 | learning rate: 6.521E-05 | global batch size: 256 | lm loss: 2.247066E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.273 | TFLOPs: 31.39 | 7: iteration 77100/ 115203 | consumed samples: 19737600 | consumed tokens: 40422604800 | elapsed time per iteration (s): 0.42 | learning rate: 6.519E-05 | global batch size: 256 | lm loss: 2.258124E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.768 | TFLOPs: 31.94 | 7: 
iteration 77110/ 115203 | consumed samples: 19740160 | consumed tokens: 40427847680 | elapsed time per iteration (s): 0.42 | learning rate: 6.516E-05 | global batch size: 256 | lm loss: 2.226078E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.731 | TFLOPs: 31.78 | 7: iteration 77120/ 115203 | consumed samples: 19742720 | consumed tokens: 40433090560 | elapsed time per iteration (s): 0.43 | learning rate: 6.514E-05 | global batch size: 256 | lm loss: 2.292183E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.734 | TFLOPs: 31.41 | 7: iteration 77130/ 115203 | consumed samples: 19745280 | consumed tokens: 40438333440 | elapsed time per iteration (s): 0.43 | learning rate: 6.512E-05 | global batch size: 256 | lm loss: 2.240007E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.470 | TFLOPs: 31.09 | 7: iteration 77140/ 115203 | consumed samples: 19747840 | consumed tokens: 40443576320 | elapsed time per iteration (s): 0.43 | learning rate: 6.510E-05 | global batch size: 256 | lm loss: 2.256547E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.056 | TFLOPs: 31.48 | 7: iteration 77150/ 115203 | consumed samples: 19750400 | consumed tokens: 40448819200 | elapsed time per iteration (s): 0.43 | learning rate: 6.508E-05 | global batch size: 256 | lm loss: 2.238112E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.034 | TFLOPs: 31.33 | 7: iteration 77160/ 115203 | consumed samples: 19752960 | consumed tokens: 40454062080 | elapsed time per iteration (s): 0.45 | learning rate: 6.506E-05 | global batch size: 256 | lm loss: 2.251820E+00 | grad norm: 0.242 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.385 | TFLOPs: 30.14 | 7: iteration 77170/ 115203 | consumed samples: 19755520 | consumed tokens: 40459304960 | elapsed time per iteration (s): 0.43 | learning rate: 6.504E-05 | global batch size: 256 | lm loss: 2.254110E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.138 | TFLOPs: 31.17 | 7: iteration 77180/ 115203 | consumed samples: 19758080 | consumed tokens: 40464547840 | elapsed time per iteration (s): 0.45 | learning rate: 6.501E-05 | global batch size: 256 | lm loss: 2.240807E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.468 | TFLOPs: 29.83 | 7: iteration 77190/ 115203 | consumed samples: 19760640 | consumed tokens: 40469790720 | elapsed time per iteration (s): 0.43 | learning rate: 6.499E-05 | global batch size: 256 | lm loss: 2.229615E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.552 | TFLOPs: 31.30 | 7: iteration 77200/ 115203 | consumed samples: 19763200 | consumed tokens: 40475033600 | elapsed time per iteration (s): 0.43 | learning rate: 6.497E-05 | global batch size: 256 | lm loss: 2.230844E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.336 | TFLOPs: 31.18 | 7: iteration 77210/ 115203 | consumed samples: 19765760 | consumed tokens: 40480276480 | elapsed time per iteration (s): 0.44 | learning rate: 6.495E-05 | global batch size: 256 | lm loss: 2.253702E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.289 | TFLOPs: 30.34 | 7: iteration 77220/ 115203 | consumed samples: 19768320 | consumed tokens: 40485519360 | elapsed time per iteration (s): 0.43 | learning rate: 
6.493E-05 | global batch size: 256 | lm loss: 2.242799E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.110 | TFLOPs: 31.28 | 7: iteration 77230/ 115203 | consumed samples: 19770880 | consumed tokens: 40490762240 | elapsed time per iteration (s): 0.43 | learning rate: 6.491E-05 | global batch size: 256 | lm loss: 2.262303E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.927 | TFLOPs: 31.53 | 7: iteration 77240/ 115203 | consumed samples: 19773440 | consumed tokens: 40496005120 | elapsed time per iteration (s): 0.44 | learning rate: 6.489E-05 | global batch size: 256 | lm loss: 2.259183E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.085 | TFLOPs: 30.75 | 7: iteration 77250/ 115203 | consumed samples: 19776000 | consumed tokens: 40501248000 | elapsed time per iteration (s): 0.43 | learning rate: 6.486E-05 | global batch size: 256 | lm loss: 2.272806E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.293 | TFLOPs: 31.60 | 7: iteration 77260/ 115203 | consumed samples: 19778560 | consumed tokens: 40506490880 | elapsed time per iteration (s): 0.42 | learning rate: 6.484E-05 | global batch size: 256 | lm loss: 2.228057E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.077 | TFLOPs: 31.75 | 7: iteration 77270/ 115203 | consumed samples: 19781120 | consumed tokens: 40511733760 | elapsed time per iteration (s): 0.43 | learning rate: 6.482E-05 | global batch size: 256 | lm loss: 2.263478E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.410 | TFLOPs: 31.08 | 7: iteration 77280/ 115203 | consumed 
samples: 19783680 | consumed tokens: 40516976640 | elapsed time per iteration (s): 0.44 | learning rate: 6.480E-05 | global batch size: 256 | lm loss: 2.234585E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.107 | TFLOPs: 30.70 | 7: iteration 77290/ 115203 | consumed samples: 19786240 | consumed tokens: 40522219520 | elapsed time per iteration (s): 0.43 | learning rate: 6.478E-05 | global batch size: 256 | lm loss: 2.286637E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.843 | TFLOPs: 31.53 | 7: iteration 77300/ 115203 | consumed samples: 19788800 | consumed tokens: 40527462400 | elapsed time per iteration (s): 0.43 | learning rate: 6.476E-05 | global batch size: 256 | lm loss: 2.260068E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.884 | TFLOPs: 31.27 | 7: iteration 77310/ 115203 | consumed samples: 19791360 | consumed tokens: 40532705280 | elapsed time per iteration (s): 0.43 | learning rate: 6.474E-05 | global batch size: 256 | lm loss: 2.209166E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.016 | TFLOPs: 31.11 | 7: iteration 77320/ 115203 | consumed samples: 19793920 | consumed tokens: 40537948160 | elapsed time per iteration (s): 0.42 | learning rate: 6.471E-05 | global batch size: 256 | lm loss: 2.272663E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.539 | TFLOPs: 31.88 | 7: iteration 77330/ 115203 | consumed samples: 19796480 | consumed tokens: 40543191040 | elapsed time per iteration (s): 0.43 | learning rate: 6.469E-05 | global batch size: 256 | lm loss: 2.270054E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 598.792 | TFLOPs: 31.42 | 7: iteration 77340/ 115203 | consumed samples: 19799040 | consumed tokens: 40548433920 | elapsed time per iteration (s): 0.43 | learning rate: 6.467E-05 | global batch size: 256 | lm loss: 2.264844E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.213 | TFLOPs: 30.92 | 7: iteration 77350/ 115203 | consumed samples: 19801600 | consumed tokens: 40553676800 | elapsed time per iteration (s): 0.43 | learning rate: 6.465E-05 | global batch size: 256 | lm loss: 2.241072E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.238 | TFLOPs: 31.39 | 7: iteration 77360/ 115203 | consumed samples: 19804160 | consumed tokens: 40558919680 | elapsed time per iteration (s): 0.44 | learning rate: 6.463E-05 | global batch size: 256 | lm loss: 2.253487E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.113 | TFLOPs: 30.86 | 7: iteration 77370/ 115203 | consumed samples: 19806720 | consumed tokens: 40564162560 | elapsed time per iteration (s): 0.43 | learning rate: 6.461E-05 | global batch size: 256 | lm loss: 2.273616E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.320 | TFLOPs: 31.24 | 7: iteration 77380/ 115203 | consumed samples: 19809280 | consumed tokens: 40569405440 | elapsed time per iteration (s): 0.43 | learning rate: 6.459E-05 | global batch size: 256 | lm loss: 2.248928E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.912 | TFLOPs: 31.53 | 7: iteration 77390/ 115203 | consumed samples: 19811840 | consumed tokens: 40574648320 | elapsed time per iteration (s): 0.44 | learning rate: 6.456E-05 | global batch size: 256 | lm 
loss: 2.236419E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.344 | TFLOPs: 30.50 | 7: iteration 77400/ 115203 | consumed samples: 19814400 | consumed tokens: 40579891200 | elapsed time per iteration (s): 0.44 | learning rate: 6.454E-05 | global batch size: 256 | lm loss: 2.254063E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.247 | TFLOPs: 30.81 | 7: iteration 77410/ 115203 | consumed samples: 19816960 | consumed tokens: 40585134080 | elapsed time per iteration (s): 0.43 | learning rate: 6.452E-05 | global batch size: 256 | lm loss: 2.217812E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.478 | TFLOPs: 31.14 | 7: iteration 77420/ 115203 | consumed samples: 19819520 | consumed tokens: 40590376960 | elapsed time per iteration (s): 0.43 | learning rate: 6.450E-05 | global batch size: 256 | lm loss: 2.257054E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.263 | TFLOPs: 31.44 | 7: iteration 77430/ 115203 | consumed samples: 19822080 | consumed tokens: 40595619840 | elapsed time per iteration (s): 0.43 | learning rate: 6.448E-05 | global batch size: 256 | lm loss: 2.250466E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.750 | TFLOPs: 31.31 | 7: iteration 77440/ 115203 | consumed samples: 19824640 | consumed tokens: 40600862720 | elapsed time per iteration (s): 0.42 | learning rate: 6.446E-05 | global batch size: 256 | lm loss: 2.255099E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.900 | TFLOPs: 31.69 | 7: iteration 77450/ 115203 | consumed samples: 19827200 | consumed tokens: 
40606105600 | elapsed time per iteration (s): 0.43 | learning rate: 6.444E-05 | global batch size: 256 | lm loss: 2.204297E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.245 | TFLOPs: 31.23 | 7: iteration 77460/ 115203 | consumed samples: 19829760 | consumed tokens: 40611348480 | elapsed time per iteration (s): 0.44 | learning rate: 6.441E-05 | global batch size: 256 | lm loss: 2.248053E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.479 | TFLOPs: 30.61 | 7: iteration 77470/ 115203 | consumed samples: 19832320 | consumed tokens: 40616591360 | elapsed time per iteration (s): 0.44 | learning rate: 6.439E-05 | global batch size: 256 | lm loss: 2.250928E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.041 | TFLOPs: 30.54 | 7: iteration 77480/ 115203 | consumed samples: 19834880 | consumed tokens: 40621834240 | elapsed time per iteration (s): 0.42 | learning rate: 6.437E-05 | global batch size: 256 | lm loss: 2.254274E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.713 | TFLOPs: 32.04 | 7: iteration 77490/ 115203 | consumed samples: 19837440 | consumed tokens: 40627077120 | elapsed time per iteration (s): 0.42 | learning rate: 6.435E-05 | global batch size: 256 | lm loss: 2.269561E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.744 | TFLOPs: 31.63 | 7: iteration 77500/ 115203 | consumed samples: 19840000 | consumed tokens: 40632320000 | elapsed time per iteration (s): 0.42 | learning rate: 6.433E-05 | global batch size: 256 | lm loss: 2.225006E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
604.559 | TFLOPs: 31.72 | 7: iteration 77510/ 115203 | consumed samples: 19842560 | consumed tokens: 40637562880 | elapsed time per iteration (s): 0.42 | learning rate: 6.431E-05 | global batch size: 256 | lm loss: 2.256485E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.699 | TFLOPs: 32.04 | 7: iteration 77520/ 115203 | consumed samples: 19845120 | consumed tokens: 40642805760 | elapsed time per iteration (s): 0.43 | learning rate: 6.429E-05 | global batch size: 256 | lm loss: 2.264825E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.778 | TFLOPs: 31.47 | 7: iteration 77530/ 115203 | consumed samples: 19847680 | consumed tokens: 40648048640 | elapsed time per iteration (s): 0.43 | learning rate: 6.426E-05 | global batch size: 256 | lm loss: 2.251983E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.016 | TFLOPs: 31.32 | 7: iteration 77540/ 115203 | consumed samples: 19850240 | consumed tokens: 40653291520 | elapsed time per iteration (s): 0.43 | learning rate: 6.424E-05 | global batch size: 256 | lm loss: 2.238052E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.184 | TFLOPs: 31.07 | 7: iteration 77550/ 115203 | consumed samples: 19852800 | consumed tokens: 40658534400 | elapsed time per iteration (s): 0.43 | learning rate: 6.422E-05 | global batch size: 256 | lm loss: 2.246309E+00 | grad norm: 0.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.397 | TFLOPs: 31.55 | 7: iteration 77560/ 115203 | consumed samples: 19855360 | consumed tokens: 40663777280 | elapsed time per iteration (s): 0.43 | learning rate: 6.420E-05 | global batch size: 256 | lm loss: 2.249740E+00 | grad norm: 0.231 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.702 | TFLOPs: 31.26 | 7: iteration 77570/ 115203 | consumed samples: 19857920 | consumed tokens: 40669020160 | elapsed time per iteration (s): 0.43 | learning rate: 6.418E-05 | global batch size: 256 | lm loss: 2.267195E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.857 | TFLOPs: 31.37 | 7: iteration 77580/ 115203 | consumed samples: 19860480 | consumed tokens: 40674263040 | elapsed time per iteration (s): 0.43 | learning rate: 6.416E-05 | global batch size: 256 | lm loss: 2.276888E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.209 | TFLOPs: 31.49 | 7: iteration 77590/ 115203 | consumed samples: 19863040 | consumed tokens: 40679505920 | elapsed time per iteration (s): 0.43 | learning rate: 6.414E-05 | global batch size: 256 | lm loss: 2.233553E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.498 | TFLOPs: 31.45 | 7: iteration 77600/ 115203 | consumed samples: 19865600 | consumed tokens: 40684748800 | elapsed time per iteration (s): 0.42 | learning rate: 6.412E-05 | global batch size: 256 | lm loss: 2.251287E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.189 | TFLOPs: 31.65 | 7: iteration 77610/ 115203 | consumed samples: 19868160 | consumed tokens: 40689991680 | elapsed time per iteration (s): 0.43 | learning rate: 6.409E-05 | global batch size: 256 | lm loss: 2.237986E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.090 | TFLOPs: 31.22 | 7: iteration 77620/ 115203 | consumed samples: 19870720 | consumed tokens: 40695234560 | elapsed time per iteration (s): 
0.42 | learning rate: 6.407E-05 | global batch size: 256 | lm loss: 2.273423E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.805 | TFLOPs: 31.94 | 7: iteration 77630/ 115203 | consumed samples: 19873280 | consumed tokens: 40700477440 | elapsed time per iteration (s): 0.42 | learning rate: 6.405E-05 | global batch size: 256 | lm loss: 2.273376E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.890 | TFLOPs: 31.74 | 7: iteration 77640/ 115203 | consumed samples: 19875840 | consumed tokens: 40705720320 | elapsed time per iteration (s): 0.43 | learning rate: 6.403E-05 | global batch size: 256 | lm loss: 2.233242E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.637 | TFLOPs: 31.25 | 7: iteration 77650/ 115203 | consumed samples: 19878400 | consumed tokens: 40710963200 | elapsed time per iteration (s): 0.42 | learning rate: 6.401E-05 | global batch size: 256 | lm loss: 2.266707E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.698 | TFLOPs: 31.62 | 7: iteration 77660/ 115203 | consumed samples: 19880960 | consumed tokens: 40716206080 | elapsed time per iteration (s): 0.42 | learning rate: 6.399E-05 | global batch size: 256 | lm loss: 2.248376E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.551 | TFLOPs: 31.93 | 7: iteration 77670/ 115203 | consumed samples: 19883520 | consumed tokens: 40721448960 | elapsed time per iteration (s): 0.42 | learning rate: 6.397E-05 | global batch size: 256 | lm loss: 2.247607E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.445 | TFLOPs: 31.98 | 7: iteration 77680/ 
115203 | consumed samples: 19886080 | consumed tokens: 40726691840 | elapsed time per iteration (s): 0.44 | learning rate: 6.394E-05 | global batch size: 256 | lm loss: 2.236218E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.017 | TFLOPs: 30.85 | 7: iteration 77690/ 115203 | consumed samples: 19888640 | consumed tokens: 40731934720 | elapsed time per iteration (s): 0.43 | learning rate: 6.392E-05 | global batch size: 256 | lm loss: 2.220199E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.363 | TFLOPs: 31.29 | 7: iteration 77700/ 115203 | consumed samples: 19891200 | consumed tokens: 40737177600 | elapsed time per iteration (s): 0.45 | learning rate: 6.390E-05 | global batch size: 256 | lm loss: 2.261379E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.317 | TFLOPs: 29.71 | 7: iteration 77710/ 115203 | consumed samples: 19893760 | consumed tokens: 40742420480 | elapsed time per iteration (s): 0.43 | learning rate: 6.388E-05 | global batch size: 256 | lm loss: 2.270764E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.480 | TFLOPs: 31.30 | 7: iteration 77720/ 115203 | consumed samples: 19896320 | consumed tokens: 40747663360 | elapsed time per iteration (s): 0.43 | learning rate: 6.386E-05 | global batch size: 256 | lm loss: 2.253362E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.817 | TFLOPs: 31.21 | 7: iteration 77730/ 115203 | consumed samples: 19898880 | consumed tokens: 40752906240 | elapsed time per iteration (s): 0.42 | learning rate: 6.384E-05 | global batch size: 256 | lm loss: 2.226087E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 607.882 | TFLOPs: 31.89 | 7: iteration 77740/ 115203 | consumed samples: 19901440 | consumed tokens: 40758149120 | elapsed time per iteration (s): 0.43 | learning rate: 6.382E-05 | global batch size: 256 | lm loss: 2.220208E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.636 | TFLOPs: 31.09 | 7: iteration 77750/ 115203 | consumed samples: 19904000 | consumed tokens: 40763392000 | elapsed time per iteration (s): 0.42 | learning rate: 6.380E-05 | global batch size: 256 | lm loss: 2.237627E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.776 | TFLOPs: 31.63 | 7: iteration 77760/ 115203 | consumed samples: 19906560 | consumed tokens: 40768634880 | elapsed time per iteration (s): 0.43 | learning rate: 6.377E-05 | global batch size: 256 | lm loss: 2.244237E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.071 | TFLOPs: 31.48 | 7: iteration 77770/ 115203 | consumed samples: 19909120 | consumed tokens: 40773877760 | elapsed time per iteration (s): 0.43 | learning rate: 6.375E-05 | global batch size: 256 | lm loss: 2.241999E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.236 | TFLOPs: 31.49 | 7: iteration 77780/ 115203 | consumed samples: 19911680 | consumed tokens: 40779120640 | elapsed time per iteration (s): 0.42 | learning rate: 6.373E-05 | global batch size: 256 | lm loss: 2.256573E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.359 | TFLOPs: 31.60 | 7: iteration 77790/ 115203 | consumed samples: 19914240 | consumed tokens: 40784363520 | elapsed time per iteration (s): 0.43 | learning rate: 6.371E-05 | global batch 
size: 256 | lm loss: 2.246628E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.268 | TFLOPs: 31.29 | 7: iteration 77800/ 115203 | consumed samples: 19916800 | consumed tokens: 40789606400 | elapsed time per iteration (s): 0.42 | learning rate: 6.369E-05 | global batch size: 256 | lm loss: 2.251911E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.717 | TFLOPs: 31.94 | 7: iteration 77810/ 115203 | consumed samples: 19919360 | consumed tokens: 40794849280 | elapsed time per iteration (s): 0.43 | learning rate: 6.367E-05 | global batch size: 256 | lm loss: 2.202561E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.267 | TFLOPs: 31.39 | 7: iteration 77820/ 115203 | consumed samples: 19921920 | consumed tokens: 40800092160 | elapsed time per iteration (s): 0.42 | learning rate: 6.365E-05 | global batch size: 256 | lm loss: 2.242653E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.478 | TFLOPs: 31.82 | 7: iteration 77830/ 115203 | consumed samples: 19924480 | consumed tokens: 40805335040 | elapsed time per iteration (s): 0.43 | learning rate: 6.363E-05 | global batch size: 256 | lm loss: 2.253831E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.002 | TFLOPs: 31.32 | 7: iteration 77840/ 115203 | consumed samples: 19927040 | consumed tokens: 40810577920 | elapsed time per iteration (s): 0.42 | learning rate: 6.360E-05 | global batch size: 256 | lm loss: 2.260917E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.936 | TFLOPs: 31.69 | 7: iteration 77850/ 115203 | consumed samples: 19929600 | consumed 
tokens: 40815820800 | elapsed time per iteration (s): 0.43 | learning rate: 6.358E-05 | global batch size: 256 | lm loss: 2.245698E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.330 | TFLOPs: 31.13 | 7: iteration 77860/ 115203 | consumed samples: 19932160 | consumed tokens: 40821063680 | elapsed time per iteration (s): 0.43 | learning rate: 6.356E-05 | global batch size: 256 | lm loss: 2.241733E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.189 | TFLOPs: 31.54 | 7: iteration 77870/ 115203 | consumed samples: 19934720 | consumed tokens: 40826306560 | elapsed time per iteration (s): 0.44 | learning rate: 6.354E-05 | global batch size: 256 | lm loss: 2.266850E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.373 | TFLOPs: 30.71 | 7: iteration 77880/ 115203 | consumed samples: 19937280 | consumed tokens: 40831549440 | elapsed time per iteration (s): 0.43 | learning rate: 6.352E-05 | global batch size: 256 | lm loss: 2.246456E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.477 | TFLOPs: 31.24 | 7: iteration 77890/ 115203 | consumed samples: 19939840 | consumed tokens: 40836792320 | elapsed time per iteration (s): 0.43 | learning rate: 6.350E-05 | global batch size: 256 | lm loss: 2.247802E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.757 | TFLOPs: 31.26 | 7: iteration 77900/ 115203 | consumed samples: 19942400 | consumed tokens: 40842035200 | elapsed time per iteration (s): 0.43 | learning rate: 6.348E-05 | global batch size: 256 | lm loss: 2.244894E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 591.143 | TFLOPs: 31.02 | 7: iteration 77910/ 115203 | consumed samples: 19944960 | consumed tokens: 40847278080 | elapsed time per iteration (s): 0.43 | learning rate: 6.346E-05 | global batch size: 256 | lm loss: 2.247710E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.716 | TFLOPs: 31.20 | 7: iteration 77920/ 115203 | consumed samples: 19947520 | consumed tokens: 40852520960 | elapsed time per iteration (s): 0.45 | learning rate: 6.343E-05 | global batch size: 256 | lm loss: 2.239598E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.210 | TFLOPs: 29.92 | 7: iteration 77930/ 115203 | consumed samples: 19950080 | consumed tokens: 40857763840 | elapsed time per iteration (s): 0.43 | learning rate: 6.341E-05 | global batch size: 256 | lm loss: 2.241328E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.279 | TFLOPs: 31.55 | 7: iteration 77940/ 115203 | consumed samples: 19952640 | consumed tokens: 40863006720 | elapsed time per iteration (s): 0.43 | learning rate: 6.339E-05 | global batch size: 256 | lm loss: 2.225751E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.715 | TFLOPs: 31.41 | 7: iteration 77950/ 115203 | consumed samples: 19955200 | consumed tokens: 40868249600 | elapsed time per iteration (s): 0.43 | learning rate: 6.337E-05 | global batch size: 256 | lm loss: 2.262030E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.443 | TFLOPs: 31.14 | 7: iteration 77960/ 115203 | consumed samples: 19957760 | consumed tokens: 40873492480 | elapsed time per iteration (s): 0.43 | learning rate: 6.335E-05 | global batch size: 256 | lm loss: 2.259753E+00 | grad norm: 
0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.895 | TFLOPs: 31.16 | 7: iteration 77970/ 115203 | consumed samples: 19960320 | consumed tokens: 40878735360 | elapsed time per iteration (s): 0.42 | learning rate: 6.333E-05 | global batch size: 256 | lm loss: 2.221475E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.354 | TFLOPs: 31.60 | 7: iteration 77980/ 115203 | consumed samples: 19962880 | consumed tokens: 40883978240 | elapsed time per iteration (s): 0.43 | learning rate: 6.331E-05 | global batch size: 256 | lm loss: 2.276902E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.975 | TFLOPs: 31.48 | 7: iteration 77990/ 115203 | consumed samples: 19965440 | consumed tokens: 40889221120 | elapsed time per iteration (s): 0.43 | learning rate: 6.329E-05 | global batch size: 256 | lm loss: 2.258370E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.439 | TFLOPs: 31.56 | 0: [2022-11-28 22:20:45,274] [INFO] [logging.py:68:log_dist] [Rank 0] step=78000, skipped=0, lr=[6.326508628233516e-05, 6.326508628233516e-05, 6.326508628233516e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 78000/ 115203 | consumed samples: 19968000 | consumed tokens: 40894464000 | elapsed time per iteration (s): 0.44 | learning rate: 6.327E-05 | global batch size: 256 | lm loss: 2.266311E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.578 | TFLOPs: 30.57 | 0: steps: 78000 loss: 2.3444 iter time (s): 0.429 samples/sec: 597.284 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 78000 | lm loss value: 2.201304E+00 | lm loss PPL: 
9.036789E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 78000 to checkpoints_221m 0: [2022-11-28 22:20:45,436] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step78000 is begin to save! 0: [2022-11-28 22:20:45,438] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_01-model_00-model_states.pt... 0: [2022-11-28 22:20:45,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_01-model_00-model_states.pt. 0: [2022-11-28 22:20:45,543] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_03-model_00-model_states.pt... 0: [2022-11-28 22:20:45,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_03-model_00-model_states.pt. 0: [2022-11-28 22:20:45,565] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_04-model_00-model_states.pt... 0: [2022-11-28 22:20:45,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_04-model_00-model_states.pt. 0: [2022-11-28 22:20:45,588] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_05-model_00-model_states.pt... 0: [2022-11-28 22:20:45,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_05-model_00-model_states.pt. 0: [2022-11-28 22:20:45,618] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_06-model_00-model_states.pt... 0: [2022-11-28 22:20:45,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_06-model_00-model_states.pt. 
0: [2022-11-28 22:20:45,639] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_07-model_00-model_states.pt... 0: [2022-11-28 22:20:45,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_07-model_00-model_states.pt. 0: [2022-11-28 22:20:45,662] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_08-model_00-model_states.pt... 0: [2022-11-28 22:20:45,684] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_08-model_00-model_states.pt. 0: [2022-11-28 22:20:45,684] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_09-model_00-model_states.pt... 0: [2022-11-28 22:20:45,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_09-model_00-model_states.pt. 0: [2022-11-28 22:20:45,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_10-model_00-model_states.pt... 0: [2022-11-28 22:20:45,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_10-model_00-model_states.pt. 0: [2022-11-28 22:20:45,731] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_11-model_00-model_states.pt... 0: [2022-11-28 22:20:45,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_11-model_00-model_states.pt. 0: [2022-11-28 22:20:45,754] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_12-model_00-model_states.pt... 0: [2022-11-28 22:20:45,777] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_12-model_00-model_states.pt. 
0: [2022-11-28 22:20:45,777] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_13-model_00-model_states.pt... 0: [2022-11-28 22:20:45,801] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_13-model_00-model_states.pt. 0: [2022-11-28 22:20:45,801] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_14-model_00-model_states.pt... 0: [2022-11-28 22:20:45,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_14-model_00-model_states.pt. 0: [2022-11-28 22:20:45,824] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_15-model_00-model_states.pt... 0: [2022-11-28 22:20:45,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_15-model_00-model_states.pt. 0: [2022-11-28 22:20:45,847] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_16-model_00-model_states.pt... 0: [2022-11-28 22:20:45,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_16-model_00-model_states.pt. 0: [2022-11-28 22:20:45,869] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_17-model_00-model_states.pt... 0: [2022-11-28 22:20:45,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_17-model_00-model_states.pt. 0: [2022-11-28 22:20:45,891] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_18-model_00-model_states.pt... 0: [2022-11-28 22:20:45,916] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_18-model_00-model_states.pt. 
0: [2022-11-28 22:20:45,916] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_19-model_00-model_states.pt... 0: [2022-11-28 22:20:45,939] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_19-model_00-model_states.pt. 0: [2022-11-28 22:20:45,939] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_20-model_00-model_states.pt... 0: [2022-11-28 22:20:45,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_20-model_00-model_states.pt. 0: [2022-11-28 22:20:45,962] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/layer_22-model_00-model_states.pt... 0: [2022-11-28 22:20:45,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/layer_22-model_00-model_states.pt. 0: [2022-11-28 22:20:45,967] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step78000/mp_rank_00_model_states.pt 0: [2022-11-28 22:20:45,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/mp_rank_00_model_states.pt... 0: [2022-11-28 22:20:45,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/mp_rank_00_model_states.pt. 0: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
1: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
6: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:20:45,987] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step78000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:20:46,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:20:46,035] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 22:20:46,035] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2022-11-28 22:20:46,035] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:20:46,036] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 22:20:46,036] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
4: [2022-11-28 22:20:46,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:20:46,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 22:20:46,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:20:46,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 22:20:46,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:20:46,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 4: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
7: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2022-11-28 22:20:46,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:20:46,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2022-11-28 22:20:46,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:20:46,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:20:46,039] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 1: [2022-11-28 22:20:46,040] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 22:20:46,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2022-11-28 22:20:46,040] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2022-11-28 22:20:46,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:20:46,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:20:46,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 22:20:46,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 4: [2022-11-28 22:20:46,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:20:46,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 22:20:46,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2022-11-28 22:20:46,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
3: [2022-11-28 22:20:46,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2022-11-28 22:20:46,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:20:46,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 22:20:46,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2022-11-28 22:20:46,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:20:46,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:20:46,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 22:20:46,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 22:20:46,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2022-11-28 22:20:46,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2022-11-28 22:20:46,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 22:20:46,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 22:20:46,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2022-11-28 22:20:46,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:20:46,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 22:20:46,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2022-11-28 22:20:46,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:20:46,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 22:20:46,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2022-11-28 22:20:46,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:20:46,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 22:20:46,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2022-11-28 22:20:46,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 22:20:46,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 22:20:46,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2022-11-28 22:20:46,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:20:46,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 22:20:46,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2022-11-28 22:20:46,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:20:46,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 22:20:46,044] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2022-11-28 22:20:46,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:20:46,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 22:20:46,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2022-11-28 22:20:46,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 22:20:46,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 2: [2022-11-28 22:20:46,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:20:46,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2022-11-28 22:20:46,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 22:20:46,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:20:46,046] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:20:46,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2022-11-28 22:20:46,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 22:20:46,046] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 22:20:46,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2022-11-28 22:20:46,046] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 1: [2022-11-28 22:20:46,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 22:20:46,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 22:20:46,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 1: [2022-11-28 22:20:46,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:20:46,047] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 22:20:46,047] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2022-11-28 22:20:46,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:20:46,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:20:46,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:20:46,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 2: [2022-11-28 22:20:46,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 22:20:46,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 0: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
2: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 1: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:20:46,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 2: [2022-11-28 22:20:46,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 2: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2022-11-28 22:20:46,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 4: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 22:20:46,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 1: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:20:46,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:20:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 6: [2022-11-28 22:20:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:20:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:20:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
6: [2022-11-28 22:20:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 0: [2022-11-28 22:20:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 6: [2022-11-28 22:20:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2022-11-28 22:20:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2022-11-28 22:20:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:20:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:20:46,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:20:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 6: [2022-11-28 22:20:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 22:20:46,049] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 0: [2022-11-28 22:20:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2022-11-28 22:20:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
6: [2022-11-28 22:20:46,049] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2022-11-28 22:20:46,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:20:46,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:20:46,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 22:20:46,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 22:20:46,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2022-11-28 22:20:46,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 6: [2022-11-28 22:20:46,058] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:20:46,058] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 22:20:46,058] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 1: [2022-11-28 22:20:46,059] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2022-11-28 22:20:46,059] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 22:20:46,059] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 1: [2022-11-28 22:20:46,059] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:20:46,059] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 3: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:20:46,059] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2022-11-28 22:20:46,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:20:46,050] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 22:20:46,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2022-11-28 22:20:46,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 22:20:46,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
5: [2022-11-28 22:20:46,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:20:46,051] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:20:46,048] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 22:20:46,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 3: [2022-11-28 22:20:46,048] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2022-11-28 22:20:46,051] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 22:20:46,051] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2022-11-28 22:20:46,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:20:46,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
5: [2022-11-28 22:20:46,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 22:20:46,057] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 22:20:46,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 5: [2022-11-28 22:20:46,057] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2022-11-28 22:20:46,061] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:20:46,061] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:20:46,061] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 22:20:46,061] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 22:20:46,061] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2022-11-28 22:20:46,061] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 7: [2022-11-28 22:20:46,061] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-28 22:20:46,061] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 22:20:46,061] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2022-11-28 22:20:46,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:20:46,068] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 22:20:46,068] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2022-11-28 22:20:46,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:20:46,068] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 22:20:46,068] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: [2022-11-28 22:20:46,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:20:46,069] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 22:20:46,069] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 
0: [2022-11-28 22:20:46,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step78000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 22:20:46,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step78000 is ready now! 0: successfully saved checkpoint at iteration 78000 to checkpoints_221m 7: time (ms) | save-checkpoint: 678.74 7: iteration 78010/ 115203 | consumed samples: 19970560 | consumed tokens: 40899706880 | elapsed time per iteration (s): 0.53 | learning rate: 6.324E-05 | global batch size: 256 | lm loss: 2.225694E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 483.921 | TFLOPs: 25.39 | 7: iteration 78020/ 115203 | consumed samples: 19973120 | consumed tokens: 40904949760 | elapsed time per iteration (s): 0.43 | learning rate: 6.322E-05 | global batch size: 256 | lm loss: 2.249005E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.965 | TFLOPs: 31.43 | 7: iteration 78030/ 115203 | consumed samples: 19975680 | consumed tokens: 40910192640 | elapsed time per iteration (s): 0.43 | learning rate: 6.320E-05 | global batch size: 256 | lm loss: 2.256059E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.617 | TFLOPs: 31.41 | 7: iteration 78040/ 115203 | consumed samples: 19978240 | consumed tokens: 40915435520 | elapsed time per iteration (s): 0.43 | learning rate: 6.318E-05 | global batch size: 256 | lm loss: 2.244487E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.867 | TFLOPs: 31.42 | 7: iteration 78050/ 115203 | consumed samples: 19980800 | consumed tokens: 40920678400 | elapsed time per iteration (s): 0.43 | learning rate: 6.316E-05 | global batch size: 256 | 
lm loss: 2.240031E+00 | grad norm: 0.216 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.286 | TFLOPs: 30.97 | 7: iteration 78060/ 115203 | consumed samples: 19983360 | consumed tokens: 40925921280 | elapsed time per iteration (s): 0.43 | learning rate: 6.314E-05 | global batch size: 256 | lm loss: 2.246778E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.132 | TFLOPs: 31.12 | 7: iteration 78070/ 115203 | consumed samples: 19985920 | consumed tokens: 40931164160 | elapsed time per iteration (s): 0.42 | learning rate: 6.312E-05 | global batch size: 256 | lm loss: 2.264684E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.981 | TFLOPs: 31.74 | 7: iteration 78080/ 115203 | consumed samples: 19988480 | consumed tokens: 40936407040 | elapsed time per iteration (s): 0.42 | learning rate: 6.310E-05 | global batch size: 256 | lm loss: 2.262917E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.821 | TFLOPs: 31.79 | 7: iteration 78090/ 115203 | consumed samples: 19991040 | consumed tokens: 40941649920 | elapsed time per iteration (s): 0.44 | learning rate: 6.307E-05 | global batch size: 256 | lm loss: 2.253622E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.249 | TFLOPs: 30.39 | 7: iteration 78100/ 115203 | consumed samples: 19993600 | consumed tokens: 40946892800 | elapsed time per iteration (s): 0.43 | learning rate: 6.305E-05 | global batch size: 256 | lm loss: 2.241158E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.754 | TFLOPs: 31.52 | 7: iteration 78110/ 115203 | consumed samples: 19996160 | consumed tokens: 
40952135680 | elapsed time per iteration (s): 0.43 | learning rate: 6.303E-05 | global batch size: 256 | lm loss: 2.246937E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.617 | TFLOPs: 31.41 | 7: iteration 78120/ 115203 | consumed samples: 19998720 | consumed tokens: 40957378560 | elapsed time per iteration (s): 0.42 | learning rate: 6.301E-05 | global batch size: 256 | lm loss: 2.260948E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.194 | TFLOPs: 31.91 | 7: iteration 78130/ 115203 | consumed samples: 20001280 | consumed tokens: 40962621440 | elapsed time per iteration (s): 0.43 | learning rate: 6.299E-05 | global batch size: 256 | lm loss: 2.248539E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.282 | TFLOPs: 31.23 | 7: iteration 78140/ 115203 | consumed samples: 20003840 | consumed tokens: 40967864320 | elapsed time per iteration (s): 0.42 | learning rate: 6.297E-05 | global batch size: 256 | lm loss: 2.253896E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.337 | TFLOPs: 31.81 | 7: iteration 78150/ 115203 | consumed samples: 20006400 | consumed tokens: 40973107200 | elapsed time per iteration (s): 0.42 | learning rate: 6.295E-05 | global batch size: 256 | lm loss: 2.211332E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.394 | TFLOPs: 31.82 | 7: iteration 78160/ 115203 | consumed samples: 20008960 | consumed tokens: 40978350080 | elapsed time per iteration (s): 0.43 | learning rate: 6.293E-05 | global batch size: 256 | lm loss: 2.254427E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
593.065 | TFLOPs: 31.12 | 7: iteration 78170/ 115203 | consumed samples: 20011520 | consumed tokens: 40983592960 | elapsed time per iteration (s): 0.43 | learning rate: 6.291E-05 | global batch size: 256 | lm loss: 2.226689E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.298 | TFLOPs: 31.50 | 7: iteration 78180/ 115203 | consumed samples: 20014080 | consumed tokens: 40988835840 | elapsed time per iteration (s): 0.43 | learning rate: 6.288E-05 | global batch size: 256 | lm loss: 2.229677E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.144 | TFLOPs: 31.44 | 7: iteration 78190/ 115203 | consumed samples: 20016640 | consumed tokens: 40994078720 | elapsed time per iteration (s): 0.43 | learning rate: 6.286E-05 | global batch size: 256 | lm loss: 2.248080E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.987 | TFLOPs: 31.48 | 7: iteration 78200/ 115203 | consumed samples: 20019200 | consumed tokens: 40999321600 | elapsed time per iteration (s): 0.43 | learning rate: 6.284E-05 | global batch size: 256 | lm loss: 2.237716E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.667 | TFLOPs: 31.46 | 7: iteration 78210/ 115203 | consumed samples: 20021760 | consumed tokens: 41004564480 | elapsed time per iteration (s): 0.44 | learning rate: 6.282E-05 | global batch size: 256 | lm loss: 2.245876E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.689 | TFLOPs: 30.57 | 7: iteration 78220/ 115203 | consumed samples: 20024320 | consumed tokens: 41009807360 | elapsed time per iteration (s): 0.43 | learning rate: 6.280E-05 | global batch size: 256 | lm loss: 2.216134E+00 | grad norm: 0.227 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.502 | TFLOPs: 30.93 | 7: iteration 78230/ 115203 | consumed samples: 20026880 | consumed tokens: 41015050240 | elapsed time per iteration (s): 0.43 | learning rate: 6.278E-05 | global batch size: 256 | lm loss: 2.227968E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.850 | TFLOPs: 31.32 | 7: iteration 78240/ 115203 | consumed samples: 20029440 | consumed tokens: 41020293120 | elapsed time per iteration (s): 0.43 | learning rate: 6.276E-05 | global batch size: 256 | lm loss: 2.261767E+00 | grad norm: 0.215 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.865 | TFLOPs: 31.11 | 7: iteration 78250/ 115203 | consumed samples: 20032000 | consumed tokens: 41025536000 | elapsed time per iteration (s): 0.43 | learning rate: 6.274E-05 | global batch size: 256 | lm loss: 2.258116E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.619 | TFLOPs: 31.09 | 7: iteration 78260/ 115203 | consumed samples: 20034560 | consumed tokens: 41030778880 | elapsed time per iteration (s): 0.43 | learning rate: 6.272E-05 | global batch size: 256 | lm loss: 2.263012E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.908 | TFLOPs: 31.37 | 7: iteration 78270/ 115203 | consumed samples: 20037120 | consumed tokens: 41036021760 | elapsed time per iteration (s): 0.43 | learning rate: 6.269E-05 | global batch size: 256 | lm loss: 2.257943E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.635 | TFLOPs: 31.15 | 7: iteration 78280/ 115203 | consumed samples: 20039680 | consumed tokens: 41041264640 | elapsed time per iteration (s): 
0.43 | learning rate: 6.267E-05 | global batch size: 256 | lm loss: 2.239221E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.292 | TFLOPs: 31.50 | 7: iteration 78290/ 115203 | consumed samples: 20042240 | consumed tokens: 41046507520 | elapsed time per iteration (s): 0.42 | learning rate: 6.265E-05 | global batch size: 256 | lm loss: 2.219011E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.990 | TFLOPs: 31.69 | 7: iteration 78300/ 115203 | consumed samples: 20044800 | consumed tokens: 41051750400 | elapsed time per iteration (s): 0.43 | learning rate: 6.263E-05 | global batch size: 256 | lm loss: 2.264365E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.641 | TFLOPs: 31.51 | 7: iteration 78310/ 115203 | consumed samples: 20047360 | consumed tokens: 41056993280 | elapsed time per iteration (s): 0.43 | learning rate: 6.261E-05 | global batch size: 256 | lm loss: 2.251273E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.781 | TFLOPs: 31.15 | 7: iteration 78320/ 115203 | consumed samples: 20049920 | consumed tokens: 41062236160 | elapsed time per iteration (s): 0.43 | learning rate: 6.259E-05 | global batch size: 256 | lm loss: 2.247795E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.628 | TFLOPs: 31.51 | 7: iteration 78330/ 115203 | consumed samples: 20052480 | consumed tokens: 41067479040 | elapsed time per iteration (s): 0.43 | learning rate: 6.257E-05 | global batch size: 256 | lm loss: 2.243378E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.188 | TFLOPs: 31.60 | 7: iteration 78340/ 
115203 | consumed samples: 20055040 | consumed tokens: 41072721920 | elapsed time per iteration (s): 0.42 | learning rate: 6.255E-05 | global batch size: 256 | lm loss: 2.252482E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 616.306 | TFLOPs: 32.34 | 7: iteration 78350/ 115203 | consumed samples: 20057600 | consumed tokens: 41077964800 | elapsed time per iteration (s): 0.43 | learning rate: 6.253E-05 | global batch size: 256 | lm loss: 2.235081E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.809 | TFLOPs: 31.21 | 7: iteration 78360/ 115203 | consumed samples: 20060160 | consumed tokens: 41083207680 | elapsed time per iteration (s): 0.43 | learning rate: 6.250E-05 | global batch size: 256 | lm loss: 2.234718E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.133 | TFLOPs: 31.33 | 7: iteration 78370/ 115203 | consumed samples: 20062720 | consumed tokens: 41088450560 | elapsed time per iteration (s): 0.43 | learning rate: 6.248E-05 | global batch size: 256 | lm loss: 2.250078E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.633 | TFLOPs: 30.99 | 7: iteration 78380/ 115203 | consumed samples: 20065280 | consumed tokens: 41093693440 | elapsed time per iteration (s): 0.42 | learning rate: 6.246E-05 | global batch size: 256 | lm loss: 2.268430E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.449 | TFLOPs: 31.66 | 7: iteration 78390/ 115203 | consumed samples: 20067840 | consumed tokens: 41098936320 | elapsed time per iteration (s): 0.42 | learning rate: 6.244E-05 | global batch size: 256 | lm loss: 2.207961E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 604.013 | TFLOPs: 31.69 | 7: iteration 78400/ 115203 | consumed samples: 20070400 | consumed tokens: 41104179200 | elapsed time per iteration (s): 0.42 | learning rate: 6.242E-05 | global batch size: 256 | lm loss: 2.245678E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.631 | TFLOPs: 31.78 | 7: iteration 78410/ 115203 | consumed samples: 20072960 | consumed tokens: 41109422080 | elapsed time per iteration (s): 0.42 | learning rate: 6.240E-05 | global batch size: 256 | lm loss: 2.251728E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.928 | TFLOPs: 31.69 | 7: iteration 78420/ 115203 | consumed samples: 20075520 | consumed tokens: 41114664960 | elapsed time per iteration (s): 0.42 | learning rate: 6.238E-05 | global batch size: 256 | lm loss: 2.257922E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.429 | TFLOPs: 31.82 | 7: iteration 78430/ 115203 | consumed samples: 20078080 | consumed tokens: 41119907840 | elapsed time per iteration (s): 0.42 | learning rate: 6.236E-05 | global batch size: 256 | lm loss: 2.260511E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.310 | TFLOPs: 31.71 | 7: iteration 78440/ 115203 | consumed samples: 20080640 | consumed tokens: 41125150720 | elapsed time per iteration (s): 0.42 | learning rate: 6.234E-05 | global batch size: 256 | lm loss: 2.244328E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.140 | TFLOPs: 32.12 | 7: iteration 78450/ 115203 | consumed samples: 20083200 | consumed tokens: 41130393600 | elapsed time per iteration (s): 0.43 | learning rate: 6.232E-05 | global batch 
size: 256 | lm loss: 2.236693E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.573 | TFLOPs: 31.46 | 7: iteration 78460/ 115203 | consumed samples: 20085760 | consumed tokens: 41135636480 | elapsed time per iteration (s): 0.43 | learning rate: 6.229E-05 | global batch size: 256 | lm loss: 2.243271E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.984 | TFLOPs: 31.17 | 7: iteration 78470/ 115203 | consumed samples: 20088320 | consumed tokens: 41140879360 | elapsed time per iteration (s): 0.42 | learning rate: 6.227E-05 | global batch size: 256 | lm loss: 2.234829E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.536 | TFLOPs: 31.93 | 7: iteration 78480/ 115203 | consumed samples: 20090880 | consumed tokens: 41146122240 | elapsed time per iteration (s): 0.42 | learning rate: 6.225E-05 | global batch size: 256 | lm loss: 2.286786E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.927 | TFLOPs: 31.79 | 7: iteration 78490/ 115203 | consumed samples: 20093440 | consumed tokens: 41151365120 | elapsed time per iteration (s): 0.42 | learning rate: 6.223E-05 | global batch size: 256 | lm loss: 2.276505E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.455 | TFLOPs: 31.61 | 7: iteration 78500/ 115203 | consumed samples: 20096000 | consumed tokens: 41156608000 | elapsed time per iteration (s): 0.42 | learning rate: 6.221E-05 | global batch size: 256 | lm loss: 2.261659E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.062 | TFLOPs: 32.06 | 7: iteration 78510/ 115203 | consumed samples: 20098560 | consumed 
tokens: 41161850880 | elapsed time per iteration (s): 0.43 | learning rate: 6.219E-05 | global batch size: 256 | lm loss: 2.248828E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.831 | TFLOPs: 31.31 | 7: iteration 78520/ 115203 | consumed samples: 20101120 | consumed tokens: 41167093760 | elapsed time per iteration (s): 0.43 | learning rate: 6.217E-05 | global batch size: 256 | lm loss: 2.260579E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.571 | TFLOPs: 31.35 | 7: iteration 78530/ 115203 | consumed samples: 20103680 | consumed tokens: 41172336640 | elapsed time per iteration (s): 0.42 | learning rate: 6.215E-05 | global batch size: 256 | lm loss: 2.279669E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.254 | TFLOPs: 31.65 | 7: iteration 78540/ 115203 | consumed samples: 20106240 | consumed tokens: 41177579520 | elapsed time per iteration (s): 0.42 | learning rate: 6.213E-05 | global batch size: 256 | lm loss: 2.251361E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.565 | TFLOPs: 31.72 | 7: iteration 78550/ 115203 | consumed samples: 20108800 | consumed tokens: 41182822400 | elapsed time per iteration (s): 0.42 | learning rate: 6.211E-05 | global batch size: 256 | lm loss: 2.268076E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.355 | TFLOPs: 31.76 | 7: iteration 78560/ 115203 | consumed samples: 20111360 | consumed tokens: 41188065280 | elapsed time per iteration (s): 0.42 | learning rate: 6.208E-05 | global batch size: 256 | lm loss: 2.242350E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 609.377 | TFLOPs: 31.97 | 7: iteration 78570/ 115203 | consumed samples: 20113920 | consumed tokens: 41193308160 | elapsed time per iteration (s): 0.42 | learning rate: 6.206E-05 | global batch size: 256 | lm loss: 2.227197E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 614.819 | TFLOPs: 32.26 | 7: iteration 78580/ 115203 | consumed samples: 20116480 | consumed tokens: 41198551040 | elapsed time per iteration (s): 0.42 | learning rate: 6.204E-05 | global batch size: 256 | lm loss: 2.234991E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.237 | TFLOPs: 31.91 | 7: iteration 78590/ 115203 | consumed samples: 20119040 | consumed tokens: 41203793920 | elapsed time per iteration (s): 0.42 | learning rate: 6.202E-05 | global batch size: 256 | lm loss: 2.228006E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.605 | TFLOPs: 31.88 | 7: iteration 78600/ 115203 | consumed samples: 20121600 | consumed tokens: 41209036800 | elapsed time per iteration (s): 0.43 | learning rate: 6.200E-05 | global batch size: 256 | lm loss: 2.215646E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.624 | TFLOPs: 31.30 | 7: iteration 78610/ 115203 | consumed samples: 20124160 | consumed tokens: 41214279680 | elapsed time per iteration (s): 0.42 | learning rate: 6.198E-05 | global batch size: 256 | lm loss: 2.255179E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.313 | TFLOPs: 31.76 | 7: iteration 78620/ 115203 | consumed samples: 20126720 | consumed tokens: 41219522560 | elapsed time per iteration (s): 0.42 | learning rate: 6.196E-05 | global batch size: 256 | lm loss: 2.249572E+00 | grad norm: 
0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.422 | TFLOPs: 31.71 | 7: iteration 78630/ 115203 | consumed samples: 20129280 | consumed tokens: 41224765440 | elapsed time per iteration (s): 0.42 | learning rate: 6.194E-05 | global batch size: 256 | lm loss: 2.244947E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.745 | TFLOPs: 31.94 | 7: iteration 78640/ 115203 | consumed samples: 20131840 | consumed tokens: 41230008320 | elapsed time per iteration (s): 0.43 | learning rate: 6.192E-05 | global batch size: 256 | lm loss: 2.250387E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.054 | TFLOPs: 31.01 | 7: iteration 78650/ 115203 | consumed samples: 20134400 | consumed tokens: 41235251200 | elapsed time per iteration (s): 0.43 | learning rate: 6.190E-05 | global batch size: 256 | lm loss: 2.255202E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.063 | TFLOPs: 31.59 | 7: iteration 78660/ 115203 | consumed samples: 20136960 | consumed tokens: 41240494080 | elapsed time per iteration (s): 0.43 | learning rate: 6.187E-05 | global batch size: 256 | lm loss: 2.258359E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.480 | TFLOPs: 31.35 | 7: iteration 78670/ 115203 | consumed samples: 20139520 | consumed tokens: 41245736960 | elapsed time per iteration (s): 0.43 | learning rate: 6.185E-05 | global batch size: 256 | lm loss: 2.251425E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.789 | TFLOPs: 31.31 | 7: iteration 78680/ 115203 | consumed samples: 20142080 | consumed tokens: 41250979840 | elapsed time per 
iteration (s): 0.42 | learning rate: 6.183E-05 | global batch size: 256 | lm loss: 2.254535E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.439 | TFLOPs: 31.71 | 7: iteration 78690/ 115203 | consumed samples: 20144640 | consumed tokens: 41256222720 | elapsed time per iteration (s): 0.43 | learning rate: 6.181E-05 | global batch size: 256 | lm loss: 2.237552E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.089 | TFLOPs: 31.54 | 7: iteration 78700/ 115203 | consumed samples: 20147200 | consumed tokens: 41261465600 | elapsed time per iteration (s): 0.42 | learning rate: 6.179E-05 | global batch size: 256 | lm loss: 2.255246E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.320 | TFLOPs: 31.66 | 7: iteration 78710/ 115203 | consumed samples: 20149760 | consumed tokens: 41266708480 | elapsed time per iteration (s): 0.44 | learning rate: 6.177E-05 | global batch size: 256 | lm loss: 2.231430E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.649 | TFLOPs: 30.41 | 7: iteration 78720/ 115203 | consumed samples: 20152320 | consumed tokens: 41271951360 | elapsed time per iteration (s): 0.44 | learning rate: 6.175E-05 | global batch size: 256 | lm loss: 2.250783E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.910 | TFLOPs: 30.74 | 7: iteration 78730/ 115203 | consumed samples: 20154880 | consumed tokens: 41277194240 | elapsed time per iteration (s): 0.42 | learning rate: 6.173E-05 | global batch size: 256 | lm loss: 2.260376E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.928 | TFLOPs: 31.63 | 7: 
iteration 78740/ 115203 | consumed samples: 20157440 | consumed tokens: 41282437120 | elapsed time per iteration (s): 0.43 | learning rate: 6.171E-05 | global batch size: 256 | lm loss: 2.255642E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.775 | TFLOPs: 31.26 | 7: iteration 78750/ 115203 | consumed samples: 20160000 | consumed tokens: 41287680000 | elapsed time per iteration (s): 0.42 | learning rate: 6.169E-05 | global batch size: 256 | lm loss: 2.233170E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.128 | TFLOPs: 32.12 | 7: iteration 78760/ 115203 | consumed samples: 20162560 | consumed tokens: 41292922880 | elapsed time per iteration (s): 0.42 | learning rate: 6.167E-05 | global batch size: 256 | lm loss: 2.236019E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.799 | TFLOPs: 32.31 | 7: iteration 78770/ 115203 | consumed samples: 20165120 | consumed tokens: 41298165760 | elapsed time per iteration (s): 0.44 | learning rate: 6.164E-05 | global batch size: 256 | lm loss: 2.250133E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.132 | TFLOPs: 30.70 | 7: iteration 78780/ 115203 | consumed samples: 20167680 | consumed tokens: 41303408640 | elapsed time per iteration (s): 0.43 | learning rate: 6.162E-05 | global batch size: 256 | lm loss: 2.250970E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.868 | TFLOPs: 30.95 | 7: iteration 78790/ 115203 | consumed samples: 20170240 | consumed tokens: 41308651520 | elapsed time per iteration (s): 0.43 | learning rate: 6.160E-05 | global batch size: 256 | lm loss: 2.217565E+00 | grad norm: 0.219 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.782 | TFLOPs: 31.31 | 7: iteration 78800/ 115203 | consumed samples: 20172800 | consumed tokens: 41313894400 | elapsed time per iteration (s): 0.44 | learning rate: 6.158E-05 | global batch size: 256 | lm loss: 2.209738E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.492 | TFLOPs: 30.46 | 7: iteration 78810/ 115203 | consumed samples: 20175360 | consumed tokens: 41319137280 | elapsed time per iteration (s): 0.43 | learning rate: 6.156E-05 | global batch size: 256 | lm loss: 2.247890E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.114 | TFLOPs: 31.38 | 7: iteration 78820/ 115203 | consumed samples: 20177920 | consumed tokens: 41324380160 | elapsed time per iteration (s): 0.43 | learning rate: 6.154E-05 | global batch size: 256 | lm loss: 2.255079E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.364 | TFLOPs: 31.24 | 7: iteration 78830/ 115203 | consumed samples: 20180480 | consumed tokens: 41329623040 | elapsed time per iteration (s): 0.45 | learning rate: 6.152E-05 | global batch size: 256 | lm loss: 2.259953E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.618 | TFLOPs: 29.83 | 7: iteration 78840/ 115203 | consumed samples: 20183040 | consumed tokens: 41334865920 | elapsed time per iteration (s): 0.43 | learning rate: 6.150E-05 | global batch size: 256 | lm loss: 2.259305E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.088 | TFLOPs: 31.59 | 7: iteration 78850/ 115203 | consumed samples: 20185600 | consumed tokens: 41340108800 | elapsed time per iteration (s): 0.43 | learning rate: 
6.148E-05 | global batch size: 256 | lm loss: 2.257665E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.175 | TFLOPs: 31.54 | 7: iteration 78860/ 115203 | consumed samples: 20188160 | consumed tokens: 41345351680 | elapsed time per iteration (s): 0.43 | learning rate: 6.146E-05 | global batch size: 256 | lm loss: 2.270005E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.839 | TFLOPs: 31.42 | 7: iteration 78870/ 115203 | consumed samples: 20190720 | consumed tokens: 41350594560 | elapsed time per iteration (s): 0.43 | learning rate: 6.144E-05 | global batch size: 256 | lm loss: 2.243089E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.684 | TFLOPs: 31.46 | 7: iteration 78880/ 115203 | consumed samples: 20193280 | consumed tokens: 41355837440 | elapsed time per iteration (s): 0.43 | learning rate: 6.141E-05 | global batch size: 256 | lm loss: 2.220713E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.071 | TFLOPs: 31.54 | 7: iteration 78890/ 115203 | consumed samples: 20195840 | consumed tokens: 41361080320 | elapsed time per iteration (s): 0.42 | learning rate: 6.139E-05 | global batch size: 256 | lm loss: 2.284576E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.040 | TFLOPs: 31.64 | 7: iteration 78900/ 115203 | consumed samples: 20198400 | consumed tokens: 41366323200 | elapsed time per iteration (s): 0.43 | learning rate: 6.137E-05 | global batch size: 256 | lm loss: 2.230409E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.272 | TFLOPs: 31.39 | 7: iteration 78910/ 115203 | consumed 
samples: 20200960 | consumed tokens: 41371566080 | elapsed time per iteration (s): 0.43 | learning rate: 6.135E-05 | global batch size: 256 | lm loss: 2.264103E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.987 | TFLOPs: 31.22 | 7: iteration 78920/ 115203 | consumed samples: 20203520 | consumed tokens: 41376808960 | elapsed time per iteration (s): 0.44 | learning rate: 6.133E-05 | global batch size: 256 | lm loss: 2.230732E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.652 | TFLOPs: 30.78 | 7: iteration 78930/ 115203 | consumed samples: 20206080 | consumed tokens: 41382051840 | elapsed time per iteration (s): 0.45 | learning rate: 6.131E-05 | global batch size: 256 | lm loss: 2.240029E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.114 | TFLOPs: 29.76 | 7: iteration 78940/ 115203 | consumed samples: 20208640 | consumed tokens: 41387294720 | elapsed time per iteration (s): 0.45 | learning rate: 6.129E-05 | global batch size: 256 | lm loss: 2.278205E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.687 | TFLOPs: 30.15 | 7: iteration 78950/ 115203 | consumed samples: 20211200 | consumed tokens: 41392537600 | elapsed time per iteration (s): 0.44 | learning rate: 6.127E-05 | global batch size: 256 | lm loss: 2.254068E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.566 | TFLOPs: 30.72 | 7: iteration 78960/ 115203 | consumed samples: 20213760 | consumed tokens: 41397780480 | elapsed time per iteration (s): 0.42 | learning rate: 6.125E-05 | global batch size: 256 | lm loss: 2.227378E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 602.993 | TFLOPs: 31.64 | 7: iteration 78970/ 115203 | consumed samples: 20216320 | consumed tokens: 41403023360 | elapsed time per iteration (s): 0.43 | learning rate: 6.123E-05 | global batch size: 256 | lm loss: 2.232983E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.351 | TFLOPs: 31.03 | 7: iteration 78980/ 115203 | consumed samples: 20218880 | consumed tokens: 41408266240 | elapsed time per iteration (s): 0.50 | learning rate: 6.121E-05 | global batch size: 256 | lm loss: 2.252615E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 506.970 | TFLOPs: 26.60 | 7: iteration 78990/ 115203 | consumed samples: 20221440 | consumed tokens: 41413509120 | elapsed time per iteration (s): 0.42 | learning rate: 6.119E-05 | global batch size: 256 | lm loss: 2.235878E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.251 | TFLOPs: 31.86 | 7: iteration 79000/ 115203 | consumed samples: 20224000 | consumed tokens: 41418752000 | elapsed time per iteration (s): 0.43 | learning rate: 6.116E-05 | global batch size: 256 | lm loss: 2.235142E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.010 | TFLOPs: 31.38 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 79000 | lm loss value: 2.222655E+00 | lm loss PPL: 9.231812E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 79000 to checkpoints_221m 0: [2022-11-28 22:27:54,994] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step79000 is begin to save! 
0: [2022-11-28 22:27:55,000] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_01-model_00-model_states.pt... 0: [2022-11-28 22:27:55,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_01-model_00-model_states.pt. 0: [2022-11-28 22:27:55,103] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_03-model_00-model_states.pt... 0: [2022-11-28 22:27:55,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_03-model_00-model_states.pt. 0: [2022-11-28 22:27:55,124] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_04-model_00-model_states.pt... 0: [2022-11-28 22:27:55,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_04-model_00-model_states.pt. 0: [2022-11-28 22:27:55,148] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_05-model_00-model_states.pt... 0: [2022-11-28 22:27:55,171] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_05-model_00-model_states.pt. 0: [2022-11-28 22:27:55,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_06-model_00-model_states.pt... 0: [2022-11-28 22:27:55,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_06-model_00-model_states.pt. 0: [2022-11-28 22:27:55,196] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_07-model_00-model_states.pt... 0: [2022-11-28 22:27:55,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 22:27:55,219] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_08-model_00-model_states.pt... 0: [2022-11-28 22:27:55,241] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_08-model_00-model_states.pt. 0: [2022-11-28 22:27:55,241] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_09-model_00-model_states.pt... 0: [2022-11-28 22:27:55,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_09-model_00-model_states.pt. 0: [2022-11-28 22:27:55,264] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_10-model_00-model_states.pt... 0: [2022-11-28 22:27:55,288] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_10-model_00-model_states.pt. 0: [2022-11-28 22:27:55,288] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_11-model_00-model_states.pt... 0: [2022-11-28 22:27:55,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_11-model_00-model_states.pt. 0: [2022-11-28 22:27:55,312] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_12-model_00-model_states.pt... 0: [2022-11-28 22:27:55,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_12-model_00-model_states.pt. 0: [2022-11-28 22:27:55,336] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_13-model_00-model_states.pt... 0: [2022-11-28 22:27:55,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 22:27:55,359] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_14-model_00-model_states.pt... 0: [2022-11-28 22:27:55,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_14-model_00-model_states.pt. 0: [2022-11-28 22:27:55,382] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_15-model_00-model_states.pt... 0: [2022-11-28 22:27:55,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_15-model_00-model_states.pt. 0: [2022-11-28 22:27:55,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_16-model_00-model_states.pt... 0: [2022-11-28 22:27:55,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_16-model_00-model_states.pt. 0: [2022-11-28 22:27:55,427] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_17-model_00-model_states.pt... 0: [2022-11-28 22:27:55,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_17-model_00-model_states.pt. 0: [2022-11-28 22:27:55,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_18-model_00-model_states.pt... 0: [2022-11-28 22:27:55,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_18-model_00-model_states.pt. 0: [2022-11-28 22:27:55,472] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_19-model_00-model_states.pt... 0: [2022-11-28 22:27:55,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 22:27:55,495] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_20-model_00-model_states.pt... 0: [2022-11-28 22:27:55,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_20-model_00-model_states.pt. 0: [2022-11-28 22:27:55,518] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/layer_22-model_00-model_states.pt... 0: [2022-11-28 22:27:55,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/layer_22-model_00-model_states.pt. 0: [2022-11-28 22:27:55,523] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step79000/mp_rank_00_model_states.pt 0: [2022-11-28 22:27:55,523] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/mp_rank_00_model_states.pt... 0: [2022-11-28 22:27:55,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/mp_rank_00_model_states.pt. 0: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
4: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
7: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
5: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
7: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
2: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:27:55,544] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step79000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:27:55,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:27:55,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 22:27:55,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2022-11-28 22:27:55,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:27:55,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 22:27:55,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2022-11-28 22:27:55,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:27:55,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 22:27:55,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 22:27:55,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2022-11-28 22:27:55,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 22:27:55,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2022-11-28 22:27:55,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:27:55,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 22:27:55,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2022-11-28 22:27:55,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:27:55,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:27:55,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 22:27:55,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 22:27:55,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
2: [2022-11-28 22:27:55,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2022-11-28 22:27:55,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:27:55,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 22:27:55,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2022-11-28 22:27:55,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:27:55,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:27:55,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
6: [2022-11-28 22:27:55,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 7: [2022-11-28 22:27:55,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:27:55,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:27:55,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 22:27:55,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 22:27:55,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2022-11-28 22:27:55,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:27:55,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:27:55,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
1: [2022-11-28 22:27:55,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 7: [2022-11-28 22:27:55,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2022-11-28 22:27:55,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 1: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 7: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2022-11-28 22:27:55,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:27:55,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:27:55,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 22:27:55,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 22:27:55,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
3: [2022-11-28 22:27:55,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:27:55,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:27:55,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:27:55,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:27:55,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2022-11-28 22:27:55,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 5: [2022-11-28 22:27:55,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2022-11-28 22:27:55,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2022-11-28 22:27:55,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
5: [2022-11-28 22:27:55,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:27:55,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:27:55,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 2: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 3: [2022-11-28 22:27:55,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 22:27:55,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 3: [2022-11-28 22:27:55,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 3: [2022-11-28 22:27:55,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:27:55,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 22:27:55,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:27:55,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:27:55,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 22:27:55,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2022-11-28 22:27:55,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 3: [2022-11-28 22:27:55,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:27:55,597] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 22:27:55,597] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 3: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:27:55,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
2: [2022-11-28 22:27:55,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:27:55,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 2: [2022-11-28 22:27:55,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2022-11-28 22:27:55,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2022-11-28 22:27:55,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2022-11-28 22:27:55,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2022-11-28 22:27:55,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:27:55,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 6: [2022-11-28 22:27:55,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:27:55,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
6: [2022-11-28 22:27:55,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 22:27:55,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:27:55,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:27:55,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:27:55,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:27:55,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:27:55,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 22:27:55,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2022-11-28 22:27:55,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2022-11-28 22:27:55,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
5: [2022-11-28 22:27:55,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 22:27:55,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 5: [2022-11-28 22:27:55,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2022-11-28 22:27:55,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:27:55,608] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 22:27:55,608] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2022-11-28 22:27:55,608] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:27:55,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 22:27:55,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 2: [2022-11-28 22:27:55,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-28 22:27:55,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 22:27:55,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 6: [2022-11-28 22:27:55,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:27:55,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 22:27:55,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 1: [2022-11-28 22:27:55,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:27:55,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 22:27:55,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2022-11-28 22:27:55,614] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:27:55,614] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 22:27:55,614] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2022-11-28 22:27:55,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2022-11-28 22:27:55,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:27:55,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:27:55,615] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 22:27:55,615] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 22:27:55,615] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2022-11-28 22:27:55,615] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2022-11-28 22:27:55,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:27:55,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:27:55,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:27:55,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:27:55,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:27:55,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-28 22:27:55,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:27:55,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:27:55,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 22:27:55,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 22:27:55,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 22:27:55,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 22:27:55,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 22:27:55,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 22:27:55,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 22:27:55,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2022-11-28 22:27:55,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
4: [2022-11-28 22:27:55,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 22:27:55,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2022-11-28 22:27:55,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2022-11-28 22:27:55,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2022-11-28 22:27:55,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2022-11-28 22:27:55,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 4: [2022-11-28 22:27:55,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 0: [2022-11-28 22:27:55,652] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step79000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 22:27:55,652] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step79000 is ready now! 
0: successfully saved checkpoint at iteration 79000 to checkpoints_221m 7: time (ms) | save-checkpoint: 685.48 7: iteration 79010/ 115203 | consumed samples: 20226560 | consumed tokens: 41423994880 | elapsed time per iteration (s): 0.51 | learning rate: 6.114E-05 | global batch size: 256 | lm loss: 2.247010E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 504.867 | TFLOPs: 26.49 | 7: iteration 79020/ 115203 | consumed samples: 20229120 | consumed tokens: 41429237760 | elapsed time per iteration (s): 0.43 | learning rate: 6.112E-05 | global batch size: 256 | lm loss: 2.252176E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.322 | TFLOPs: 30.97 | 7: iteration 79030/ 115203 | consumed samples: 20231680 | consumed tokens: 41434480640 | elapsed time per iteration (s): 0.43 | learning rate: 6.110E-05 | global batch size: 256 | lm loss: 2.247441E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.718 | TFLOPs: 30.94 | 7: iteration 79040/ 115203 | consumed samples: 20234240 | consumed tokens: 41439723520 | elapsed time per iteration (s): 0.43 | learning rate: 6.108E-05 | global batch size: 256 | lm loss: 2.252128E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.493 | TFLOPs: 31.30 | 7: iteration 79050/ 115203 | consumed samples: 20236800 | consumed tokens: 41444966400 | elapsed time per iteration (s): 0.43 | learning rate: 6.106E-05 | global batch size: 256 | lm loss: 2.278197E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.518 | TFLOPs: 31.25 | 7: iteration 79060/ 115203 | consumed samples: 20239360 | consumed tokens: 41450209280 | elapsed time per iteration (s): 0.43 | learning 
rate: 6.104E-05 | global batch size: 256 | lm loss: 2.240240E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.277 | TFLOPs: 31.34 | 7: iteration 79070/ 115203 | consumed samples: 20241920 | consumed tokens: 41455452160 | elapsed time per iteration (s): 0.44 | learning rate: 6.102E-05 | global batch size: 256 | lm loss: 2.215188E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.164 | TFLOPs: 30.76 | 7: iteration 79080/ 115203 | consumed samples: 20244480 | consumed tokens: 41460695040 | elapsed time per iteration (s): 0.43 | learning rate: 6.100E-05 | global batch size: 256 | lm loss: 2.243586E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.270 | TFLOPs: 31.29 | 7: iteration 79090/ 115203 | consumed samples: 20247040 | consumed tokens: 41465937920 | elapsed time per iteration (s): 0.43 | learning rate: 6.098E-05 | global batch size: 256 | lm loss: 2.247662E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.573 | TFLOPs: 31.46 | 7: iteration 79100/ 115203 | consumed samples: 20249600 | consumed tokens: 41471180800 | elapsed time per iteration (s): 0.45 | learning rate: 6.096E-05 | global batch size: 256 | lm loss: 2.235112E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.442 | TFLOPs: 30.14 | 7: iteration 79110/ 115203 | consumed samples: 20252160 | consumed tokens: 41476423680 | elapsed time per iteration (s): 0.42 | learning rate: 6.094E-05 | global batch size: 256 | lm loss: 2.277044E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.744 | TFLOPs: 31.78 | 7: iteration 79120/ 115203 | 
consumed samples: 20254720 | consumed tokens: 41481666560 | elapsed time per iteration (s): 0.45 | learning rate: 6.091E-05 | global batch size: 256 | lm loss: 2.240716E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.473 | TFLOPs: 30.09 | 7: iteration 79130/ 115203 | consumed samples: 20257280 | consumed tokens: 41486909440 | elapsed time per iteration (s): 0.43 | learning rate: 6.089E-05 | global batch size: 256 | lm loss: 2.262349E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.797 | TFLOPs: 31.16 | 7: iteration 79140/ 115203 | consumed samples: 20259840 | consumed tokens: 41492152320 | elapsed time per iteration (s): 0.43 | learning rate: 6.087E-05 | global batch size: 256 | lm loss: 2.229480E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.808 | TFLOPs: 31.31 | 7: iteration 79150/ 115203 | consumed samples: 20262400 | consumed tokens: 41497395200 | elapsed time per iteration (s): 0.43 | learning rate: 6.085E-05 | global batch size: 256 | lm loss: 2.271218E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.088 | TFLOPs: 31.59 | 7: iteration 79160/ 115203 | consumed samples: 20264960 | consumed tokens: 41502638080 | elapsed time per iteration (s): 0.42 | learning rate: 6.083E-05 | global batch size: 256 | lm loss: 2.249565E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.472 | TFLOPs: 31.61 | 7: iteration 79170/ 115203 | consumed samples: 20267520 | consumed tokens: 41507880960 | elapsed time per iteration (s): 0.43 | learning rate: 6.081E-05 | global batch size: 256 | lm loss: 2.220609E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 592.968 | TFLOPs: 31.11 | 7: iteration 79180/ 115203 | consumed samples: 20270080 | consumed tokens: 41513123840 | elapsed time per iteration (s): 0.43 | learning rate: 6.079E-05 | global batch size: 256 | lm loss: 2.245223E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.353 | TFLOPs: 31.08 | 7: iteration 79190/ 115203 | consumed samples: 20272640 | consumed tokens: 41518366720 | elapsed time per iteration (s): 0.43 | learning rate: 6.077E-05 | global batch size: 256 | lm loss: 2.274119E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.207 | TFLOPs: 31.54 | 7: iteration 79200/ 115203 | consumed samples: 20275200 | consumed tokens: 41523609600 | elapsed time per iteration (s): 0.44 | learning rate: 6.075E-05 | global batch size: 256 | lm loss: 2.240127E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.397 | TFLOPs: 30.87 | 7: iteration 79210/ 115203 | consumed samples: 20277760 | consumed tokens: 41528852480 | elapsed time per iteration (s): 0.43 | learning rate: 6.073E-05 | global batch size: 256 | lm loss: 2.246854E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.280 | TFLOPs: 31.34 | 7: iteration 79220/ 115203 | consumed samples: 20280320 | consumed tokens: 41534095360 | elapsed time per iteration (s): 0.43 | learning rate: 6.071E-05 | global batch size: 256 | lm loss: 2.263307E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.277 | TFLOPs: 31.60 | 7: iteration 79230/ 115203 | consumed samples: 20282880 | consumed tokens: 41539338240 | elapsed time per iteration (s): 0.42 | learning rate: 6.069E-05 | global batch size: 
256 | lm loss: 2.260140E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.200 | TFLOPs: 31.65 | 7: iteration 79240/ 115203 | consumed samples: 20285440 | consumed tokens: 41544581120 | elapsed time per iteration (s): 0.43 | learning rate: 6.067E-05 | global batch size: 256 | lm loss: 2.219063E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.011 | TFLOPs: 31.17 | 7: iteration 79250/ 115203 | consumed samples: 20288000 | consumed tokens: 41549824000 | elapsed time per iteration (s): 0.43 | learning rate: 6.065E-05 | global batch size: 256 | lm loss: 2.244978E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.859 | TFLOPs: 31.42 | 7: iteration 79260/ 115203 | consumed samples: 20290560 | consumed tokens: 41555066880 | elapsed time per iteration (s): 0.43 | learning rate: 6.062E-05 | global batch size: 256 | lm loss: 2.248431E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.270 | TFLOPs: 31.39 | 7: iteration 79270/ 115203 | consumed samples: 20293120 | consumed tokens: 41560309760 | elapsed time per iteration (s): 0.43 | learning rate: 6.060E-05 | global batch size: 256 | lm loss: 2.249489E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.795 | TFLOPs: 30.95 | 7: iteration 79280/ 115203 | consumed samples: 20295680 | consumed tokens: 41565552640 | elapsed time per iteration (s): 0.45 | learning rate: 6.058E-05 | global batch size: 256 | lm loss: 2.263704E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.067 | TFLOPs: 29.91 | 7: iteration 79290/ 115203 | consumed samples: 20298240 | consumed 
tokens: 41570795520 | elapsed time per iteration (s): 0.43 | learning rate: 6.056E-05 | global batch size: 256 | lm loss: 2.280367E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.438 | TFLOPs: 31.40 | 7: iteration 79300/ 115203 | consumed samples: 20300800 | consumed tokens: 41576038400 | elapsed time per iteration (s): 0.43 | learning rate: 6.054E-05 | global batch size: 256 | lm loss: 2.243537E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.906 | TFLOPs: 31.37 | 7: iteration 79310/ 115203 | consumed samples: 20303360 | consumed tokens: 41581281280 | elapsed time per iteration (s): 0.43 | learning rate: 6.052E-05 | global batch size: 256 | lm loss: 2.202932E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.066 | TFLOPs: 30.96 | 7: iteration 79320/ 115203 | consumed samples: 20305920 | consumed tokens: 41586524160 | elapsed time per iteration (s): 0.42 | learning rate: 6.050E-05 | global batch size: 256 | lm loss: 2.239367E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.361 | TFLOPs: 31.60 | 7: iteration 79330/ 115203 | consumed samples: 20308480 | consumed tokens: 41591767040 | elapsed time per iteration (s): 0.43 | learning rate: 6.048E-05 | global batch size: 256 | lm loss: 2.239189E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.512 | TFLOPs: 31.35 | 7: iteration 79340/ 115203 | consumed samples: 20311040 | consumed tokens: 41597009920 | elapsed time per iteration (s): 0.44 | learning rate: 6.046E-05 | global batch size: 256 | lm loss: 2.271370E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 578.257 | TFLOPs: 30.34 | 7: iteration 79350/ 115203 | consumed samples: 20313600 | consumed tokens: 41602252800 | elapsed time per iteration (s): 0.43 | learning rate: 6.044E-05 | global batch size: 256 | lm loss: 2.248528E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.608 | TFLOPs: 30.94 | 7: iteration 79360/ 115203 | consumed samples: 20316160 | consumed tokens: 41607495680 | elapsed time per iteration (s): 0.43 | learning rate: 6.042E-05 | global batch size: 256 | lm loss: 2.268344E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.995 | TFLOPs: 31.38 | 7: iteration 79370/ 115203 | consumed samples: 20318720 | consumed tokens: 41612738560 | elapsed time per iteration (s): 0.43 | learning rate: 6.040E-05 | global batch size: 256 | lm loss: 2.233646E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.750 | TFLOPs: 31.26 | 7: iteration 79380/ 115203 | consumed samples: 20321280 | consumed tokens: 41617981440 | elapsed time per iteration (s): 0.44 | learning rate: 6.038E-05 | global batch size: 256 | lm loss: 2.249746E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.829 | TFLOPs: 30.79 | 7: iteration 79390/ 115203 | consumed samples: 20323840 | consumed tokens: 41623224320 | elapsed time per iteration (s): 0.43 | learning rate: 6.036E-05 | global batch size: 256 | lm loss: 2.278432E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.086 | TFLOPs: 31.38 | 7: iteration 79400/ 115203 | consumed samples: 20326400 | consumed tokens: 41628467200 | elapsed time per iteration (s): 0.43 | learning rate: 6.033E-05 | global batch size: 256 | lm loss: 2.274397E+00 | grad norm: 
0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.802 | TFLOPs: 30.95 | 7: iteration 79410/ 115203 | consumed samples: 20328960 | consumed tokens: 41633710080 | elapsed time per iteration (s): 0.43 | learning rate: 6.031E-05 | global batch size: 256 | lm loss: 2.258991E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.788 | TFLOPs: 31.31 | 7: iteration 79420/ 115203 | consumed samples: 20331520 | consumed tokens: 41638952960 | elapsed time per iteration (s): 0.45 | learning rate: 6.029E-05 | global batch size: 256 | lm loss: 2.257728E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.915 | TFLOPs: 30.11 | 7: iteration 79430/ 115203 | consumed samples: 20334080 | consumed tokens: 41644195840 | elapsed time per iteration (s): 0.42 | learning rate: 6.027E-05 | global batch size: 256 | lm loss: 2.256574E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.997 | TFLOPs: 32.06 | 7: iteration 79440/ 115203 | consumed samples: 20336640 | consumed tokens: 41649438720 | elapsed time per iteration (s): 0.42 | learning rate: 6.025E-05 | global batch size: 256 | lm loss: 2.249656E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.694 | TFLOPs: 32.15 | 7: iteration 79450/ 115203 | consumed samples: 20339200 | consumed tokens: 41654681600 | elapsed time per iteration (s): 0.45 | learning rate: 6.023E-05 | global batch size: 256 | lm loss: 2.242023E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.300 | TFLOPs: 29.82 | 7: iteration 79460/ 115203 | consumed samples: 20341760 | consumed tokens: 41659924480 | elapsed time per 
iteration (s): 0.43 | learning rate: 6.021E-05 | global batch size: 256 | lm loss: 2.236699E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.591 | TFLOPs: 31.09 | 7: iteration 79470/ 115203 | consumed samples: 20344320 | consumed tokens: 41665167360 | elapsed time per iteration (s): 0.44 | learning rate: 6.019E-05 | global batch size: 256 | lm loss: 2.275453E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.440 | TFLOPs: 30.61 | 7: iteration 79480/ 115203 | consumed samples: 20346880 | consumed tokens: 41670410240 | elapsed time per iteration (s): 0.44 | learning rate: 6.017E-05 | global batch size: 256 | lm loss: 2.216400E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.338 | TFLOPs: 30.76 | 7: iteration 79490/ 115203 | consumed samples: 20349440 | consumed tokens: 41675653120 | elapsed time per iteration (s): 0.44 | learning rate: 6.015E-05 | global batch size: 256 | lm loss: 2.239592E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.000 | TFLOPs: 30.80 | 7: iteration 79500/ 115203 | consumed samples: 20352000 | consumed tokens: 41680896000 | elapsed time per iteration (s): 0.43 | learning rate: 6.013E-05 | global batch size: 256 | lm loss: 2.239599E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.990 | TFLOPs: 31.59 | 7: iteration 79510/ 115203 | consumed samples: 20354560 | consumed tokens: 41686138880 | elapsed time per iteration (s): 0.42 | learning rate: 6.011E-05 | global batch size: 256 | lm loss: 2.206194E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.493 | TFLOPs: 31.87 | 7: 
iteration 79520/ 115203 | consumed samples: 20357120 | consumed tokens: 41691381760 | elapsed time per iteration (s): 0.42 | learning rate: 6.009E-05 | global batch size: 256 | lm loss: 2.256581E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.793 | TFLOPs: 31.68 | 7: iteration 79530/ 115203 | consumed samples: 20359680 | consumed tokens: 41696624640 | elapsed time per iteration (s): 0.42 | learning rate: 6.007E-05 | global batch size: 256 | lm loss: 2.213357E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.309 | TFLOPs: 31.86 | 7: iteration 79540/ 115203 | consumed samples: 20362240 | consumed tokens: 41701867520 | elapsed time per iteration (s): 0.43 | learning rate: 6.005E-05 | global batch size: 256 | lm loss: 2.246196E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.593 | TFLOPs: 31.56 | 7: iteration 79550/ 115203 | consumed samples: 20364800 | consumed tokens: 41707110400 | elapsed time per iteration (s): 0.42 | learning rate: 6.002E-05 | global batch size: 256 | lm loss: 2.250877E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.634 | TFLOPs: 32.20 | 7: iteration 79560/ 115203 | consumed samples: 20367360 | consumed tokens: 41712353280 | elapsed time per iteration (s): 0.43 | learning rate: 6.000E-05 | global batch size: 256 | lm loss: 2.249418E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.057 | TFLOPs: 31.27 | 7: iteration 79570/ 115203 | consumed samples: 20369920 | consumed tokens: 41717596160 | elapsed time per iteration (s): 0.42 | learning rate: 5.998E-05 | global batch size: 256 | lm loss: 2.245264E+00 | grad norm: 0.241 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.503 | TFLOPs: 31.93 | 7: iteration 79580/ 115203 | consumed samples: 20372480 | consumed tokens: 41722839040 | elapsed time per iteration (s): 0.43 | learning rate: 5.996E-05 | global batch size: 256 | lm loss: 2.271881E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.947 | TFLOPs: 31.58 | 7: iteration 79590/ 115203 | consumed samples: 20375040 | consumed tokens: 41728081920 | elapsed time per iteration (s): 0.43 | learning rate: 5.994E-05 | global batch size: 256 | lm loss: 2.266952E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.826 | TFLOPs: 31.31 | 7: iteration 79600/ 115203 | consumed samples: 20377600 | consumed tokens: 41733324800 | elapsed time per iteration (s): 0.43 | learning rate: 5.992E-05 | global batch size: 256 | lm loss: 2.248325E+00 | grad norm: 0.213 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.984 | TFLOPs: 31.59 | 7: iteration 79610/ 115203 | consumed samples: 20380160 | consumed tokens: 41738567680 | elapsed time per iteration (s): 0.43 | learning rate: 5.990E-05 | global batch size: 256 | lm loss: 2.268612E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.908 | TFLOPs: 31.37 | 7: iteration 79620/ 115203 | consumed samples: 20382720 | consumed tokens: 41743810560 | elapsed time per iteration (s): 0.43 | learning rate: 5.988E-05 | global batch size: 256 | lm loss: 2.265554E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.934 | TFLOPs: 31.01 | 7: iteration 79630/ 115203 | consumed samples: 20385280 | consumed tokens: 41749053440 | elapsed time per iteration (s): 0.44 | learning rate: 
5.986E-05 | global batch size: 256 | lm loss: 2.231741E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.457 | TFLOPs: 30.30 | 7: iteration 79640/ 115203 | consumed samples: 20387840 | consumed tokens: 41754296320 | elapsed time per iteration (s): 0.43 | learning rate: 5.984E-05 | global batch size: 256 | lm loss: 2.279859E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.457 | TFLOPs: 31.30 | 7: iteration 79650/ 115203 | consumed samples: 20390400 | consumed tokens: 41759539200 | elapsed time per iteration (s): 0.43 | learning rate: 5.982E-05 | global batch size: 256 | lm loss: 2.245786E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.546 | TFLOPs: 31.56 | 7: iteration 79660/ 115203 | consumed samples: 20392960 | consumed tokens: 41764782080 | elapsed time per iteration (s): 0.43 | learning rate: 5.980E-05 | global batch size: 256 | lm loss: 2.251763E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.630 | TFLOPs: 31.30 | 7: iteration 79670/ 115203 | consumed samples: 20395520 | consumed tokens: 41770024960 | elapsed time per iteration (s): 0.42 | learning rate: 5.978E-05 | global batch size: 256 | lm loss: 2.258603E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.096 | TFLOPs: 31.80 | 7: iteration 79680/ 115203 | consumed samples: 20398080 | consumed tokens: 41775267840 | elapsed time per iteration (s): 0.43 | learning rate: 5.976E-05 | global batch size: 256 | lm loss: 2.216195E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.986 | TFLOPs: 31.22 | 7: iteration 79690/ 115203 | consumed 
samples: 20400640 | consumed tokens: 41780510720 | elapsed time per iteration (s): 0.43 | learning rate: 5.974E-05 | global batch size: 256 | lm loss: 2.250605E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.369 | TFLOPs: 31.55 | 7: iteration 79700/ 115203 | consumed samples: 20403200 | consumed tokens: 41785753600 | elapsed time per iteration (s): 0.43 | learning rate: 5.972E-05 | global batch size: 256 | lm loss: 2.252547E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.987 | TFLOPs: 31.06 | 7: iteration 79710/ 115203 | consumed samples: 20405760 | consumed tokens: 41790996480 | elapsed time per iteration (s): 0.43 | learning rate: 5.970E-05 | global batch size: 256 | lm loss: 2.260562E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.480 | TFLOPs: 31.40 | 7: iteration 79720/ 115203 | consumed samples: 20408320 | consumed tokens: 41796239360 | elapsed time per iteration (s): 0.43 | learning rate: 5.967E-05 | global batch size: 256 | lm loss: 2.212070E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.553 | TFLOPs: 31.25 | 7: iteration 79730/ 115203 | consumed samples: 20410880 | consumed tokens: 41801482240 | elapsed time per iteration (s): 0.42 | learning rate: 5.965E-05 | global batch size: 256 | lm loss: 2.219412E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.895 | TFLOPs: 31.63 | 7: iteration 79740/ 115203 | consumed samples: 20413440 | consumed tokens: 41806725120 | elapsed time per iteration (s): 0.42 | learning rate: 5.963E-05 | global batch size: 256 | lm loss: 2.211447E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 610.563 | TFLOPs: 32.04 | 7: iteration 79750/ 115203 | consumed samples: 20416000 | consumed tokens: 41811968000 | elapsed time per iteration (s): 0.43 | learning rate: 5.961E-05 | global batch size: 256 | lm loss: 2.264612E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.430 | TFLOPs: 31.50 | 7: iteration 79760/ 115203 | consumed samples: 20418560 | consumed tokens: 41817210880 | elapsed time per iteration (s): 0.42 | learning rate: 5.959E-05 | global batch size: 256 | lm loss: 2.269818E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.056 | TFLOPs: 31.90 | 7: iteration 79770/ 115203 | consumed samples: 20421120 | consumed tokens: 41822453760 | elapsed time per iteration (s): 0.42 | learning rate: 5.957E-05 | global batch size: 256 | lm loss: 2.253098E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.892 | TFLOPs: 31.69 | 7: iteration 79780/ 115203 | consumed samples: 20423680 | consumed tokens: 41827696640 | elapsed time per iteration (s): 0.42 | learning rate: 5.955E-05 | global batch size: 256 | lm loss: 2.245624E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.421 | TFLOPs: 31.82 | 7: iteration 79790/ 115203 | consumed samples: 20426240 | consumed tokens: 41832939520 | elapsed time per iteration (s): 0.42 | learning rate: 5.953E-05 | global batch size: 256 | lm loss: 2.233887E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.132 | TFLOPs: 31.75 | 7: iteration 79800/ 115203 | consumed samples: 20428800 | consumed tokens: 41838182400 | elapsed time per iteration (s): 0.45 | learning rate: 5.951E-05 | global batch size: 256 | lm 
loss: 2.245027E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.049 | TFLOPs: 30.07 | 7: iteration 79810/ 115203 | consumed samples: 20431360 | consumed tokens: 41843425280 | elapsed time per iteration (s): 0.42 | learning rate: 5.949E-05 | global batch size: 256 | lm loss: 2.239980E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.202 | TFLOPs: 31.91 | 7: iteration 79820/ 115203 | consumed samples: 20433920 | consumed tokens: 41848668160 | elapsed time per iteration (s): 0.44 | learning rate: 5.947E-05 | global batch size: 256 | lm loss: 2.259771E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.028 | TFLOPs: 30.33 | 7: iteration 79830/ 115203 | consumed samples: 20436480 | consumed tokens: 41853911040 | elapsed time per iteration (s): 0.44 | learning rate: 5.945E-05 | global batch size: 256 | lm loss: 2.248808E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.929 | TFLOPs: 30.80 | 7: iteration 79840/ 115203 | consumed samples: 20439040 | consumed tokens: 41859153920 | elapsed time per iteration (s): 0.44 | learning rate: 5.943E-05 | global batch size: 256 | lm loss: 2.228100E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.965 | TFLOPs: 30.85 | 7: iteration 79850/ 115203 | consumed samples: 20441600 | consumed tokens: 41864396800 | elapsed time per iteration (s): 0.42 | learning rate: 5.941E-05 | global batch size: 256 | lm loss: 2.207592E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.162 | TFLOPs: 31.96 | 7: iteration 79860/ 115203 | consumed samples: 20444160 | consumed tokens: 
41869639680 | elapsed time per iteration (s): 0.43 | learning rate: 5.939E-05 | global batch size: 256 | lm loss: 2.204845E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.106 | TFLOPs: 31.49 | 7: iteration 79870/ 115203 | consumed samples: 20446720 | consumed tokens: 41874882560 | elapsed time per iteration (s): 0.43 | learning rate: 5.937E-05 | global batch size: 256 | lm loss: 2.244612E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.705 | TFLOPs: 31.57 | 7: iteration 79880/ 115203 | consumed samples: 20449280 | consumed tokens: 41880125440 | elapsed time per iteration (s): 0.43 | learning rate: 5.935E-05 | global batch size: 256 | lm loss: 2.231810E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.791 | TFLOPs: 31.42 | 7: iteration 79890/ 115203 | consumed samples: 20451840 | consumed tokens: 41885368320 | elapsed time per iteration (s): 0.42 | learning rate: 5.933E-05 | global batch size: 256 | lm loss: 2.293298E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.235 | TFLOPs: 31.76 | 7: iteration 79900/ 115203 | consumed samples: 20454400 | consumed tokens: 41890611200 | elapsed time per iteration (s): 0.45 | learning rate: 5.931E-05 | global batch size: 256 | lm loss: 2.224409E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.088 | TFLOPs: 29.60 | 7: iteration 79910/ 115203 | consumed samples: 20456960 | consumed tokens: 41895854080 | elapsed time per iteration (s): 0.42 | learning rate: 5.929E-05 | global batch size: 256 | lm loss: 2.215607E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
605.674 | TFLOPs: 31.78 | 7: iteration 79920/ 115203 | consumed samples: 20459520 | consumed tokens: 41901096960 | elapsed time per iteration (s): 0.42 | learning rate: 5.926E-05 | global batch size: 256 | lm loss: 2.258357E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.442 | TFLOPs: 31.98 | 7: iteration 79930/ 115203 | consumed samples: 20462080 | consumed tokens: 41906339840 | elapsed time per iteration (s): 0.42 | learning rate: 5.924E-05 | global batch size: 256 | lm loss: 2.231351E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.398 | TFLOPs: 31.71 | 7: iteration 79940/ 115203 | consumed samples: 20464640 | consumed tokens: 41911582720 | elapsed time per iteration (s): 0.66 | learning rate: 5.922E-05 | global batch size: 256 | lm loss: 2.217685E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 389.805 | TFLOPs: 20.45 | 7: iteration 79950/ 115203 | consumed samples: 20467200 | consumed tokens: 41916825600 | elapsed time per iteration (s): 0.44 | learning rate: 5.920E-05 | global batch size: 256 | lm loss: 2.238212E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.056 | TFLOPs: 30.75 | 7: iteration 79960/ 115203 | consumed samples: 20469760 | consumed tokens: 41922068480 | elapsed time per iteration (s): 0.44 | learning rate: 5.918E-05 | global batch size: 256 | lm loss: 2.231489E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.762 | TFLOPs: 30.79 | 7: iteration 79970/ 115203 | consumed samples: 20472320 | consumed tokens: 41927311360 | elapsed time per iteration (s): 0.45 | learning rate: 5.916E-05 | global batch size: 256 | lm loss: 2.227612E+00 | grad norm: 0.237 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.465 | TFLOPs: 30.14 | 7: iteration 79980/ 115203 | consumed samples: 20474880 | consumed tokens: 41932554240 | elapsed time per iteration (s): 0.43 | learning rate: 5.914E-05 | global batch size: 256 | lm loss: 2.250862E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.666 | TFLOPs: 31.36 | 7: iteration 79990/ 115203 | consumed samples: 20477440 | consumed tokens: 41937797120 | elapsed time per iteration (s): 0.44 | learning rate: 5.912E-05 | global batch size: 256 | lm loss: 2.265427E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.178 | TFLOPs: 30.65 | 0: [2022-11-28 22:35:07,895] [INFO] [logging.py:68:log_dist] [Rank 0] step=80000, skipped=0, lr=[5.910086097100006e-05, 5.910086097100006e-05, 5.910086097100006e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 80000/ 115203 | consumed samples: 20480000 | consumed tokens: 41943040000 | elapsed time per iteration (s): 0.42 | learning rate: 5.910E-05 | global batch size: 256 | lm loss: 2.253848E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.201 | TFLOPs: 31.70 | 0: steps: 80000 loss: 2.2286 iter time (s): 0.429 samples/sec: 597.084 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 80000 | lm loss value: 2.168320E+00 | lm loss PPL: 8.743580E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 80000 to checkpoints_221m 0: [2022-11-28 22:35:08,064] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step80000 is begin to save! 
0: [2022-11-28 22:35:08,070] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_01-model_00-model_states.pt... 0: [2022-11-28 22:35:08,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_01-model_00-model_states.pt. 0: [2022-11-28 22:35:08,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_03-model_00-model_states.pt... 0: [2022-11-28 22:35:08,191] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_03-model_00-model_states.pt. 0: [2022-11-28 22:35:08,191] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_04-model_00-model_states.pt... 0: [2022-11-28 22:35:08,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_04-model_00-model_states.pt. 0: [2022-11-28 22:35:08,214] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_05-model_00-model_states.pt... 0: [2022-11-28 22:35:08,238] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_05-model_00-model_states.pt. 0: [2022-11-28 22:35:08,238] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_06-model_00-model_states.pt... 0: [2022-11-28 22:35:08,260] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_06-model_00-model_states.pt. 0: [2022-11-28 22:35:08,260] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_07-model_00-model_states.pt... 0: [2022-11-28 22:35:08,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 22:35:08,283] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_08-model_00-model_states.pt... 0: [2022-11-28 22:35:08,306] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_08-model_00-model_states.pt. 0: [2022-11-28 22:35:08,306] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_09-model_00-model_states.pt... 0: [2022-11-28 22:35:08,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_09-model_00-model_states.pt. 0: [2022-11-28 22:35:08,329] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_10-model_00-model_states.pt... 0: [2022-11-28 22:35:08,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_10-model_00-model_states.pt. 0: [2022-11-28 22:35:08,351] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_11-model_00-model_states.pt... 0: [2022-11-28 22:35:08,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_11-model_00-model_states.pt. 0: [2022-11-28 22:35:08,374] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_12-model_00-model_states.pt... 0: [2022-11-28 22:35:08,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_12-model_00-model_states.pt. 0: [2022-11-28 22:35:08,396] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_13-model_00-model_states.pt... 0: [2022-11-28 22:35:08,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 22:35:08,419] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_14-model_00-model_states.pt... 0: [2022-11-28 22:35:08,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_14-model_00-model_states.pt. 0: [2022-11-28 22:35:08,443] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_15-model_00-model_states.pt... 0: [2022-11-28 22:35:08,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_15-model_00-model_states.pt. 0: [2022-11-28 22:35:08,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_16-model_00-model_states.pt... 0: [2022-11-28 22:35:08,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_16-model_00-model_states.pt. 0: [2022-11-28 22:35:08,489] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_17-model_00-model_states.pt... 0: [2022-11-28 22:35:08,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_17-model_00-model_states.pt. 0: [2022-11-28 22:35:08,513] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_18-model_00-model_states.pt... 0: [2022-11-28 22:35:08,535] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_18-model_00-model_states.pt. 0: [2022-11-28 22:35:08,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_19-model_00-model_states.pt... 0: [2022-11-28 22:35:08,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 22:35:08,558] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_20-model_00-model_states.pt... 0: [2022-11-28 22:35:08,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_20-model_00-model_states.pt. 0: [2022-11-28 22:35:08,581] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/layer_22-model_00-model_states.pt... 0: [2022-11-28 22:35:08,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/layer_22-model_00-model_states.pt. 0: [2022-11-28 22:35:08,586] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step80000/mp_rank_00_model_states.pt 0: [2022-11-28 22:35:08,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/mp_rank_00_model_states.pt... 0: [2022-11-28 22:35:08,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/mp_rank_00_model_states.pt. 0: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
1: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
6: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:35:08,609] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:35:08,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:35:08,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 22:35:08,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2022-11-28 22:35:08,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:35:08,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 22:35:08,660] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2022-11-28 22:35:08,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-28 22:35:08,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 22:35:08,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2022-11-28 22:35:08,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:35:08,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 22:35:08,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2022-11-28 22:35:08,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:35:08,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 22:35:08,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2022-11-28 22:35:08,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:35:08,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 22:35:08,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 22:35:08,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 22:35:08,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2022-11-28 22:35:08,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:35:08,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 22:35:08,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 22:35:08,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
1: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2022-11-28 22:35:08,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:35:08,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:35:08,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:35:08,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:35:08,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 22:35:08,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 22:35:08,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 22:35:08,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 22:35:08,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2022-11-28 22:35:08,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2022-11-28 22:35:08,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
4: [2022-11-28 22:35:08,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2022-11-28 22:35:08,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:35:08,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 22:35:08,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2022-11-28 22:35:08,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:35:08,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 22:35:08,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2022-11-28 22:35:08,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:35:08,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 22:35:08,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 1: [2022-11-28 22:35:08,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 22:35:08,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 22:35:08,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:35:08,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:35:08,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
2: [2022-11-28 22:35:08,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 22:35:08,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 22:35:08,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 2: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 4: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:35:08,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2022-11-28 22:35:08,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-28 22:35:08,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 22:35:08,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2022-11-28 22:35:08,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:35:08,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 22:35:08,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2022-11-28 22:35:08,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:35:08,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 22:35:08,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2022-11-28 22:35:08,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:35:08,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 22:35:08,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2022-11-28 22:35:08,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-28 22:35:08,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 22:35:08,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2022-11-28 22:35:08,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:35:08,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:35:08,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:35:08,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 22:35:08,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 22:35:08,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 22:35:08,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2022-11-28 22:35:08,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 6: [2022-11-28 22:35:08,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2022-11-28 22:35:08,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:35:08,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:35:08,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 22:35:08,659] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 22:35:08,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2022-11-28 22:35:08,659] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2022-11-28 22:35:08,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:35:08,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:35:08,661] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 22:35:08,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 22:35:08,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2022-11-28 22:35:08,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2022-11-28 22:35:08,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:35:08,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:35:08,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 22:35:08,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 22:35:08,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2022-11-28 22:35:08,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2022-11-28 22:35:08,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:35:08,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:35:08,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 22:35:08,662] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 22:35:08,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2022-11-28 22:35:08,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:35:08,662] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:35:08,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:35:08,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2022-11-28 22:35:08,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 22:35:08,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 22:35:08,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 22:35:08,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 22:35:08,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
5: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 5: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2022-11-28 22:35:08,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:35:08,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2022-11-28 22:35:08,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 22:35:08,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 22:35:08,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2022-11-28 22:35:08,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 3: [2022-11-28 22:35:08,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:35:08,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 22:35:08,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2022-11-28 22:35:08,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:35:08,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2022-11-28 22:35:08,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:35:08,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 22:35:08,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 22:35:08,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2022-11-28 22:35:08,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2022-11-28 22:35:08,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:35:08,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 22:35:08,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2022-11-28 22:35:08,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:35:08,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 22:35:08,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2022-11-28 22:35:08,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-28 22:35:08,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:35:08,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:35:08,694] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:35:08,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 22:35:08,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 22:35:08,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 22:35:08,694] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 22:35:08,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2022-11-28 22:35:08,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2022-11-28 22:35:08,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2022-11-28 22:35:08,694] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2022-11-28 22:35:08,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-28 22:35:08,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 22:35:08,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2022-11-28 22:35:08,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:35:08,696] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 22:35:08,696] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 7: [2022-11-28 22:35:08,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:35:08,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 22:35:08,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2022-11-28 22:35:08,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:35:08,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:35:08,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:35:08,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 22:35:08,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 22:35:08,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 22:35:08,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 22:35:08,706] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 22:35:08,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2022-11-28 22:35:08,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2022-11-28 22:35:08,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2022-11-28 22:35:08,706] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 0: [2022-11-28 22:35:08,723] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step80000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 22:35:08,723] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step80000 is ready now! 
0: successfully saved checkpoint at iteration 80000 to checkpoints_221m 7: time (ms) | save-checkpoint: 672.52 7: iteration 80010/ 115203 | consumed samples: 20482560 | consumed tokens: 41948282880 | elapsed time per iteration (s): 0.52 | learning rate: 5.908E-05 | global batch size: 256 | lm loss: 2.245064E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 495.196 | TFLOPs: 25.98 | 7: iteration 80020/ 115203 | consumed samples: 20485120 | consumed tokens: 41953525760 | elapsed time per iteration (s): 0.43 | learning rate: 5.906E-05 | global batch size: 256 | lm loss: 2.229844E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.049 | TFLOPs: 31.17 | 7: iteration 80030/ 115203 | consumed samples: 20487680 | consumed tokens: 41958768640 | elapsed time per iteration (s): 0.43 | learning rate: 5.904E-05 | global batch size: 256 | lm loss: 2.255506E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.127 | TFLOPs: 31.33 | 7: iteration 80040/ 115203 | consumed samples: 20490240 | consumed tokens: 41964011520 | elapsed time per iteration (s): 0.42 | learning rate: 5.902E-05 | global batch size: 256 | lm loss: 2.257837E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.902 | TFLOPs: 32.00 | 7: iteration 80050/ 115203 | consumed samples: 20492800 | consumed tokens: 41969254400 | elapsed time per iteration (s): 0.43 | learning rate: 5.900E-05 | global batch size: 256 | lm loss: 2.250489E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.849 | TFLOPs: 31.00 | 7: iteration 80060/ 115203 | consumed samples: 20495360 | consumed tokens: 41974497280 | elapsed time per iteration (s): 0.42 | learning 
rate: 5.898E-05 | global batch size: 256 | lm loss: 2.232913E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.750 | TFLOPs: 31.78 | 7: iteration 80070/ 115203 | consumed samples: 20497920 | consumed tokens: 41979740160 | elapsed time per iteration (s): 0.43 | learning rate: 5.896E-05 | global batch size: 256 | lm loss: 2.245975E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.367 | TFLOPs: 31.29 | 7: iteration 80080/ 115203 | consumed samples: 20500480 | consumed tokens: 41984983040 | elapsed time per iteration (s): 0.44 | learning rate: 5.894E-05 | global batch size: 256 | lm loss: 2.250072E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.421 | TFLOPs: 30.45 | 7: iteration 80090/ 115203 | consumed samples: 20503040 | consumed tokens: 41990225920 | elapsed time per iteration (s): 0.42 | learning rate: 5.892E-05 | global batch size: 256 | lm loss: 2.241493E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.202 | TFLOPs: 31.81 | 7: iteration 80100/ 115203 | consumed samples: 20505600 | consumed tokens: 41995468800 | elapsed time per iteration (s): 0.42 | learning rate: 5.890E-05 | global batch size: 256 | lm loss: 2.242912E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.950 | TFLOPs: 31.69 | 7: iteration 80110/ 115203 | consumed samples: 20508160 | consumed tokens: 42000711680 | elapsed time per iteration (s): 0.42 | learning rate: 5.888E-05 | global batch size: 256 | lm loss: 2.246077E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.756 | TFLOPs: 31.73 | 7: iteration 80120/ 115203 | 
consumed samples: 20510720 | consumed tokens: 42005954560 | elapsed time per iteration (s): 0.43 | learning rate: 5.886E-05 | global batch size: 256 | lm loss: 2.255584E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.353 | TFLOPs: 31.45 | 7: iteration 80130/ 115203 | consumed samples: 20513280 | consumed tokens: 42011197440 | elapsed time per iteration (s): 0.44 | learning rate: 5.884E-05 | global batch size: 256 | lm loss: 2.265287E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.792 | TFLOPs: 30.53 | 7: iteration 80140/ 115203 | consumed samples: 20515840 | consumed tokens: 42016440320 | elapsed time per iteration (s): 0.64 | learning rate: 5.882E-05 | global batch size: 256 | lm loss: 2.217859E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 403.148 | TFLOPs: 21.15 | 7: iteration 80150/ 115203 | consumed samples: 20518400 | consumed tokens: 42021683200 | elapsed time per iteration (s): 0.43 | learning rate: 5.879E-05 | global batch size: 256 | lm loss: 2.227504E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.523 | TFLOPs: 31.35 | 7: iteration 80160/ 115203 | consumed samples: 20520960 | consumed tokens: 42026926080 | elapsed time per iteration (s): 0.44 | learning rate: 5.877E-05 | global batch size: 256 | lm loss: 2.251341E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.837 | TFLOPs: 30.53 | 7: iteration 80170/ 115203 | consumed samples: 20523520 | consumed tokens: 42032168960 | elapsed time per iteration (s): 0.43 | learning rate: 5.875E-05 | global batch size: 256 | lm loss: 2.261466E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 600.500 | TFLOPs: 31.51 | 7: iteration 80180/ 115203 | consumed samples: 20526080 | consumed tokens: 42037411840 | elapsed time per iteration (s): 0.44 | learning rate: 5.873E-05 | global batch size: 256 | lm loss: 2.229116E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.517 | TFLOPs: 30.46 | 7: iteration 80190/ 115203 | consumed samples: 20528640 | consumed tokens: 42042654720 | elapsed time per iteration (s): 0.42 | learning rate: 5.871E-05 | global batch size: 256 | lm loss: 2.220853E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.338 | TFLOPs: 31.81 | 7: iteration 80200/ 115203 | consumed samples: 20531200 | consumed tokens: 42047897600 | elapsed time per iteration (s): 0.44 | learning rate: 5.869E-05 | global batch size: 256 | lm loss: 2.250777E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.506 | TFLOPs: 30.77 | 7: iteration 80210/ 115203 | consumed samples: 20533760 | consumed tokens: 42053140480 | elapsed time per iteration (s): 0.43 | learning rate: 5.867E-05 | global batch size: 256 | lm loss: 2.228188E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.071 | TFLOPs: 31.12 | 7: iteration 80220/ 115203 | consumed samples: 20536320 | consumed tokens: 42058383360 | elapsed time per iteration (s): 0.42 | learning rate: 5.865E-05 | global batch size: 256 | lm loss: 2.246569E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.721 | TFLOPs: 31.94 | 7: iteration 80230/ 115203 | consumed samples: 20538880 | consumed tokens: 42063626240 | elapsed time per iteration (s): 0.43 | learning rate: 5.863E-05 | global batch size: 
256 | lm loss: 2.241296E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.135 | TFLOPs: 31.33 | 7: iteration 80240/ 115203 | consumed samples: 20541440 | consumed tokens: 42068869120 | elapsed time per iteration (s): 0.43 | learning rate: 5.861E-05 | global batch size: 256 | lm loss: 2.215023E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.499 | TFLOPs: 31.14 | 7: iteration 80250/ 115203 | consumed samples: 20544000 | consumed tokens: 42074112000 | elapsed time per iteration (s): 0.43 | learning rate: 5.859E-05 | global batch size: 256 | lm loss: 2.248097E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.698 | TFLOPs: 31.26 | 7: iteration 80260/ 115203 | consumed samples: 20546560 | consumed tokens: 42079354880 | elapsed time per iteration (s): 0.42 | learning rate: 5.857E-05 | global batch size: 256 | lm loss: 2.263257E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.574 | TFLOPs: 31.83 | 7: iteration 80270/ 115203 | consumed samples: 20549120 | consumed tokens: 42084597760 | elapsed time per iteration (s): 0.44 | learning rate: 5.855E-05 | global batch size: 256 | lm loss: 2.229653E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.114 | TFLOPs: 30.54 | 7: iteration 80280/ 115203 | consumed samples: 20551680 | consumed tokens: 42089840640 | elapsed time per iteration (s): 0.42 | learning rate: 5.853E-05 | global batch size: 256 | lm loss: 2.247013E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.377 | TFLOPs: 31.82 | 7: iteration 80290/ 115203 | consumed samples: 20554240 | consumed 
tokens: 42095083520 | elapsed time per iteration (s): 0.42 | learning rate: 5.851E-05 | global batch size: 256 | lm loss: 2.241425E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.748 | TFLOPs: 31.63 | 7: iteration 80300/ 115203 | consumed samples: 20556800 | consumed tokens: 42100326400 | elapsed time per iteration (s): 0.43 | learning rate: 5.849E-05 | global batch size: 256 | lm loss: 2.271792E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.218 | TFLOPs: 31.39 | 7: iteration 80310/ 115203 | consumed samples: 20559360 | consumed tokens: 42105569280 | elapsed time per iteration (s): 0.44 | learning rate: 5.847E-05 | global batch size: 256 | lm loss: 2.234935E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.863 | TFLOPs: 30.74 | 7: iteration 80320/ 115203 | consumed samples: 20561920 | consumed tokens: 42110812160 | elapsed time per iteration (s): 0.42 | learning rate: 5.845E-05 | global batch size: 256 | lm loss: 2.236198E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.611 | TFLOPs: 31.88 | 7: iteration 80330/ 115203 | consumed samples: 20564480 | consumed tokens: 42116055040 | elapsed time per iteration (s): 0.44 | learning rate: 5.843E-05 | global batch size: 256 | lm loss: 2.215864E+00 | grad norm: 0.218 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.912 | TFLOPs: 30.53 | 7: iteration 80340/ 115203 | consumed samples: 20567040 | consumed tokens: 42121297920 | elapsed time per iteration (s): 0.42 | learning rate: 5.841E-05 | global batch size: 256 | lm loss: 2.245916E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 607.091 | TFLOPs: 31.85 | 7: iteration 80350/ 115203 | consumed samples: 20569600 | consumed tokens: 42126540800 | elapsed time per iteration (s): 0.43 | learning rate: 5.839E-05 | global batch size: 256 | lm loss: 2.257552E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.642 | TFLOPs: 31.41 | 7: iteration 80360/ 115203 | consumed samples: 20572160 | consumed tokens: 42131783680 | elapsed time per iteration (s): 0.42 | learning rate: 5.837E-05 | global batch size: 256 | lm loss: 2.251171E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.573 | TFLOPs: 31.83 | 7: iteration 80370/ 115203 | consumed samples: 20574720 | consumed tokens: 42137026560 | elapsed time per iteration (s): 0.45 | learning rate: 5.835E-05 | global batch size: 256 | lm loss: 2.240989E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.159 | TFLOPs: 30.13 | 7: iteration 80380/ 115203 | consumed samples: 20577280 | consumed tokens: 42142269440 | elapsed time per iteration (s): 0.43 | learning rate: 5.833E-05 | global batch size: 256 | lm loss: 2.245076E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.693 | TFLOPs: 31.46 | 7: iteration 80390/ 115203 | consumed samples: 20579840 | consumed tokens: 42147512320 | elapsed time per iteration (s): 0.43 | learning rate: 5.831E-05 | global batch size: 256 | lm loss: 2.244217E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.005 | TFLOPs: 31.01 | 7: iteration 80400/ 115203 | consumed samples: 20582400 | consumed tokens: 42152755200 | elapsed time per iteration (s): 0.43 | learning rate: 5.829E-05 | global batch size: 256 | lm loss: 2.216282E+00 | grad norm: 
0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.458 | TFLOPs: 31.35 | 7: iteration 80410/ 115203 | consumed samples: 20584960 | consumed tokens: 42157998080 | elapsed time per iteration (s): 0.43 | learning rate: 5.827E-05 | global batch size: 256 | lm loss: 2.244086E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.769 | TFLOPs: 31.31 | 7: iteration 80420/ 115203 | consumed samples: 20587520 | consumed tokens: 42163240960 | elapsed time per iteration (s): 0.43 | learning rate: 5.825E-05 | global batch size: 256 | lm loss: 2.247094E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.466 | TFLOPs: 31.24 | 7: iteration 80430/ 115203 | consumed samples: 20590080 | consumed tokens: 42168483840 | elapsed time per iteration (s): 0.43 | learning rate: 5.823E-05 | global batch size: 256 | lm loss: 2.266427E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.814 | TFLOPs: 31.42 | 7: iteration 80440/ 115203 | consumed samples: 20592640 | consumed tokens: 42173726720 | elapsed time per iteration (s): 0.43 | learning rate: 5.821E-05 | global batch size: 256 | lm loss: 2.237053E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.651 | TFLOPs: 31.52 | 7: iteration 80450/ 115203 | consumed samples: 20595200 | consumed tokens: 42178969600 | elapsed time per iteration (s): 0.42 | learning rate: 5.818E-05 | global batch size: 256 | lm loss: 2.216628E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.442 | TFLOPs: 31.66 | 7: iteration 80460/ 115203 | consumed samples: 20597760 | consumed tokens: 42184212480 | elapsed time per 
iteration (s): 0.43 | learning rate: 5.816E-05 | global batch size: 256 | lm loss: 2.230032E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.495 | TFLOPs: 31.14 | 7: iteration 80470/ 115203 | consumed samples: 20600320 | consumed tokens: 42189455360 | elapsed time per iteration (s): 0.43 | learning rate: 5.814E-05 | global batch size: 256 | lm loss: 2.234100E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.732 | TFLOPs: 31.05 | 7: iteration 80480/ 115203 | consumed samples: 20602880 | consumed tokens: 42194698240 | elapsed time per iteration (s): 0.43 | learning rate: 5.812E-05 | global batch size: 256 | lm loss: 2.224214E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.030 | TFLOPs: 31.06 | 7: iteration 80490/ 115203 | consumed samples: 20605440 | consumed tokens: 42199941120 | elapsed time per iteration (s): 0.44 | learning rate: 5.810E-05 | global batch size: 256 | lm loss: 2.220112E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.962 | TFLOPs: 30.27 | 7: iteration 80500/ 115203 | consumed samples: 20608000 | consumed tokens: 42205184000 | elapsed time per iteration (s): 0.43 | learning rate: 5.808E-05 | global batch size: 256 | lm loss: 2.228849E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.191 | TFLOPs: 31.33 | 7: iteration 80510/ 115203 | consumed samples: 20610560 | consumed tokens: 42210426880 | elapsed time per iteration (s): 0.45 | learning rate: 5.806E-05 | global batch size: 256 | lm loss: 2.253016E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.422 | TFLOPs: 30.09 | 7: 
iteration 80520/ 115203 | consumed samples: 20613120 | consumed tokens: 42215669760 | elapsed time per iteration (s): 0.43 | learning rate: 5.804E-05 | global batch size: 256 | lm loss: 2.264328E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.841 | TFLOPs: 31.42 | 7: iteration 80530/ 115203 | consumed samples: 20615680 | consumed tokens: 42220912640 | elapsed time per iteration (s): 0.43 | learning rate: 5.802E-05 | global batch size: 256 | lm loss: 2.222667E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.443 | TFLOPs: 31.08 | 7: iteration 80540/ 115203 | consumed samples: 20618240 | consumed tokens: 42226155520 | elapsed time per iteration (s): 0.99 | learning rate: 5.800E-05 | global batch size: 256 | lm loss: 2.246573E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 259.368 | TFLOPs: 13.61 | 7: iteration 80550/ 115203 | consumed samples: 20620800 | consumed tokens: 42231398400 | elapsed time per iteration (s): 0.66 | learning rate: 5.798E-05 | global batch size: 256 | lm loss: 2.261065E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 386.794 | TFLOPs: 20.29 | 7: iteration 80560/ 115203 | consumed samples: 20623360 | consumed tokens: 42236641280 | elapsed time per iteration (s): 0.88 | learning rate: 5.796E-05 | global batch size: 256 | lm loss: 2.246119E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 289.656 | TFLOPs: 15.20 | 7: iteration 80570/ 115203 | consumed samples: 20625920 | consumed tokens: 42241884160 | elapsed time per iteration (s): 0.44 | learning rate: 5.794E-05 | global batch size: 256 | lm loss: 2.269040E+00 | grad norm: 0.236 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.168 | TFLOPs: 30.49 | 7: iteration 80580/ 115203 | consumed samples: 20628480 | consumed tokens: 42247127040 | elapsed time per iteration (s): 0.42 | learning rate: 5.792E-05 | global batch size: 256 | lm loss: 2.257262E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.446 | TFLOPs: 31.92 | 7: iteration 80590/ 115203 | consumed samples: 20631040 | consumed tokens: 42252369920 | elapsed time per iteration (s): 0.44 | learning rate: 5.790E-05 | global batch size: 256 | lm loss: 2.261325E+00 | grad norm: 0.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.648 | TFLOPs: 30.57 | 7: iteration 80600/ 115203 | consumed samples: 20633600 | consumed tokens: 42257612800 | elapsed time per iteration (s): 0.43 | learning rate: 5.788E-05 | global batch size: 256 | lm loss: 2.258615E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.192 | TFLOPs: 31.44 | 7: iteration 80610/ 115203 | consumed samples: 20636160 | consumed tokens: 42262855680 | elapsed time per iteration (s): 0.44 | learning rate: 5.786E-05 | global batch size: 256 | lm loss: 2.231804E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.612 | TFLOPs: 30.62 | 7: iteration 80620/ 115203 | consumed samples: 20638720 | consumed tokens: 42268098560 | elapsed time per iteration (s): 0.44 | learning rate: 5.784E-05 | global batch size: 256 | lm loss: 2.243672E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.476 | TFLOPs: 30.77 | 7: iteration 80630/ 115203 | consumed samples: 20641280 | consumed tokens: 42273341440 | elapsed time per iteration (s): 0.44 | learning rate: 
5.782E-05 | global batch size: 256 | lm loss: 2.245478E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.418 | TFLOPs: 30.51 | 7: iteration 80640/ 115203 | consumed samples: 20643840 | consumed tokens: 42278584320 | elapsed time per iteration (s): 0.45 | learning rate: 5.780E-05 | global batch size: 256 | lm loss: 2.254843E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.894 | TFLOPs: 29.74 | 7: iteration 80650/ 115203 | consumed samples: 20646400 | consumed tokens: 42283827200 | elapsed time per iteration (s): 0.43 | learning rate: 5.778E-05 | global batch size: 256 | lm loss: 2.255570E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.620 | TFLOPs: 31.25 | 7: iteration 80660/ 115203 | consumed samples: 20648960 | consumed tokens: 42289070080 | elapsed time per iteration (s): 0.43 | learning rate: 5.776E-05 | global batch size: 256 | lm loss: 2.246520E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.283 | TFLOPs: 30.92 | 7: iteration 80670/ 115203 | consumed samples: 20651520 | consumed tokens: 42294312960 | elapsed time per iteration (s): 0.43 | learning rate: 5.774E-05 | global batch size: 256 | lm loss: 2.255266E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.303 | TFLOPs: 31.39 | 7: iteration 80680/ 115203 | consumed samples: 20654080 | consumed tokens: 42299555840 | elapsed time per iteration (s): 0.43 | learning rate: 5.772E-05 | global batch size: 256 | lm loss: 2.238657E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.738 | TFLOPs: 31.47 | 7: iteration 80690/ 115203 | consumed 
samples: 20656640 | consumed tokens: 42304798720 | elapsed time per iteration (s): 0.44 | learning rate: 5.770E-05 | global batch size: 256 | lm loss: 2.282681E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.522 | TFLOPs: 30.41 |
7: iteration 80700/ 115203 | consumed samples: 20659200 | consumed tokens: 42310041600 | elapsed time per iteration (s): 0.43 | learning rate: 5.768E-05 | global batch size: 256 | lm loss: 2.267910E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.585 | TFLOPs: 31.25 |
7: iteration 80710/ 115203 | consumed samples: 20661760 | consumed tokens: 42315284480 | elapsed time per iteration (s): 0.43 | learning rate: 5.766E-05 | global batch size: 256 | lm loss: 2.262956E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.689 | TFLOPs: 31.36 |
7: iteration 80720/ 115203 | consumed samples: 20664320 | consumed tokens: 42320527360 | elapsed time per iteration (s): 0.43 | learning rate: 5.764E-05 | global batch size: 256 | lm loss: 2.240757E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.974 | TFLOPs: 31.48 |
7: iteration 80730/ 115203 | consumed samples: 20666880 | consumed tokens: 42325770240 | elapsed time per iteration (s): 0.43 | learning rate: 5.762E-05 | global batch size: 256 | lm loss: 2.267587E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.654 | TFLOPs: 31.57 |
7: iteration 80740/ 115203 | consumed samples: 20669440 | consumed tokens: 42331013120 | elapsed time per iteration (s): 0.43 | learning rate: 5.760E-05 | global batch size: 256 | lm loss: 2.250070E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.978 | TFLOPs: 30.96 |
7: iteration 80750/ 115203 | consumed samples: 20672000 | consumed tokens: 42336256000 | elapsed time per iteration (s): 0.43 | learning rate: 5.758E-05 | global batch size: 256 | lm loss: 2.250381E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.998 | TFLOPs: 31.22 |
7: iteration 80760/ 115203 | consumed samples: 20674560 | consumed tokens: 42341498880 | elapsed time per iteration (s): 0.43 | learning rate: 5.756E-05 | global batch size: 256 | lm loss: 2.238830E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.177 | TFLOPs: 31.44 |
7: iteration 80770/ 115203 | consumed samples: 20677120 | consumed tokens: 42346741760 | elapsed time per iteration (s): 0.44 | learning rate: 5.754E-05 | global batch size: 256 | lm loss: 2.283119E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.491 | TFLOPs: 30.51 |
7: iteration 80780/ 115203 | consumed samples: 20679680 | consumed tokens: 42351984640 | elapsed time per iteration (s): 0.44 | learning rate: 5.752E-05 | global batch size: 256 | lm loss: 2.246678E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.147 | TFLOPs: 30.81 |
7: iteration 80790/ 115203 | consumed samples: 20682240 | consumed tokens: 42357227520 | elapsed time per iteration (s): 0.44 | learning rate: 5.750E-05 | global batch size: 256 | lm loss: 2.235894E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.293 | TFLOPs: 30.87 |
7: iteration 80800/ 115203 | consumed samples: 20684800 | consumed tokens: 42362470400 | elapsed time per iteration (s): 0.43 | learning rate: 5.748E-05 | global batch size: 256 | lm loss: 2.203013E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.887 | TFLOPs: 31.11 |
7: iteration 80810/ 115203 | consumed samples: 20687360 | consumed tokens: 42367713280 | elapsed time per iteration (s): 0.43 | learning rate: 5.746E-05 | global batch size: 256 | lm loss: 2.238890E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.926 | TFLOPs: 31.00 |
7: iteration 80820/ 115203 | consumed samples: 20689920 | consumed tokens: 42372956160 | elapsed time per iteration (s): 0.43 | learning rate: 5.744E-05 | global batch size: 256 | lm loss: 2.223074E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.486 | TFLOPs: 31.24 |
7: iteration 80830/ 115203 | consumed samples: 20692480 | consumed tokens: 42378199040 | elapsed time per iteration (s): 0.44 | learning rate: 5.742E-05 | global batch size: 256 | lm loss: 2.227020E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.150 | TFLOPs: 30.54 |
7: iteration 80840/ 115203 | consumed samples: 20695040 | consumed tokens: 42383441920 | elapsed time per iteration (s): 0.44 | learning rate: 5.740E-05 | global batch size: 256 | lm loss: 2.219650E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.833 | TFLOPs: 30.74 |
7: iteration 80850/ 115203 | consumed samples: 20697600 | consumed tokens: 42388684800 | elapsed time per iteration (s): 0.44 | learning rate: 5.738E-05 | global batch size: 256 | lm loss: 2.269279E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.641 | TFLOPs: 30.57 |
7: iteration 80860/ 115203 | consumed samples: 20700160 | consumed tokens: 42393927680 | elapsed time per iteration (s): 0.44 | learning rate: 5.736E-05 | global batch size: 256 | lm loss: 2.248153E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.651 | TFLOPs: 30.62 |
7: iteration 80870/ 115203 | consumed samples: 20702720 | consumed tokens: 42399170560 | elapsed time per iteration (s): 0.44 | learning rate: 5.734E-05 | global batch size: 256 | lm loss: 2.267677E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.283 | TFLOPs: 30.66 |
7: iteration 80880/ 115203 | consumed samples: 20705280 | consumed tokens: 42404413440 | elapsed time per iteration (s): 0.43 | learning rate: 5.732E-05 | global batch size: 256 | lm loss: 2.228762E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.566 | TFLOPs: 30.99 |
7: iteration 80890/ 115203 | consumed samples: 20707840 | consumed tokens: 42409656320 | elapsed time per iteration (s): 0.42 | learning rate: 5.730E-05 | global batch size: 256 | lm loss: 2.257344E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.445 | TFLOPs: 31.61 |
7: iteration 80900/ 115203 | consumed samples: 20710400 | consumed tokens: 42414899200 | elapsed time per iteration (s): 0.43 | learning rate: 5.728E-05 | global batch size: 256 | lm loss: 2.217246E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.881 | TFLOPs: 31.06 |
7: iteration 80910/ 115203 | consumed samples: 20712960 | consumed tokens: 42420142080 | elapsed time per iteration (s): 0.44 | learning rate: 5.726E-05 | global batch size: 256 | lm loss: 2.233159E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.422 | TFLOPs: 30.61 |
7: iteration 80920/ 115203 | consumed samples: 20715520 | consumed tokens: 42425384960 | elapsed time per iteration (s): 0.49 | learning rate: 5.724E-05 | global batch size: 256 | lm loss: 2.263855E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 525.832 | TFLOPs: 27.59 |
7: iteration 80930/ 115203 | consumed samples: 20718080 | consumed tokens: 42430627840 | elapsed time per iteration (s): 0.43 | learning rate: 5.722E-05 | global batch size: 256 | lm loss: 2.254353E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.602 | TFLOPs: 31.36 |
7: iteration 80940/ 115203 | consumed samples: 20720640 | consumed tokens: 42435870720 | elapsed time per iteration (s): 0.43 | learning rate: 5.720E-05 | global batch size: 256 | lm loss: 2.248412E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.534 | TFLOPs: 30.98 |
7: iteration 80950/ 115203 | consumed samples: 20723200 | consumed tokens: 42441113600 | elapsed time per iteration (s): 0.44 | learning rate: 5.718E-05 | global batch size: 256 | lm loss: 2.236039E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.974 | TFLOPs: 30.59 |
7: iteration 80960/ 115203 | consumed samples: 20725760 | consumed tokens: 42446356480 | elapsed time per iteration (s): 0.43 | learning rate: 5.716E-05 | global batch size: 256 | lm loss: 2.236530E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.972 | TFLOPs: 31.27 |
7: iteration 80970/ 115203 | consumed samples: 20728320 | consumed tokens: 42451599360 | elapsed time per iteration (s): 0.43 | learning rate: 5.714E-05 | global batch size: 256 | lm loss: 2.241019E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.822 | TFLOPs: 31.00 |
7: iteration 80980/ 115203 | consumed samples: 20730880 | consumed tokens: 42456842240 | elapsed time per iteration (s): 0.44 | learning rate: 5.712E-05 | global batch size: 256 | lm loss: 2.219526E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.226 | TFLOPs: 30.23 |
7: iteration 80990/ 115203 | consumed samples: 20733440 | consumed tokens: 42462085120 | elapsed time per iteration (s): 0.44 | learning rate: 5.710E-05 | global batch size: 256 | lm loss: 2.258100E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.505 | TFLOPs: 30.51 |
7: iteration 81000/ 115203 | consumed samples: 20736000 | consumed tokens: 42467328000 | elapsed time per iteration (s): 0.43 | learning rate: 5.708E-05 | global batch size: 256 | lm loss: 2.259076E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.335 | TFLOPs: 31.34 |
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 81000 | lm loss value: 2.155563E+00 | lm loss PPL: 8.632752E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 81000 to checkpoints_221m
0: [2022-11-28 22:42:35,728] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step81000 is begin to save!
0: [2022-11-28 22:42:35,741] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_01-model_00-model_states.pt...
0: [2022-11-28 22:42:35,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_01-model_00-model_states.pt.
0: [2022-11-28 22:42:35,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_03-model_00-model_states.pt...
0: [2022-11-28 22:42:35,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_03-model_00-model_states.pt.
0: [2022-11-28 22:42:35,895] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_04-model_00-model_states.pt...
0: [2022-11-28 22:42:35,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_04-model_00-model_states.pt.
0: [2022-11-28 22:42:35,920] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_05-model_00-model_states.pt...
0: [2022-11-28 22:42:35,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_05-model_00-model_states.pt.
0: [2022-11-28 22:42:35,945] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_06-model_00-model_states.pt...
0: [2022-11-28 22:42:35,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_06-model_00-model_states.pt.
0: [2022-11-28 22:42:35,969] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_07-model_00-model_states.pt...
0: [2022-11-28 22:42:35,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_07-model_00-model_states.pt.
0: [2022-11-28 22:42:35,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_08-model_00-model_states.pt...
0: [2022-11-28 22:42:36,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_08-model_00-model_states.pt.
0: [2022-11-28 22:42:36,017] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_09-model_00-model_states.pt...
0: [2022-11-28 22:42:36,041] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_09-model_00-model_states.pt.
0: [2022-11-28 22:42:36,042] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_10-model_00-model_states.pt...
0: [2022-11-28 22:42:36,065] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_10-model_00-model_states.pt.
0: [2022-11-28 22:42:36,066] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_11-model_00-model_states.pt...
0: [2022-11-28 22:42:36,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_11-model_00-model_states.pt.
0: [2022-11-28 22:42:36,091] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_12-model_00-model_states.pt...
0: [2022-11-28 22:42:36,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_12-model_00-model_states.pt.
0: [2022-11-28 22:42:36,116] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_13-model_00-model_states.pt...
0: [2022-11-28 22:42:36,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_13-model_00-model_states.pt.
0: [2022-11-28 22:42:36,140] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_14-model_00-model_states.pt...
0: [2022-11-28 22:42:36,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_14-model_00-model_states.pt.
0: [2022-11-28 22:42:36,165] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_15-model_00-model_states.pt...
0: [2022-11-28 22:42:36,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_15-model_00-model_states.pt.
0: [2022-11-28 22:42:36,189] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_16-model_00-model_states.pt...
0: [2022-11-28 22:42:36,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_16-model_00-model_states.pt.
0: [2022-11-28 22:42:36,213] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_17-model_00-model_states.pt...
0: [2022-11-28 22:42:36,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_17-model_00-model_states.pt.
0: [2022-11-28 22:42:36,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_18-model_00-model_states.pt...
0: [2022-11-28 22:42:36,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_18-model_00-model_states.pt.
0: [2022-11-28 22:42:36,261] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_19-model_00-model_states.pt...
0: [2022-11-28 22:42:36,286] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_19-model_00-model_states.pt.
0: [2022-11-28 22:42:36,286] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_20-model_00-model_states.pt...
0: [2022-11-28 22:42:36,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_20-model_00-model_states.pt.
0: [2022-11-28 22:42:36,310] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/layer_22-model_00-model_states.pt...
0: [2022-11-28 22:42:36,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/layer_22-model_00-model_states.pt.
0: [2022-11-28 22:42:36,314] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step81000/mp_rank_00_model_states.pt
0: [2022-11-28 22:42:36,315] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/mp_rank_00_model_states.pt...
0: [2022-11-28 22:42:36,317] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/mp_rank_00_model_states.pt.
0: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt...
0: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt...
0: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt...
0: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt...
0: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt...
7: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt...
7: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt...
7: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt...
7: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt...
7: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt...
7: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt...
4: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt...
4: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt...
4: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt...
4: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt...
4: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt...
4: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt...
2: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt...
2: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt...
2: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt...
2: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt...
2: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt...
2: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt...
6: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt...
6: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt...
6: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt...
6: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt...
6: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt...
6: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt...
0: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt...
7: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt...
7: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt...
1: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt...
1: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt...
1: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt...
1: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt...
1: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt...
1: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt...
5: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt...
5: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt...
5: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt...
5: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt...
5: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt...
5: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt...
4: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt...
4: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt...
2: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt...
2: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt...
3: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt...
3: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt...
3: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt...
3: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt...
3: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt...
3: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt...
6: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt...
0: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt...
1: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt...
1: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt...
5: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt...
3: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt...
6: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt...
0: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt...
5: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt...
3: [2022-11-28 22:42:36,335] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step81000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt...
6: [2022-11-28 22:42:36,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt.
6: [2022-11-28 22:42:36,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt
6: [2022-11-28 22:42:36,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
6: [2022-11-28 22:42:36,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt.
6: [2022-11-28 22:42:36,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt
6: [2022-11-28 22:42:36,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
6: [2022-11-28 22:42:36,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt.
6: [2022-11-28 22:42:36,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt
6: [2022-11-28 22:42:36,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
6: [2022-11-28 22:42:36,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt.
6: [2022-11-28 22:42:36,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt
6: [2022-11-28 22:42:36,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
1: [2022-11-28 22:42:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt.
1: [2022-11-28 22:42:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt
1: [2022-11-28 22:42:36,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
1: [2022-11-28 22:42:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt.
1: [2022-11-28 22:42:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt.
1: [2022-11-28 22:42:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt
1: [2022-11-28 22:42:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt
1: [2022-11-28 22:42:36,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
1: [2022-11-28 22:42:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt.
1: [2022-11-28 22:42:36,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
1: [2022-11-28 22:42:36,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt
1: [2022-11-28 22:42:36,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
1: [2022-11-28 22:42:36,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt.
1: [2022-11-28 22:42:36,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt
1: [2022-11-28 22:42:36,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
4: [2022-11-28 22:42:36,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt.
4: [2022-11-28 22:42:36,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt
4: [2022-11-28 22:42:36,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt.
4: [2022-11-28 22:42:36,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
4: [2022-11-28 22:42:36,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt
4: [2022-11-28 22:42:36,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt.
4: [2022-11-28 22:42:36,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
4: [2022-11-28 22:42:36,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt
4: [2022-11-28 22:42:36,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt.
4: [2022-11-28 22:42:36,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
4: [2022-11-28 22:42:36,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt
4: [2022-11-28 22:42:36,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
1: [2022-11-28 22:42:36,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt.
1: [2022-11-28 22:42:36,395] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt
1: [2022-11-28 22:42:36,395] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
1: [2022-11-28 22:42:36,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt.
1: [2022-11-28 22:42:36,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt
1: [2022-11-28 22:42:36,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
1: [2022-11-28 22:42:36,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt.
1: [2022-11-28 22:42:36,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt
1: [2022-11-28 22:42:36,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
4: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt.
4: [2022-11-28 22:42:36,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt
4: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
5: [2022-11-28 22:42:36,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt.
3: [2022-11-28 22:42:36,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt.
5: [2022-11-28 22:42:36,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt
3: [2022-11-28 22:42:36,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt
5: [2022-11-28 22:42:36,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
3: [2022-11-28 22:42:36,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now!
5: [2022-11-28 22:42:36,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt.
3: [2022-11-28 22:42:36,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:42:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 22:42:36,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 22:42:36,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 3: [2022-11-28 22:42:36,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2022-11-28 22:42:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:42:36,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:42:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 22:42:36,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 22:42:36,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 3: [2022-11-28 22:42:36,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2022-11-28 22:42:36,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:42:36,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:42:36,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 22:42:36,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 22:42:36,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 3: [2022-11-28 22:42:36,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2022-11-28 22:42:36,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:42:36,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:42:36,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 22:42:36,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 22:42:36,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 3: [2022-11-28 22:42:36,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2022-11-28 22:42:36,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:42:36,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:42:36,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 22:42:36,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 22:42:36,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 4: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:42:36,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2022-11-28 22:42:36,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:42:36,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 3: [2022-11-28 22:42:36,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:42:36,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 22:42:36,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 22:42:36,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 
4: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 3: [2022-11-28 22:42:36,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 5: [2022-11-28 22:42:36,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:42:36,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:42:36,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 4: [2022-11-28 22:42:36,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 3: [2022-11-28 22:42:36,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 0: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:42:36,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 
3: [2022-11-28 22:42:36,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2022-11-28 22:42:36,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 22:42:36,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:42:36,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 22:42:36,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 22:42:36,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2022-11-28 22:42:36,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 
0: [2022-11-28 22:42:36,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 22:42:36,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2022-11-28 22:42:36,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2022-11-28 22:42:36,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2022-11-28 22:42:36,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 22:42:36,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 4: [2022-11-28 22:42:36,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:42:36,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 22:42:36,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 6: [2022-11-28 22:42:36,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:42:36,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 22:42:36,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 6: [2022-11-28 22:42:36,418] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 22:42:36,418] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 22:42:36,418] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2022-11-28 22:42:36,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:42:36,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 22:42:36,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 6: [2022-11-28 22:42:36,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:42:36,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 22:42:36,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:42:36,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 22:42:36,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 22:42:36,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 22:42:36,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 22:42:36,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:42:36,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 22:42:36,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 6: [2022-11-28 22:42:36,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 6: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 2: [2022-11-28 22:42:36,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:42:36,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 22:42:36,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: [2022-11-28 22:42:36,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2022-11-28 22:42:36,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 22:42:36,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 7: [2022-11-28 22:42:36,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:42:36,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:42:36,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:42:36,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:42:36,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 22:42:36,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 22:42:36,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 22:42:36,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 7: [2022-11-28 22:42:36,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 
7: [2022-11-28 22:42:36,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 22:42:36,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 7: [2022-11-28 22:42:36,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 7: [2022-11-28 22:42:36,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:42:36,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:42:36,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:42:36,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:42:36,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 22:42:36,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 22:42:36,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 22:42:36,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step81000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 22:42:36,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 
7: [2022-11-28 22:42:36,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 7: [2022-11-28 22:42:36,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 7: [2022-11-28 22:42:36,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step81000 is ready now! 0: successfully saved checkpoint at iteration 81000 to checkpoints_221m 7: time (ms) | save-checkpoint: 929.08 7: iteration 81010/ 115203 | consumed samples: 20738560 | consumed tokens: 42472570880 | elapsed time per iteration (s): 0.56 | learning rate: 5.706E-05 | global batch size: 256 | lm loss: 2.254771E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 460.841 | TFLOPs: 24.18 | 7: iteration 81020/ 115203 | consumed samples: 20741120 | consumed tokens: 42477813760 | elapsed time per iteration (s): 0.43 | learning rate: 5.704E-05 | global batch size: 256 | lm loss: 2.240419E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.159 | TFLOPs: 31.02 | 7: iteration 81030/ 115203 | consumed samples: 20743680 | consumed tokens: 42483056640 | elapsed time per iteration (s): 0.43 | learning rate: 5.702E-05 | global batch size: 256 | lm loss: 2.236905E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.034 | TFLOPs: 30.91 | 7: iteration 81040/ 115203 | consumed samples: 20746240 | consumed tokens: 42488299520 | elapsed time per iteration (s): 0.43 | learning rate: 5.700E-05 | global batch size: 256 | lm loss: 2.228316E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.302 | TFLOPs: 31.13 | 7: iteration 81050/ 115203 | consumed samples: 20748800 | consumed tokens: 42493542400 | elapsed time per 
iteration (s): 0.64 | learning rate: 5.698E-05 | global batch size: 256 | lm loss: 2.255548E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 399.173 | TFLOPs: 20.94 | 7: iteration 81060/ 115203 | consumed samples: 20751360 | consumed tokens: 42498785280 | elapsed time per iteration (s): 0.43 | learning rate: 5.696E-05 | global batch size: 256 | lm loss: 2.176115E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.715 | TFLOPs: 31.57 | 7: iteration 81070/ 115203 | consumed samples: 20753920 | consumed tokens: 42504028160 | elapsed time per iteration (s): 0.43 | learning rate: 5.694E-05 | global batch size: 256 | lm loss: 2.243650E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.354 | TFLOPs: 31.08 | 7: iteration 81080/ 115203 | consumed samples: 20756480 | consumed tokens: 42509271040 | elapsed time per iteration (s): 0.43 | learning rate: 5.692E-05 | global batch size: 256 | lm loss: 2.254417E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.268 | TFLOPs: 31.34 | 7: iteration 81090/ 115203 | consumed samples: 20759040 | consumed tokens: 42514513920 | elapsed time per iteration (s): 0.43 | learning rate: 5.690E-05 | global batch size: 256 | lm loss: 2.258581E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.865 | TFLOPs: 31.26 | 7: iteration 81100/ 115203 | consumed samples: 20761600 | consumed tokens: 42519756800 | elapsed time per iteration (s): 0.43 | learning rate: 5.688E-05 | global batch size: 256 | lm loss: 2.226592E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.437 | TFLOPs: 31.08 | 7: 
iteration 81110/ 115203 | consumed samples: 20764160 | consumed tokens: 42524999680 | elapsed time per iteration (s): 0.44 | learning rate: 5.686E-05 | global batch size: 256 | lm loss: 2.207898E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.988 | TFLOPs: 30.75 | 7: iteration 81120/ 115203 | consumed samples: 20766720 | consumed tokens: 42530242560 | elapsed time per iteration (s): 0.43 | learning rate: 5.684E-05 | global batch size: 256 | lm loss: 2.237467E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.947 | TFLOPs: 31.53 | 7: iteration 81130/ 115203 | consumed samples: 20769280 | consumed tokens: 42535485440 | elapsed time per iteration (s): 0.42 | learning rate: 5.682E-05 | global batch size: 256 | lm loss: 2.226659E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.659 | TFLOPs: 31.83 | 7: iteration 81140/ 115203 | consumed samples: 20771840 | consumed tokens: 42540728320 | elapsed time per iteration (s): 0.42 | learning rate: 5.680E-05 | global batch size: 256 | lm loss: 2.292492E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.618 | TFLOPs: 31.67 | 7: iteration 81150/ 115203 | consumed samples: 20774400 | consumed tokens: 42545971200 | elapsed time per iteration (s): 0.43 | learning rate: 5.678E-05 | global batch size: 256 | lm loss: 2.236714E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.272 | TFLOPs: 31.23 | 7: iteration 81160/ 115203 | consumed samples: 20776960 | consumed tokens: 42551214080 | elapsed time per iteration (s): 0.42 | learning rate: 5.676E-05 | global batch size: 256 | lm loss: 2.266417E+00 | grad norm: 0.231 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.795 | TFLOPs: 31.73 | 7: iteration 81170/ 115203 | consumed samples: 20779520 | consumed tokens: 42556456960 | elapsed time per iteration (s): 0.45 | learning rate: 5.674E-05 | global batch size: 256 | lm loss: 2.217631E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.262 | TFLOPs: 30.08 | 7: iteration 81180/ 115203 | consumed samples: 20782080 | consumed tokens: 42561699840 | elapsed time per iteration (s): 0.44 | learning rate: 5.672E-05 | global batch size: 256 | lm loss: 2.235535E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.640 | TFLOPs: 30.78 | 7: iteration 81190/ 115203 | consumed samples: 20784640 | consumed tokens: 42566942720 | elapsed time per iteration (s): 0.43 | learning rate: 5.670E-05 | global batch size: 256 | lm loss: 2.250623E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.028 | TFLOPs: 31.33 | 7: iteration 81200/ 115203 | consumed samples: 20787200 | consumed tokens: 42572185600 | elapsed time per iteration (s): 0.44 | learning rate: 5.668E-05 | global batch size: 256 | lm loss: 2.241721E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.852 | TFLOPs: 30.74 | 7: iteration 81210/ 115203 | consumed samples: 20789760 | consumed tokens: 42577428480 | elapsed time per iteration (s): 0.43 | learning rate: 5.666E-05 | global batch size: 256 | lm loss: 2.268305E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.283 | TFLOPs: 31.02 | 7: iteration 81220/ 115203 | consumed samples: 20792320 | consumed tokens: 42582671360 | elapsed time per iteration (s): 0.43 | learning rate: 
5.664E-05 | global batch size: 256 | lm loss: 2.244149E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.466 | TFLOPs: 31.03 | 7: iteration 81230/ 115203 | consumed samples: 20794880 | consumed tokens: 42587914240 | elapsed time per iteration (s): 0.43 | learning rate: 5.662E-05 | global batch size: 256 | lm loss: 2.254159E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.066 | TFLOPs: 31.12 | 7: iteration 81240/ 115203 | consumed samples: 20797440 | consumed tokens: 42593157120 | elapsed time per iteration (s): 0.44 | learning rate: 5.660E-05 | global batch size: 256 | lm loss: 2.237272E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.917 | TFLOPs: 30.32 | 7: iteration 81250/ 115203 | consumed samples: 20800000 | consumed tokens: 42598400000 | elapsed time per iteration (s): 0.42 | learning rate: 5.658E-05 | global batch size: 256 | lm loss: 2.229618E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.889 | TFLOPs: 31.69 | 7: iteration 81260/ 115203 | consumed samples: 20802560 | consumed tokens: 42603642880 | elapsed time per iteration (s): 0.42 | learning rate: 5.656E-05 | global batch size: 256 | lm loss: 2.258797E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.931 | TFLOPs: 31.69 | 7: iteration 81270/ 115203 | consumed samples: 20805120 | consumed tokens: 42608885760 | elapsed time per iteration (s): 0.43 | learning rate: 5.654E-05 | global batch size: 256 | lm loss: 2.268817E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.796 | TFLOPs: 31.05 | 7: iteration 81280/ 115203 | consumed 
samples: 20807680 | consumed tokens: 42614128640 | elapsed time per iteration (s): 0.43 | learning rate: 5.652E-05 | global batch size: 256 | lm loss: 2.251779E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.762 | TFLOPs: 31.57 | 7: iteration 81290/ 115203 | consumed samples: 20810240 | consumed tokens: 42619371520 | elapsed time per iteration (s): 0.43 | learning rate: 5.650E-05 | global batch size: 256 | lm loss: 2.225315E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.879 | TFLOPs: 31.00 | 7: iteration 81300/ 115203 | consumed samples: 20812800 | consumed tokens: 42624614400 | elapsed time per iteration (s): 0.43 | learning rate: 5.648E-05 | global batch size: 256 | lm loss: 2.233166E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.990 | TFLOPs: 31.27 | 7: iteration 81310/ 115203 | consumed samples: 20815360 | consumed tokens: 42629857280 | elapsed time per iteration (s): 0.43 | learning rate: 5.646E-05 | global batch size: 256 | lm loss: 2.241394E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.267 | TFLOPs: 31.29 | 7: iteration 81320/ 115203 | consumed samples: 20817920 | consumed tokens: 42635100160 | elapsed time per iteration (s): 0.44 | learning rate: 5.644E-05 | global batch size: 256 | lm loss: 2.255073E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.802 | TFLOPs: 30.26 | 7: iteration 81330/ 115203 | consumed samples: 20820480 | consumed tokens: 42640343040 | elapsed time per iteration (s): 0.43 | learning rate: 5.642E-05 | global batch size: 256 | lm loss: 2.256481E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 594.205 | TFLOPs: 31.18 | 7: iteration 81340/ 115203 | consumed samples: 20823040 | consumed tokens: 42645585920 | elapsed time per iteration (s): 0.42 | learning rate: 5.640E-05 | global batch size: 256 | lm loss: 2.244316E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.607 | TFLOPs: 31.72 | 7: iteration 81350/ 115203 | consumed samples: 20825600 | consumed tokens: 42650828800 | elapsed time per iteration (s): 0.43 | learning rate: 5.638E-05 | global batch size: 256 | lm loss: 2.206724E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.451 | TFLOPs: 31.24 | 7: iteration 81360/ 115203 | consumed samples: 20828160 | consumed tokens: 42656071680 | elapsed time per iteration (s): 0.43 | learning rate: 5.636E-05 | global batch size: 256 | lm loss: 2.232936E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.102 | TFLOPs: 31.38 | 7: iteration 81370/ 115203 | consumed samples: 20830720 | consumed tokens: 42661314560 | elapsed time per iteration (s): 0.43 | learning rate: 5.634E-05 | global batch size: 256 | lm loss: 2.225353E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.578 | TFLOPs: 31.09 | 7: iteration 81380/ 115203 | consumed samples: 20833280 | consumed tokens: 42666557440 | elapsed time per iteration (s): 0.43 | learning rate: 5.632E-05 | global batch size: 256 | lm loss: 2.252414E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.924 | TFLOPs: 30.95 | 7: iteration 81390/ 115203 | consumed samples: 20835840 | consumed tokens: 42671800320 | elapsed time per iteration (s): 0.43 | learning rate: 5.630E-05 | global batch size: 256 | lm 
loss: 2.196594E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.141 | TFLOPs: 31.44 | 7: iteration 81400/ 115203 | consumed samples: 20838400 | consumed tokens: 42677043200 | elapsed time per iteration (s): 0.43 | learning rate: 5.628E-05 | global batch size: 256 | lm loss: 2.276978E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.451 | TFLOPs: 31.40 | 7: iteration 81410/ 115203 | consumed samples: 20840960 | consumed tokens: 42682286080 | elapsed time per iteration (s): 0.43 | learning rate: 5.626E-05 | global batch size: 256 | lm loss: 2.241784E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.918 | TFLOPs: 31.32 | 7: iteration 81420/ 115203 | consumed samples: 20843520 | consumed tokens: 42687528960 | elapsed time per iteration (s): 0.43 | learning rate: 5.624E-05 | global batch size: 256 | lm loss: 2.220889E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.668 | TFLOPs: 31.46 | 7: iteration 81430/ 115203 | consumed samples: 20846080 | consumed tokens: 42692771840 | elapsed time per iteration (s): 0.43 | learning rate: 5.622E-05 | global batch size: 256 | lm loss: 2.246461E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.254 | TFLOPs: 31.44 | 7: iteration 81440/ 115203 | consumed samples: 20848640 | consumed tokens: 42698014720 | elapsed time per iteration (s): 0.43 | learning rate: 5.620E-05 | global batch size: 256 | lm loss: 2.256875E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.766 | TFLOPs: 31.36 | 7: iteration 81450/ 115203 | consumed samples: 20851200 | consumed tokens: 
42703257600 | elapsed time per iteration (s): 0.43 | learning rate: 5.618E-05 | global batch size: 256 | lm loss: 2.227588E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.929 | TFLOPs: 31.16 | 7: iteration 81460/ 115203 | consumed samples: 20853760 | consumed tokens: 42708500480 | elapsed time per iteration (s): 0.43 | learning rate: 5.616E-05 | global batch size: 256 | lm loss: 2.201839E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.711 | TFLOPs: 31.31 | 7: iteration 81470/ 115203 | consumed samples: 20856320 | consumed tokens: 42713743360 | elapsed time per iteration (s): 0.44 | learning rate: 5.614E-05 | global batch size: 256 | lm loss: 2.258045E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.551 | TFLOPs: 30.62 | 7: iteration 81480/ 115203 | consumed samples: 20858880 | consumed tokens: 42718986240 | elapsed time per iteration (s): 0.43 | learning rate: 5.612E-05 | global batch size: 256 | lm loss: 2.243355E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.379 | TFLOPs: 31.19 | 7: iteration 81490/ 115203 | consumed samples: 20861440 | consumed tokens: 42724229120 | elapsed time per iteration (s): 0.44 | learning rate: 5.610E-05 | global batch size: 256 | lm loss: 2.244428E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.296 | TFLOPs: 30.18 | 7: iteration 81500/ 115203 | consumed samples: 20864000 | consumed tokens: 42729472000 | elapsed time per iteration (s): 0.42 | learning rate: 5.608E-05 | global batch size: 256 | lm loss: 2.261802E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
606.844 | TFLOPs: 31.84 | 7: iteration 81510/ 115203 | consumed samples: 20866560 | consumed tokens: 42734714880 | elapsed time per iteration (s): 0.43 | learning rate: 5.606E-05 | global batch size: 256 | lm loss: 2.278179E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.808 | TFLOPs: 31.52 | 7: iteration 81520/ 115203 | consumed samples: 20869120 | consumed tokens: 42739957760 | elapsed time per iteration (s): 0.45 | learning rate: 5.604E-05 | global batch size: 256 | lm loss: 2.220437E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.647 | TFLOPs: 29.94 | 7: iteration 81530/ 115203 | consumed samples: 20871680 | consumed tokens: 42745200640 | elapsed time per iteration (s): 0.44 | learning rate: 5.602E-05 | global batch size: 256 | lm loss: 2.252464E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.575 | TFLOPs: 30.78 | 7: iteration 81540/ 115203 | consumed samples: 20874240 | consumed tokens: 42750443520 | elapsed time per iteration (s): 0.44 | learning rate: 5.600E-05 | global batch size: 256 | lm loss: 2.244331E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.923 | TFLOPs: 30.69 | 7: iteration 81550/ 115203 | consumed samples: 20876800 | consumed tokens: 42755686400 | elapsed time per iteration (s): 0.42 | learning rate: 5.598E-05 | global batch size: 256 | lm loss: 2.259335E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.771 | TFLOPs: 31.84 | 7: iteration 81560/ 115203 | consumed samples: 20879360 | consumed tokens: 42760929280 | elapsed time per iteration (s): 0.44 | learning rate: 5.596E-05 | global batch size: 256 | lm loss: 2.245190E+00 | grad norm: 0.238 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.103 | TFLOPs: 30.70 | 7: iteration 81570/ 115203 | consumed samples: 20881920 | consumed tokens: 42766172160 | elapsed time per iteration (s): 0.43 | learning rate: 5.594E-05 | global batch size: 256 | lm loss: 2.242554E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.317 | TFLOPs: 31.39 | 7: iteration 81580/ 115203 | consumed samples: 20884480 | consumed tokens: 42771415040 | elapsed time per iteration (s): 0.43 | learning rate: 5.592E-05 | global batch size: 256 | lm loss: 2.224544E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.170 | TFLOPs: 31.33 | 7: iteration 81590/ 115203 | consumed samples: 20887040 | consumed tokens: 42776657920 | elapsed time per iteration (s): 0.44 | learning rate: 5.590E-05 | global batch size: 256 | lm loss: 2.230776E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.181 | TFLOPs: 30.44 | 7: iteration 81600/ 115203 | consumed samples: 20889600 | consumed tokens: 42781900800 | elapsed time per iteration (s): 0.43 | learning rate: 5.588E-05 | global batch size: 256 | lm loss: 2.265688E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.988 | TFLOPs: 31.06 | 7: iteration 81610/ 115203 | consumed samples: 20892160 | consumed tokens: 42787143680 | elapsed time per iteration (s): 0.43 | learning rate: 5.586E-05 | global batch size: 256 | lm loss: 2.207424E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.969 | TFLOPs: 31.43 | 7: iteration 81620/ 115203 | consumed samples: 20894720 | consumed tokens: 42792386560 | elapsed time per iteration (s): 
0.43 | learning rate: 5.584E-05 | global batch size: 256 | lm loss: 2.261986E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.109 | TFLOPs: 31.28 | 7: iteration 81630/ 115203 | consumed samples: 20897280 | consumed tokens: 42797629440 | elapsed time per iteration (s): 0.43 | learning rate: 5.582E-05 | global batch size: 256 | lm loss: 2.255223E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.287 | TFLOPs: 31.13 | 7: iteration 81640/ 115203 | consumed samples: 20899840 | consumed tokens: 42802872320 | elapsed time per iteration (s): 0.44 | learning rate: 5.580E-05 | global batch size: 256 | lm loss: 2.229246E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.346 | TFLOPs: 30.61 | 7: iteration 81650/ 115203 | consumed samples: 20902400 | consumed tokens: 42808115200 | elapsed time per iteration (s): 0.42 | learning rate: 5.578E-05 | global batch size: 256 | lm loss: 2.237336E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.367 | TFLOPs: 31.61 | 7: iteration 81660/ 115203 | consumed samples: 20904960 | consumed tokens: 42813358080 | elapsed time per iteration (s): 0.43 | learning rate: 5.576E-05 | global batch size: 256 | lm loss: 2.240585E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.279 | TFLOPs: 31.13 | 7: iteration 81670/ 115203 | consumed samples: 20907520 | consumed tokens: 42818600960 | elapsed time per iteration (s): 0.45 | learning rate: 5.574E-05 | global batch size: 256 | lm loss: 2.237848E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.184 | TFLOPs: 30.18 | 7: iteration 81680/ 
115203 | consumed samples: 20910080 | consumed tokens: 42823843840 | elapsed time per iteration (s): 0.44 | learning rate: 5.572E-05 | global batch size: 256 | lm loss: 2.273690E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.136 | TFLOPs: 30.39 | 7: iteration 81690/ 115203 | consumed samples: 20912640 | consumed tokens: 42829086720 | elapsed time per iteration (s): 0.44 | learning rate: 5.570E-05 | global batch size: 256 | lm loss: 2.212416E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.635 | TFLOPs: 30.62 | 7: iteration 81700/ 115203 | consumed samples: 20915200 | consumed tokens: 42834329600 | elapsed time per iteration (s): 0.44 | learning rate: 5.568E-05 | global batch size: 256 | lm loss: 2.260032E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.912 | TFLOPs: 30.74 | 7: iteration 81710/ 115203 | consumed samples: 20917760 | consumed tokens: 42839572480 | elapsed time per iteration (s): 0.44 | learning rate: 5.566E-05 | global batch size: 256 | lm loss: 2.270712E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.308 | TFLOPs: 30.76 | 7: iteration 81720/ 115203 | consumed samples: 20920320 | consumed tokens: 42844815360 | elapsed time per iteration (s): 0.44 | learning rate: 5.564E-05 | global batch size: 256 | lm loss: 2.255613E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.913 | TFLOPs: 30.85 | 7: iteration 81730/ 115203 | consumed samples: 20922880 | consumed tokens: 42850058240 | elapsed time per iteration (s): 0.43 | learning rate: 5.562E-05 | global batch size: 256 | lm loss: 2.273335E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 593.356 | TFLOPs: 31.13 | 7: iteration 81740/ 115203 | consumed samples: 20925440 | consumed tokens: 42855301120 | elapsed time per iteration (s): 0.43 | learning rate: 5.560E-05 | global batch size: 256 | lm loss: 2.230752E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.441 | TFLOPs: 31.40 | 7: iteration 81750/ 115203 | consumed samples: 20928000 | consumed tokens: 42860544000 | elapsed time per iteration (s): 0.43 | learning rate: 5.558E-05 | global batch size: 256 | lm loss: 2.279295E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.902 | TFLOPs: 31.16 | 7: iteration 81760/ 115203 | consumed samples: 20930560 | consumed tokens: 42865786880 | elapsed time per iteration (s): 0.43 | learning rate: 5.556E-05 | global batch size: 256 | lm loss: 2.257632E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.281 | TFLOPs: 30.92 | 7: iteration 81770/ 115203 | consumed samples: 20933120 | consumed tokens: 42871029760 | elapsed time per iteration (s): 0.45 | learning rate: 5.554E-05 | global batch size: 256 | lm loss: 2.260900E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.238 | TFLOPs: 30.13 | 7: iteration 81780/ 115203 | consumed samples: 20935680 | consumed tokens: 42876272640 | elapsed time per iteration (s): 0.43 | learning rate: 5.552E-05 | global batch size: 256 | lm loss: 2.252830E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.476 | TFLOPs: 31.19 | 7: iteration 81790/ 115203 | consumed samples: 20938240 | consumed tokens: 42881515520 | elapsed time per iteration (s): 0.44 | learning rate: 5.550E-05 | global batch 
size: 256 | lm loss: 2.254312E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.297 | TFLOPs: 30.81 | 7: iteration 81800/ 115203 | consumed samples: 20940800 | consumed tokens: 42886758400 | elapsed time per iteration (s): 0.43 | learning rate: 5.548E-05 | global batch size: 256 | lm loss: 2.223940E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.046 | TFLOPs: 31.43 | 7: iteration 81810/ 115203 | consumed samples: 20943360 | consumed tokens: 42892001280 | elapsed time per iteration (s): 0.43 | learning rate: 5.547E-05 | global batch size: 256 | lm loss: 2.216629E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.078 | TFLOPs: 31.22 | 7: iteration 81820/ 115203 | consumed samples: 20945920 | consumed tokens: 42897244160 | elapsed time per iteration (s): 0.43 | learning rate: 5.545E-05 | global batch size: 256 | lm loss: 2.264969E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.768 | TFLOPs: 31.26 | 7: iteration 81830/ 115203 | consumed samples: 20948480 | consumed tokens: 42902487040 | elapsed time per iteration (s): 0.45 | learning rate: 5.543E-05 | global batch size: 256 | lm loss: 2.245496E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.057 | TFLOPs: 29.65 | 7: iteration 81840/ 115203 | consumed samples: 20951040 | consumed tokens: 42907729920 | elapsed time per iteration (s): 0.43 | learning rate: 5.541E-05 | global batch size: 256 | lm loss: 2.250702E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.391 | TFLOPs: 31.08 | 7: iteration 81850/ 115203 | consumed samples: 20953600 | consumed 
tokens: 42912972800 | elapsed time per iteration (s): 0.44 | learning rate: 5.539E-05 | global batch size: 256 | lm loss: 2.228464E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.158 | TFLOPs: 30.44 | 7: iteration 81860/ 115203 | consumed samples: 20956160 | consumed tokens: 42918215680 | elapsed time per iteration (s): 0.42 | learning rate: 5.537E-05 | global batch size: 256 | lm loss: 2.246308E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.118 | TFLOPs: 31.80 | 7: iteration 81870/ 115203 | consumed samples: 20958720 | consumed tokens: 42923458560 | elapsed time per iteration (s): 0.43 | learning rate: 5.535E-05 | global batch size: 256 | lm loss: 2.245187E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.332 | TFLOPs: 30.92 | 7: iteration 81880/ 115203 | consumed samples: 20961280 | consumed tokens: 42928701440 | elapsed time per iteration (s): 0.43 | learning rate: 5.533E-05 | global batch size: 256 | lm loss: 2.230602E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.894 | TFLOPs: 31.11 | 7: iteration 81890/ 115203 | consumed samples: 20963840 | consumed tokens: 42933944320 | elapsed time per iteration (s): 0.42 | learning rate: 5.531E-05 | global batch size: 256 | lm loss: 2.238271E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.105 | TFLOPs: 31.91 | 7: iteration 81900/ 115203 | consumed samples: 20966400 | consumed tokens: 42939187200 | elapsed time per iteration (s): 0.45 | learning rate: 5.529E-05 | global batch size: 256 | lm loss: 2.239279E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 569.940 | TFLOPs: 29.90 | 7: iteration 81910/ 115203 | consumed samples: 20968960 | consumed tokens: 42944430080 | elapsed time per iteration (s): 0.43 | learning rate: 5.527E-05 | global batch size: 256 | lm loss: 2.244324E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.917 | TFLOPs: 31.53 | 7: iteration 81920/ 115203 | consumed samples: 20971520 | consumed tokens: 42949672960 | elapsed time per iteration (s): 0.44 | learning rate: 5.525E-05 | global batch size: 256 | lm loss: 2.250605E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.286 | TFLOPs: 30.87 | 7: iteration 81930/ 115203 | consumed samples: 20974080 | consumed tokens: 42954915840 | elapsed time per iteration (s): 0.43 | learning rate: 5.523E-05 | global batch size: 256 | lm loss: 2.241732E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.678 | TFLOPs: 31.04 | 7: iteration 81940/ 115203 | consumed samples: 20976640 | consumed tokens: 42960158720 | elapsed time per iteration (s): 0.44 | learning rate: 5.521E-05 | global batch size: 256 | lm loss: 2.232098E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.757 | TFLOPs: 30.26 | 7: iteration 81950/ 115203 | consumed samples: 20979200 | consumed tokens: 42965401600 | elapsed time per iteration (s): 0.44 | learning rate: 5.519E-05 | global batch size: 256 | lm loss: 2.235292E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.395 | TFLOPs: 30.82 | 7: iteration 81960/ 115203 | consumed samples: 20981760 | consumed tokens: 42970644480 | elapsed time per iteration (s): 0.43 | learning rate: 5.517E-05 | global batch size: 256 | lm loss: 2.243557E+00 | grad norm: 
0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.310 | TFLOPs: 30.92 | 7: iteration 81970/ 115203 | consumed samples: 20984320 | consumed tokens: 42975887360 | elapsed time per iteration (s): 0.43 | learning rate: 5.515E-05 | global batch size: 256 | lm loss: 2.258538E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.197 | TFLOPs: 31.60 | 7: iteration 81980/ 115203 | consumed samples: 20986880 | consumed tokens: 42981130240 | elapsed time per iteration (s): 0.44 | learning rate: 5.513E-05 | global batch size: 256 | lm loss: 2.221103E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.731 | TFLOPs: 30.37 | 7: iteration 81990/ 115203 | consumed samples: 20989440 | consumed tokens: 42986373120 | elapsed time per iteration (s): 0.43 | learning rate: 5.511E-05 | global batch size: 256 | lm loss: 2.206247E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.607 | TFLOPs: 31.20 | 0: [2022-11-28 22:49:51,499] [INFO] [logging.py:68:log_dist] [Rank 0] step=82000, skipped=0, lr=[5.5091074271143155e-05, 5.5091074271143155e-05, 5.5091074271143155e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 82000/ 115203 | consumed samples: 20992000 | consumed tokens: 42991616000 | elapsed time per iteration (s): 0.44 | learning rate: 5.509E-05 | global batch size: 256 | lm loss: 2.251855E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.300 | TFLOPs: 30.24 | 0: steps: 82000 loss: 2.2566 iter time (s): 0.439 samples/sec: 583.041 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 82000 | lm loss value: 2.273297E+00 | lm loss PPL: 
9.711367E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 82000 to checkpoints_221m 0: [2022-11-28 22:49:51,684] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step82000 is begin to save! 0: [2022-11-28 22:49:51,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_01-model_00-model_states.pt... 0: [2022-11-28 22:49:51,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_01-model_00-model_states.pt. 0: [2022-11-28 22:49:51,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_03-model_00-model_states.pt... 0: [2022-11-28 22:49:51,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_03-model_00-model_states.pt. 0: [2022-11-28 22:49:51,899] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_04-model_00-model_states.pt... 0: [2022-11-28 22:49:51,932] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_04-model_00-model_states.pt. 0: [2022-11-28 22:49:51,932] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_05-model_00-model_states.pt... 0: [2022-11-28 22:49:51,964] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_05-model_00-model_states.pt. 0: [2022-11-28 22:49:51,965] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_06-model_00-model_states.pt... 0: [2022-11-28 22:49:51,997] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_06-model_00-model_states.pt. 
0: [2022-11-28 22:49:51,997] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_07-model_00-model_states.pt... 0: [2022-11-28 22:49:52,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_07-model_00-model_states.pt. 0: [2022-11-28 22:49:52,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_08-model_00-model_states.pt... 0: [2022-11-28 22:49:52,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_08-model_00-model_states.pt. 0: [2022-11-28 22:49:52,064] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_09-model_00-model_states.pt... 0: [2022-11-28 22:49:52,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_09-model_00-model_states.pt. 0: [2022-11-28 22:49:52,097] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_10-model_00-model_states.pt... 0: [2022-11-28 22:49:52,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_10-model_00-model_states.pt. 0: [2022-11-28 22:49:52,129] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_11-model_00-model_states.pt... 0: [2022-11-28 22:49:52,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_11-model_00-model_states.pt. 0: [2022-11-28 22:49:52,162] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_12-model_00-model_states.pt... 0: [2022-11-28 22:49:52,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_12-model_00-model_states.pt. 
0: [2022-11-28 22:49:52,194] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_13-model_00-model_states.pt... 0: [2022-11-28 22:49:52,226] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_13-model_00-model_states.pt. 0: [2022-11-28 22:49:52,226] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_14-model_00-model_states.pt... 0: [2022-11-28 22:49:52,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_14-model_00-model_states.pt. 0: [2022-11-28 22:49:52,259] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_15-model_00-model_states.pt... 0: [2022-11-28 22:49:52,293] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_15-model_00-model_states.pt. 0: [2022-11-28 22:49:52,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_16-model_00-model_states.pt... 0: [2022-11-28 22:49:52,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_16-model_00-model_states.pt. 0: [2022-11-28 22:49:52,325] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_17-model_00-model_states.pt... 0: [2022-11-28 22:49:52,357] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_17-model_00-model_states.pt. 0: [2022-11-28 22:49:52,357] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_18-model_00-model_states.pt... 0: [2022-11-28 22:49:52,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_18-model_00-model_states.pt. 
0: [2022-11-28 22:49:52,391] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_19-model_00-model_states.pt... 0: [2022-11-28 22:49:52,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_19-model_00-model_states.pt. 0: [2022-11-28 22:49:52,423] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_20-model_00-model_states.pt... 0: [2022-11-28 22:49:52,456] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_20-model_00-model_states.pt. 0: [2022-11-28 22:49:52,456] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/layer_22-model_00-model_states.pt... 0: [2022-11-28 22:49:52,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/layer_22-model_00-model_states.pt. 0: [2022-11-28 22:49:52,462] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step82000/mp_rank_00_model_states.pt 0: [2022-11-28 22:49:52,462] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/mp_rank_00_model_states.pt... 0: [2022-11-28 22:49:52,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/mp_rank_00_model_states.pt. 0: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 
4: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:49:52,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step82000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:49:52,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:49:52,533] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 22:49:52,533] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 2: [2022-11-28 22:49:52,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:49:52,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 22:49:52,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
4: [2022-11-28 22:49:52,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:49:52,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 22:49:52,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2022-11-28 22:49:52,536] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:49:52,536] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 22:49:52,536] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2022-11-28 22:49:52,537] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:49:52,537] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 22:49:52,537] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 2: [2022-11-28 22:49:52,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:49:52,538] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 22:49:52,538] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
7: [2022-11-28 22:49:52,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:49:52,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 22:49:52,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:49:52,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2022-11-28 22:49:52,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 22:49:52,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2022-11-28 22:49:52,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:49:52,539] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 22:49:52,539] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2022-11-28 22:49:52,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:49:52,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 22:49:52,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
2: [2022-11-28 22:49:52,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:49:52,540] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 22:49:52,540] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 2: [2022-11-28 22:49:52,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:49:52,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 22:49:52,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2022-11-28 22:49:52,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:49:52,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 22:49:52,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 2: [2022-11-28 22:49:52,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:49:52,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 22:49:52,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
2: [2022-11-28 22:49:52,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:49:52,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 22:49:52,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2022-11-28 22:49:52,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:49:52,543] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 22:49:52,543] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2022-11-28 22:49:52,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:49:52,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 3: [2022-11-28 22:49:52,540] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:49:52,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 3: [2022-11-28 22:49:52,541] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 22:49:52,541] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
3: [2022-11-28 22:49:52,541] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:49:52,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 22:49:52,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 3: [2022-11-28 22:49:52,542] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:49:52,542] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 22:49:52,542] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2022-11-28 22:49:52,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:49:52,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 22:49:52,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2022-11-28 22:49:52,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:49:52,544] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 22:49:52,544] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
1: [2022-11-28 22:49:52,544] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:49:52,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 22:49:52,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2022-11-28 22:49:52,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:49:52,545] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 22:49:52,545] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2022-11-28 22:49:52,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:49:52,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 22:49:52,546] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2022-11-28 22:49:52,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:49:52,546] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 22:49:52,546] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
0: [2022-11-28 22:49:52,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:49:52,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 22:49:52,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2022-11-28 22:49:52,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:49:52,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 22:49:52,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2022-11-28 22:49:52,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:49:52,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 0: [2022-11-28 22:49:52,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:49:52,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2022-11-28 22:49:52,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 22:49:52,549] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 22:49:52,549] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 2: [2022-11-28 22:49:52,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:49:52,551] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 22:49:52,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2022-11-28 22:49:52,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:49:52,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:49:52,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:49:52,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 22:49:52,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
6: [2022-11-28 22:49:52,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 22:49:52,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 22:49:52,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2022-11-28 22:49:52,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 1: [2022-11-28 22:49:52,556] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:49:52,556] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 22:49:52,556] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2022-11-28 22:49:52,558] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:49:52,558] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 22:49:52,558] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2022-11-28 22:49:52,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:49:52,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2022-11-28 22:49:52,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:49:52,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 22:49:52,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 22:49:52,562] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 22:49:52,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2022-11-28 22:49:52,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 4: [2022-11-28 22:49:52,562] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2022-11-28 22:49:52,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:49:52,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 22:49:52,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 3: [2022-11-28 22:49:52,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:49:52,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:49:52,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 22:49:52,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:49:52,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 22:49:52,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2022-11-28 22:49:52,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:49:52,547] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 22:49:52,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2022-11-28 22:49:52,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:49:52,547] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 3: [2022-11-28 22:49:52,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:49:52,548] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 6: [2022-11-28 22:49:52,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 3: [2022-11-28 22:49:52,548] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2022-11-28 22:49:52,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 3: [2022-11-28 22:49:52,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:49:52,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 3: [2022-11-28 22:49:52,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 22:49:52,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2022-11-28 22:49:52,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 6: [2022-11-28 22:49:52,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:49:52,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 22:49:52,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
7: [2022-11-28 22:49:52,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:49:52,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 22:49:52,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2022-11-28 22:49:52,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:49:52,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 22:49:52,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 7: [2022-11-28 22:49:52,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:49:52,569] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 22:49:52,569] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2022-11-28 22:49:52,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 22:49:52,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2022-11-28 22:49:52,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 22:49:52,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:49:52,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 22:49:52,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2022-11-28 22:49:52,586] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 22:49:52,586] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2022-11-28 22:49:52,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:49:52,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:49:52,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 22:49:52,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 22:49:52,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 0: [2022-11-28 22:49:52,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2022-11-28 22:49:52,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-28 22:49:52,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:49:52,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:49:52,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 22:49:52,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 22:49:52,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 22:49:52,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2022-11-28 22:49:52,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2022-11-28 22:49:52,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2022-11-28 22:49:52,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:49:52,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 22:49:52,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2022-11-28 22:49:52,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-28 22:49:52,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:49:52,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:49:52,658] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:49:52,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 22:49:52,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 22:49:52,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 22:49:52,658] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step82000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 22:49:52,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2022-11-28 22:49:52,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2022-11-28 22:49:52,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 5: [2022-11-28 22:49:52,658] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step82000 is ready now! 
0: successfully saved checkpoint at iteration 82000 to checkpoints_221m 7: time (ms) | save-checkpoint: 993.75 7: iteration 82010/ 115203 | consumed samples: 20994560 | consumed tokens: 42996858880 | elapsed time per iteration (s): 0.55 | learning rate: 5.507E-05 | global batch size: 256 | lm loss: 2.272576E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 468.572 | TFLOPs: 24.59 | 7: iteration 82020/ 115203 | consumed samples: 20997120 | consumed tokens: 43002101760 | elapsed time per iteration (s): 0.44 | learning rate: 5.505E-05 | global batch size: 256 | lm loss: 2.256407E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.871 | TFLOPs: 30.69 | 7: iteration 82030/ 115203 | consumed samples: 20999680 | consumed tokens: 43007344640 | elapsed time per iteration (s): 0.44 | learning rate: 5.503E-05 | global batch size: 256 | lm loss: 2.289901E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.355 | TFLOPs: 30.61 | 7: iteration 82040/ 115203 | consumed samples: 21002240 | consumed tokens: 43012587520 | elapsed time per iteration (s): 0.43 | learning rate: 5.501E-05 | global batch size: 256 | lm loss: 2.191602E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.583 | TFLOPs: 31.35 | 7: iteration 82050/ 115203 | consumed samples: 21004800 | consumed tokens: 43017830400 | elapsed time per iteration (s): 0.64 | learning rate: 5.499E-05 | global batch size: 256 | lm loss: 2.261688E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 401.552 | TFLOPs: 21.07 | 7: iteration 82060/ 115203 | consumed samples: 21007360 | consumed tokens: 43023073280 | elapsed time per iteration (s): 0.43 | learning 
rate: 5.497E-05 | global batch size: 256 | lm loss: 2.272357E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.771 | TFLOPs: 31.10 | 7: iteration 82070/ 115203 | consumed samples: 21009920 | consumed tokens: 43028316160 | elapsed time per iteration (s): 0.44 | learning rate: 5.495E-05 | global batch size: 256 | lm loss: 2.244460E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.946 | TFLOPs: 30.64 | 7: iteration 82080/ 115203 | consumed samples: 21012480 | consumed tokens: 43033559040 | elapsed time per iteration (s): 0.43 | learning rate: 5.493E-05 | global batch size: 256 | lm loss: 2.219227E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.407 | TFLOPs: 30.98 | 7: iteration 82090/ 115203 | consumed samples: 21015040 | consumed tokens: 43038801920 | elapsed time per iteration (s): 0.45 | learning rate: 5.491E-05 | global batch size: 256 | lm loss: 2.262659E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.170 | TFLOPs: 29.60 | 7: iteration 82100/ 115203 | consumed samples: 21017600 | consumed tokens: 43044044800 | elapsed time per iteration (s): 0.42 | learning rate: 5.489E-05 | global batch size: 256 | lm loss: 2.217602E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.379 | TFLOPs: 31.92 | 7: iteration 82110/ 115203 | consumed samples: 21020160 | consumed tokens: 43049287680 | elapsed time per iteration (s): 0.43 | learning rate: 5.488E-05 | global batch size: 256 | lm loss: 2.226939E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.297 | TFLOPs: 31.18 | 7: iteration 82120/ 115203 | 
consumed samples: 21022720 | consumed tokens: 43054530560 | elapsed time per iteration (s): 0.44 | learning rate: 5.486E-05 | global batch size: 256 | lm loss: 2.257761E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.927 | TFLOPs: 30.64 | 7: iteration 82130/ 115203 | consumed samples: 21025280 | consumed tokens: 43059773440 | elapsed time per iteration (s): 0.43 | learning rate: 5.484E-05 | global batch size: 256 | lm loss: 2.226705E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.614 | TFLOPs: 31.46 | 7: iteration 82140/ 115203 | consumed samples: 21027840 | consumed tokens: 43065016320 | elapsed time per iteration (s): 0.43 | learning rate: 5.482E-05 | global batch size: 256 | lm loss: 2.253642E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.062 | TFLOPs: 31.17 | 7: iteration 82150/ 115203 | consumed samples: 21030400 | consumed tokens: 43070259200 | elapsed time per iteration (s): 0.43 | learning rate: 5.480E-05 | global batch size: 256 | lm loss: 2.251371E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.361 | TFLOPs: 31.13 | 7: iteration 82160/ 115203 | consumed samples: 21032960 | consumed tokens: 43075502080 | elapsed time per iteration (s): 0.44 | learning rate: 5.478E-05 | global batch size: 256 | lm loss: 2.279011E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.047 | TFLOPs: 30.22 | 7: iteration 82170/ 115203 | consumed samples: 21035520 | consumed tokens: 43080744960 | elapsed time per iteration (s): 0.44 | learning rate: 5.476E-05 | global batch size: 256 | lm loss: 2.269348E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 586.598 | TFLOPs: 30.78 | 7: iteration 82180/ 115203 | consumed samples: 21038080 | consumed tokens: 43085987840 | elapsed time per iteration (s): 0.43 | learning rate: 5.474E-05 | global batch size: 256 | lm loss: 2.242712E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.104 | TFLOPs: 31.07 | 7: iteration 82190/ 115203 | consumed samples: 21040640 | consumed tokens: 43091230720 | elapsed time per iteration (s): 0.43 | learning rate: 5.472E-05 | global batch size: 256 | lm loss: 2.249698E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.248 | TFLOPs: 31.13 | 7: iteration 82200/ 115203 | consumed samples: 21043200 | consumed tokens: 43096473600 | elapsed time per iteration (s): 0.44 | learning rate: 5.470E-05 | global batch size: 256 | lm loss: 2.268797E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.690 | TFLOPs: 30.78 | 7: iteration 82210/ 115203 | consumed samples: 21045760 | consumed tokens: 43101716480 | elapsed time per iteration (s): 0.44 | learning rate: 5.468E-05 | global batch size: 256 | lm loss: 2.231664E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.135 | TFLOPs: 30.86 | 7: iteration 82220/ 115203 | consumed samples: 21048320 | consumed tokens: 43106959360 | elapsed time per iteration (s): 0.43 | learning rate: 5.466E-05 | global batch size: 256 | lm loss: 2.252737E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.770 | TFLOPs: 31.05 | 7: iteration 82230/ 115203 | consumed samples: 21050880 | consumed tokens: 43112202240 | elapsed time per iteration (s): 0.43 | learning rate: 5.464E-05 | global batch size: 
256 | lm loss: 2.230669E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.240 | TFLOPs: 31.34 | 7: iteration 82240/ 115203 | consumed samples: 21053440 | consumed tokens: 43117445120 | elapsed time per iteration (s): 0.43 | learning rate: 5.462E-05 | global batch size: 256 | lm loss: 2.250709E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.410 | TFLOPs: 31.08 | 7: iteration 82250/ 115203 | consumed samples: 21056000 | consumed tokens: 43122688000 | elapsed time per iteration (s): 0.44 | learning rate: 5.460E-05 | global batch size: 256 | lm loss: 2.248600E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.188 | TFLOPs: 30.28 | 7: iteration 82260/ 115203 | consumed samples: 21058560 | consumed tokens: 43127930880 | elapsed time per iteration (s): 0.43 | learning rate: 5.458E-05 | global batch size: 256 | lm loss: 2.269709E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.571 | TFLOPs: 30.99 | 7: iteration 82270/ 115203 | consumed samples: 21061120 | consumed tokens: 43133173760 | elapsed time per iteration (s): 0.43 | learning rate: 5.456E-05 | global batch size: 256 | lm loss: 2.270153E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.235 | TFLOPs: 31.07 | 7: iteration 82280/ 115203 | consumed samples: 21063680 | consumed tokens: 43138416640 | elapsed time per iteration (s): 0.42 | learning rate: 5.454E-05 | global batch size: 256 | lm loss: 2.256651E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.401 | TFLOPs: 31.61 | 7: iteration 82290/ 115203 | consumed samples: 21066240 | consumed 
tokens: 43143659520 | elapsed time per iteration (s): 0.43 | learning rate: 5.452E-05 | global batch size: 256 | lm loss: 2.248707E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.766 | TFLOPs: 31.42 | 7: iteration 82300/ 115203 | consumed samples: 21068800 | consumed tokens: 43148902400 | elapsed time per iteration (s): 0.44 | learning rate: 5.450E-05 | global batch size: 256 | lm loss: 2.230491E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.494 | TFLOPs: 30.51 | 7: iteration 82310/ 115203 | consumed samples: 21071360 | consumed tokens: 43154145280 | elapsed time per iteration (s): 0.43 | learning rate: 5.448E-05 | global batch size: 256 | lm loss: 2.217404E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.615 | TFLOPs: 31.09 | 7: iteration 82320/ 115203 | consumed samples: 21073920 | consumed tokens: 43159388160 | elapsed time per iteration (s): 0.43 | learning rate: 5.446E-05 | global batch size: 256 | lm loss: 2.219780E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.418 | TFLOPs: 31.24 | 7: iteration 82330/ 115203 | consumed samples: 21076480 | consumed tokens: 43164631040 | elapsed time per iteration (s): 0.44 | learning rate: 5.445E-05 | global batch size: 256 | lm loss: 2.218286E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.429 | TFLOPs: 30.40 | 7: iteration 82340/ 115203 | consumed samples: 21079040 | consumed tokens: 43169873920 | elapsed time per iteration (s): 0.42 | learning rate: 5.443E-05 | global batch size: 256 | lm loss: 2.271008E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 605.115 | TFLOPs: 31.75 | 7: iteration 82350/ 115203 | consumed samples: 21081600 | consumed tokens: 43175116800 | elapsed time per iteration (s): 0.43 | learning rate: 5.441E-05 | global batch size: 256 | lm loss: 2.216856E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.021 | TFLOPs: 30.91 | 7: iteration 82360/ 115203 | consumed samples: 21084160 | consumed tokens: 43180359680 | elapsed time per iteration (s): 0.45 | learning rate: 5.439E-05 | global batch size: 256 | lm loss: 2.240243E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.102 | TFLOPs: 30.12 | 7: iteration 82370/ 115203 | consumed samples: 21086720 | consumed tokens: 43185602560 | elapsed time per iteration (s): 0.44 | learning rate: 5.437E-05 | global batch size: 256 | lm loss: 2.252532E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.536 | TFLOPs: 30.20 | 7: iteration 82380/ 115203 | consumed samples: 21089280 | consumed tokens: 43190845440 | elapsed time per iteration (s): 0.44 | learning rate: 5.435E-05 | global batch size: 256 | lm loss: 2.219580E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.787 | TFLOPs: 30.21 | 7: iteration 82390/ 115203 | consumed samples: 21091840 | consumed tokens: 43196088320 | elapsed time per iteration (s): 0.43 | learning rate: 5.433E-05 | global batch size: 256 | lm loss: 2.233211E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.435 | TFLOPs: 30.98 | 7: iteration 82400/ 115203 | consumed samples: 21094400 | consumed tokens: 43201331200 | elapsed time per iteration (s): 0.44 | learning rate: 5.431E-05 | global batch size: 256 | lm loss: 2.221898E+00 | grad norm: 
0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.652 | TFLOPs: 30.57 | 7: iteration 82410/ 115203 | consumed samples: 21096960 | consumed tokens: 43206574080 | elapsed time per iteration (s): 0.44 | learning rate: 5.429E-05 | global batch size: 256 | lm loss: 2.227671E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.593 | TFLOPs: 30.78 | 7: iteration 82420/ 115203 | consumed samples: 21099520 | consumed tokens: 43211816960 | elapsed time per iteration (s): 0.43 | learning rate: 5.427E-05 | global batch size: 256 | lm loss: 2.233133E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.726 | TFLOPs: 31.41 | 7: iteration 82430/ 115203 | consumed samples: 21102080 | consumed tokens: 43217059840 | elapsed time per iteration (s): 0.45 | learning rate: 5.425E-05 | global batch size: 256 | lm loss: 2.262306E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.763 | TFLOPs: 29.79 | 7: iteration 82440/ 115203 | consumed samples: 21104640 | consumed tokens: 43222302720 | elapsed time per iteration (s): 0.43 | learning rate: 5.423E-05 | global batch size: 256 | lm loss: 2.227512E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.385 | TFLOPs: 31.29 | 7: iteration 82450/ 115203 | consumed samples: 21107200 | consumed tokens: 43227545600 | elapsed time per iteration (s): 0.44 | learning rate: 5.421E-05 | global batch size: 256 | lm loss: 2.258343E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.699 | TFLOPs: 30.47 | 7: iteration 82460/ 115203 | consumed samples: 21109760 | consumed tokens: 43232788480 | elapsed time per 
iteration (s): 0.43 | learning rate: 5.419E-05 | global batch size: 256 | lm loss: 2.259391E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.389 | TFLOPs: 30.92 | 7: iteration 82470/ 115203 | consumed samples: 21112320 | consumed tokens: 43238031360 | elapsed time per iteration (s): 0.43 | learning rate: 5.417E-05 | global batch size: 256 | lm loss: 2.211626E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.129 | TFLOPs: 31.49 | 7: iteration 82480/ 115203 | consumed samples: 21114880 | consumed tokens: 43243274240 | elapsed time per iteration (s): 0.43 | learning rate: 5.415E-05 | global batch size: 256 | lm loss: 2.224103E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.606 | TFLOPs: 31.20 | 7: iteration 82490/ 115203 | consumed samples: 21117440 | consumed tokens: 43248517120 | elapsed time per iteration (s): 0.43 | learning rate: 5.413E-05 | global batch size: 256 | lm loss: 2.224334E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.567 | TFLOPs: 30.93 | 7: iteration 82500/ 115203 | consumed samples: 21120000 | consumed tokens: 43253760000 | elapsed time per iteration (s): 0.43 | learning rate: 5.411E-05 | global batch size: 256 | lm loss: 2.268566E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.812 | TFLOPs: 31.26 | 7: iteration 82510/ 115203 | consumed samples: 21122560 | consumed tokens: 43259002880 | elapsed time per iteration (s): 0.43 | learning rate: 5.409E-05 | global batch size: 256 | lm loss: 2.231826E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.157 | TFLOPs: 31.33 | 7: 
iteration 82520/ 115203 | consumed samples: 21125120 | consumed tokens: 43264245760 | elapsed time per iteration (s): 0.43 | learning rate: 5.408E-05 | global batch size: 256 | lm loss: 2.249661E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.756 | TFLOPs: 31.26 | 7: iteration 82530/ 115203 | consumed samples: 21127680 | consumed tokens: 43269488640 | elapsed time per iteration (s): 0.43 | learning rate: 5.406E-05 | global batch size: 256 | lm loss: 2.262012E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.914 | TFLOPs: 31.37 | 7: iteration 82540/ 115203 | consumed samples: 21130240 | consumed tokens: 43274731520 | elapsed time per iteration (s): 0.44 | learning rate: 5.404E-05 | global batch size: 256 | lm loss: 2.231764E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.820 | TFLOPs: 30.79 | 7: iteration 82550/ 115203 | consumed samples: 21132800 | consumed tokens: 43279974400 | elapsed time per iteration (s): 0.43 | learning rate: 5.402E-05 | global batch size: 256 | lm loss: 2.255426E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.174 | TFLOPs: 31.44 | 7: iteration 82560/ 115203 | consumed samples: 21135360 | consumed tokens: 43285217280 | elapsed time per iteration (s): 0.43 | learning rate: 5.400E-05 | global batch size: 256 | lm loss: 2.244053E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.539 | TFLOPs: 31.56 | 7: iteration 82570/ 115203 | consumed samples: 21137920 | consumed tokens: 43290460160 | elapsed time per iteration (s): 0.44 | learning rate: 5.398E-05 | global batch size: 256 | lm loss: 2.239423E+00 | grad norm: 0.236 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.418 | TFLOPs: 30.51 | 7: iteration 82580/ 115203 | consumed samples: 21140480 | consumed tokens: 43295703040 | elapsed time per iteration (s): 0.44 | learning rate: 5.396E-05 | global batch size: 256 | lm loss: 2.278123E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.393 | TFLOPs: 30.35 | 7: iteration 82590/ 115203 | consumed samples: 21143040 | consumed tokens: 43300945920 | elapsed time per iteration (s): 0.43 | learning rate: 5.394E-05 | global batch size: 256 | lm loss: 2.239741E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.019 | TFLOPs: 31.06 | 7: iteration 82600/ 115203 | consumed samples: 21145600 | consumed tokens: 43306188800 | elapsed time per iteration (s): 0.43 | learning rate: 5.392E-05 | global batch size: 256 | lm loss: 2.263950E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.343 | TFLOPs: 31.08 | 7: iteration 82610/ 115203 | consumed samples: 21148160 | consumed tokens: 43311431680 | elapsed time per iteration (s): 0.43 | learning rate: 5.390E-05 | global batch size: 256 | lm loss: 2.238733E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.493 | TFLOPs: 31.30 | 7: iteration 82620/ 115203 | consumed samples: 21150720 | consumed tokens: 43316674560 | elapsed time per iteration (s): 0.43 | learning rate: 5.388E-05 | global batch size: 256 | lm loss: 2.247150E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.704 | TFLOPs: 30.99 | 7: iteration 82630/ 115203 | consumed samples: 21153280 | consumed tokens: 43321917440 | elapsed time per iteration (s): 0.44 | learning rate: 
5.386E-05 | global batch size: 256 | lm loss: 2.257170E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.753 | TFLOPs: 30.52 | 7: iteration 82640/ 115203 | consumed samples: 21155840 | consumed tokens: 43327160320 | elapsed time per iteration (s): 0.44 | learning rate: 5.384E-05 | global batch size: 256 | lm loss: 2.250702E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.197 | TFLOPs: 30.23 | 7: iteration 82650/ 115203 | consumed samples: 21158400 | consumed tokens: 43332403200 | elapsed time per iteration (s): 0.43 | learning rate: 5.382E-05 | global batch size: 256 | lm loss: 2.238627E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.884 | TFLOPs: 31.32 | 7: iteration 82660/ 115203 | consumed samples: 21160960 | consumed tokens: 43337646080 | elapsed time per iteration (s): 0.43 | learning rate: 5.380E-05 | global batch size: 256 | lm loss: 2.255635E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.355 | TFLOPs: 31.18 | 7: iteration 82670/ 115203 | consumed samples: 21163520 | consumed tokens: 43342888960 | elapsed time per iteration (s): 0.43 | learning rate: 5.378E-05 | global batch size: 256 | lm loss: 2.233053E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.638 | TFLOPs: 31.57 | 7: iteration 82680/ 115203 | consumed samples: 21166080 | consumed tokens: 43348131840 | elapsed time per iteration (s): 0.43 | learning rate: 5.377E-05 | global batch size: 256 | lm loss: 2.230307E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.791 | TFLOPs: 30.89 | 7: iteration 82690/ 115203 | consumed 
samples: 21168640 | consumed tokens: 43353374720 | elapsed time per iteration (s): 0.45 | learning rate: 5.375E-05 | global batch size: 256 | lm loss: 2.242389E+00 | grad norm: 0.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.022 | TFLOPs: 29.86 | 7: iteration 82700/ 115203 | consumed samples: 21171200 | consumed tokens: 43358617600 | elapsed time per iteration (s): 0.44 | learning rate: 5.373E-05 | global batch size: 256 | lm loss: 2.255073E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.024 | TFLOPs: 30.85 | 7: iteration 82710/ 115203 | consumed samples: 21173760 | consumed tokens: 43363860480 | elapsed time per iteration (s): 0.45 | learning rate: 5.371E-05 | global batch size: 256 | lm loss: 2.223578E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.459 | TFLOPs: 29.83 | 7: iteration 82720/ 115203 | consumed samples: 21176320 | consumed tokens: 43369103360 | elapsed time per iteration (s): 0.43 | learning rate: 5.369E-05 | global batch size: 256 | lm loss: 2.234453E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.540 | TFLOPs: 31.19 | 7: iteration 82730/ 115203 | consumed samples: 21178880 | consumed tokens: 43374346240 | elapsed time per iteration (s): 0.43 | learning rate: 5.367E-05 | global batch size: 256 | lm loss: 2.245902E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.762 | TFLOPs: 30.89 | 7: iteration 82740/ 115203 | consumed samples: 21181440 | consumed tokens: 43379589120 | elapsed time per iteration (s): 0.44 | learning rate: 5.365E-05 | global batch size: 256 | lm loss: 2.246374E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 585.498 | TFLOPs: 30.72 | 7: iteration 82750/ 115203 | consumed samples: 21184000 | consumed tokens: 43384832000 | elapsed time per iteration (s): 0.44 | learning rate: 5.363E-05 | global batch size: 256 | lm loss: 2.263001E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.675 | TFLOPs: 30.78 | 7: iteration 82760/ 115203 | consumed samples: 21186560 | consumed tokens: 43390074880 | elapsed time per iteration (s): 0.43 | learning rate: 5.361E-05 | global batch size: 256 | lm loss: 2.269471E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.615 | TFLOPs: 31.57 | 7: iteration 82770/ 115203 | consumed samples: 21189120 | consumed tokens: 43395317760 | elapsed time per iteration (s): 0.43 | learning rate: 5.359E-05 | global batch size: 256 | lm loss: 2.243217E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.633 | TFLOPs: 31.20 | 7: iteration 82780/ 115203 | consumed samples: 21191680 | consumed tokens: 43400560640 | elapsed time per iteration (s): 0.44 | learning rate: 5.357E-05 | global batch size: 256 | lm loss: 2.253419E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.298 | TFLOPs: 30.60 | 7: iteration 82790/ 115203 | consumed samples: 21194240 | consumed tokens: 43405803520 | elapsed time per iteration (s): 0.43 | learning rate: 5.355E-05 | global batch size: 256 | lm loss: 2.285625E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.880 | TFLOPs: 31.37 | 7: iteration 82800/ 115203 | consumed samples: 21196800 | consumed tokens: 43411046400 | elapsed time per iteration (s): 0.43 | learning rate: 5.353E-05 | global batch size: 256 | lm 
loss: 2.241783E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.042 | TFLOPs: 31.22 | 7: iteration 82810/ 115203 | consumed samples: 21199360 | consumed tokens: 43416289280 | elapsed time per iteration (s): 0.43 | learning rate: 5.351E-05 | global batch size: 256 | lm loss: 2.232389E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.535 | TFLOPs: 31.19 | 7: iteration 82820/ 115203 | consumed samples: 21201920 | consumed tokens: 43421532160 | elapsed time per iteration (s): 0.43 | learning rate: 5.349E-05 | global batch size: 256 | lm loss: 2.220485E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.371 | TFLOPs: 31.45 | 7: iteration 82830/ 115203 | consumed samples: 21204480 | consumed tokens: 43426775040 | elapsed time per iteration (s): 0.43 | learning rate: 5.348E-05 | global batch size: 256 | lm loss: 2.254879E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.321 | TFLOPs: 31.03 | 7: iteration 82840/ 115203 | consumed samples: 21207040 | consumed tokens: 43432017920 | elapsed time per iteration (s): 0.44 | learning rate: 5.346E-05 | global batch size: 256 | lm loss: 2.232878E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.642 | TFLOPs: 30.68 | 7: iteration 82850/ 115203 | consumed samples: 21209600 | consumed tokens: 43437260800 | elapsed time per iteration (s): 0.42 | learning rate: 5.344E-05 | global batch size: 256 | lm loss: 2.234160E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.396 | TFLOPs: 31.61 | 7: iteration 82860/ 115203 | consumed samples: 21212160 | consumed tokens: 
43442503680 | elapsed time per iteration (s): 0.43 | learning rate: 5.342E-05 | global batch size: 256 | lm loss: 2.249487E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.707 | TFLOPs: 31.15 | 7: iteration 82870/ 115203 | consumed samples: 21214720 | consumed tokens: 43447746560 | elapsed time per iteration (s): 0.44 | learning rate: 5.340E-05 | global batch size: 256 | lm loss: 2.263190E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.017 | TFLOPs: 30.80 | 7: iteration 82880/ 115203 | consumed samples: 21217280 | consumed tokens: 43452989440 | elapsed time per iteration (s): 0.42 | learning rate: 5.338E-05 | global batch size: 256 | lm loss: 2.268039E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.836 | TFLOPs: 31.84 | 7: iteration 82890/ 115203 | consumed samples: 21219840 | consumed tokens: 43458232320 | elapsed time per iteration (s): 0.43 | learning rate: 5.336E-05 | global batch size: 256 | lm loss: 2.262365E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.161 | TFLOPs: 31.12 | 7: iteration 82900/ 115203 | consumed samples: 21222400 | consumed tokens: 43463475200 | elapsed time per iteration (s): 0.44 | learning rate: 5.334E-05 | global batch size: 256 | lm loss: 2.236099E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.933 | TFLOPs: 30.85 | 7: iteration 82910/ 115203 | consumed samples: 21224960 | consumed tokens: 43468718080 | elapsed time per iteration (s): 0.43 | learning rate: 5.332E-05 | global batch size: 256 | lm loss: 2.259342E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
590.341 | TFLOPs: 30.97 | 7: iteration 82920/ 115203 | consumed samples: 21227520 | consumed tokens: 43473960960 | elapsed time per iteration (s): 0.44 | learning rate: 5.330E-05 | global batch size: 256 | lm loss: 2.241689E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.791 | TFLOPs: 30.74 | 7: iteration 82930/ 115203 | consumed samples: 21230080 | consumed tokens: 43479203840 | elapsed time per iteration (s): 0.44 | learning rate: 5.328E-05 | global batch size: 256 | lm loss: 2.223625E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.485 | TFLOPs: 30.61 | 7: iteration 82940/ 115203 | consumed samples: 21232640 | consumed tokens: 43484446720 | elapsed time per iteration (s): 0.43 | learning rate: 5.326E-05 | global batch size: 256 | lm loss: 2.253633E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.514 | TFLOPs: 31.30 | 7: iteration 82950/ 115203 | consumed samples: 21235200 | consumed tokens: 43489689600 | elapsed time per iteration (s): 0.43 | learning rate: 5.324E-05 | global batch size: 256 | lm loss: 2.231563E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.479 | TFLOPs: 31.45 | 7: iteration 82960/ 115203 | consumed samples: 21237760 | consumed tokens: 43494932480 | elapsed time per iteration (s): 0.44 | learning rate: 5.322E-05 | global batch size: 256 | lm loss: 2.238815E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.105 | TFLOPs: 30.86 | 7: iteration 82970/ 115203 | consumed samples: 21240320 | consumed tokens: 43500175360 | elapsed time per iteration (s): 0.43 | learning rate: 5.321E-05 | global batch size: 256 | lm loss: 2.266285E+00 | grad norm: 0.232 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.053 | TFLOPs: 31.01 | 7: iteration 82980/ 115203 | consumed samples: 21242880 | consumed tokens: 43505418240 | elapsed time per iteration (s): 0.44 | learning rate: 5.319E-05 | global batch size: 256 | lm loss: 2.249300E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.089 | TFLOPs: 30.75 | 7: iteration 82990/ 115203 | consumed samples: 21245440 | consumed tokens: 43510661120 | elapsed time per iteration (s): 0.43 | learning rate: 5.317E-05 | global batch size: 256 | lm loss: 2.264616E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.815 | TFLOPs: 31.47 | 7: iteration 83000/ 115203 | consumed samples: 21248000 | consumed tokens: 43515904000 | elapsed time per iteration (s): 0.43 | learning rate: 5.315E-05 | global batch size: 256 | lm loss: 2.242842E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.020 | TFLOPs: 31.06 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 83000 | lm loss value: 2.057218E+00 | lm loss PPL: 7.824173E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 83000 to checkpoints_221m 0: [2022-11-28 22:57:08,740] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step83000 is begin to save! 0: [2022-11-28 22:57:08,830] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_01-model_00-model_states.pt... 0: [2022-11-28 22:57:09,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 22:57:09,152] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_03-model_00-model_states.pt... 0: [2022-11-28 22:57:09,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_03-model_00-model_states.pt. 0: [2022-11-28 22:57:09,173] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_04-model_00-model_states.pt... 0: [2022-11-28 22:57:09,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_04-model_00-model_states.pt. 0: [2022-11-28 22:57:09,197] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_05-model_00-model_states.pt... 0: [2022-11-28 22:57:09,219] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_05-model_00-model_states.pt. 0: [2022-11-28 22:57:09,220] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_06-model_00-model_states.pt... 0: [2022-11-28 22:57:09,242] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_06-model_00-model_states.pt. 0: [2022-11-28 22:57:09,243] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_07-model_00-model_states.pt... 0: [2022-11-28 22:57:09,267] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_07-model_00-model_states.pt. 0: [2022-11-28 22:57:09,267] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_08-model_00-model_states.pt... 0: [2022-11-28 22:57:09,292] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 22:57:09,293] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_09-model_00-model_states.pt... 0: [2022-11-28 22:57:09,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_09-model_00-model_states.pt. 0: [2022-11-28 22:57:09,316] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_10-model_00-model_states.pt... 0: [2022-11-28 22:57:09,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_10-model_00-model_states.pt. 0: [2022-11-28 22:57:09,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_11-model_00-model_states.pt... 0: [2022-11-28 22:57:09,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_11-model_00-model_states.pt. 0: [2022-11-28 22:57:09,362] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_12-model_00-model_states.pt... 0: [2022-11-28 22:57:09,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_12-model_00-model_states.pt. 0: [2022-11-28 22:57:09,385] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_13-model_00-model_states.pt... 0: [2022-11-28 22:57:09,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_13-model_00-model_states.pt. 0: [2022-11-28 22:57:09,408] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_14-model_00-model_states.pt... 0: [2022-11-28 22:57:09,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 22:57:09,431] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_15-model_00-model_states.pt... 0: [2022-11-28 22:57:09,455] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_15-model_00-model_states.pt. 0: [2022-11-28 22:57:09,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_16-model_00-model_states.pt... 0: [2022-11-28 22:57:09,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_16-model_00-model_states.pt. 0: [2022-11-28 22:57:09,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_17-model_00-model_states.pt... 0: [2022-11-28 22:57:09,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_17-model_00-model_states.pt. 0: [2022-11-28 22:57:09,501] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_18-model_00-model_states.pt... 0: [2022-11-28 22:57:09,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_18-model_00-model_states.pt. 0: [2022-11-28 22:57:09,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_19-model_00-model_states.pt... 0: [2022-11-28 22:57:09,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_19-model_00-model_states.pt. 0: [2022-11-28 22:57:09,546] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_20-model_00-model_states.pt... 0: [2022-11-28 22:57:09,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 22:57:09,570] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/layer_22-model_00-model_states.pt... 0: [2022-11-28 22:57:09,573] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/layer_22-model_00-model_states.pt. 0: [2022-11-28 22:57:09,574] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step83000/mp_rank_00_model_states.pt 0: [2022-11-28 22:57:09,574] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/mp_rank_00_model_states.pt... 0: [2022-11-28 22:57:09,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/mp_rank_00_model_states.pt. 0: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
5: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 4: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
6: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 22:57:09,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step83000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 2: [2022-11-28 22:57:09,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:57:09,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 22:57:09,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2022-11-28 22:57:09,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:57:09,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:57:09,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 22:57:09,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2022-11-28 22:57:09,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:57:09,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 22:57:09,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
2: [2022-11-28 22:57:09,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:57:09,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:57:09,647] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2022-11-28 22:57:09,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 0: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2022-11-28 22:57:09,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:57:09,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 22:57:09,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 22:57:09,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 22:57:09,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2022-11-28 22:57:09,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:57:09,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 22:57:09,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 22:57:09,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 22:57:09,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2022-11-28 22:57:09,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 5: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:57:09,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 22:57:09,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2022-11-28 22:57:09,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:57:09,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 22:57:09,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2022-11-28 22:57:09,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:57:09,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
2: [2022-11-28 22:57:09,650] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:57:09,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 0: [2022-11-28 22:57:09,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 2: [2022-11-28 22:57:09,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 0: [2022-11-28 22:57:09,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2022-11-28 22:57:09,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2022-11-28 22:57:09,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 5: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:57:09,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 22:57:09,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
5: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:57:09,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 22:57:09,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2022-11-28 22:57:09,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
5: [2022-11-28 22:57:09,651] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 5: [2022-11-28 22:57:09,651] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:57:09,652] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 22:57:09,652] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 5: [2022-11-28 22:57:09,653] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:57:09,653] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 22:57:09,653] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2022-11-28 22:57:09,653] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:57:09,653] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 22:57:09,653] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 22:57:09,653] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
6: [2022-11-28 22:57:09,653] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2022-11-28 22:57:09,653] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 6: [2022-11-28 22:57:09,653] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 22:57:09,653] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2022-11-28 22:57:09,653] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 6: [2022-11-28 22:57:09,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:57:09,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 2: [2022-11-28 22:57:09,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2022-11-28 22:57:09,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2022-11-28 22:57:09,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 22:57:09,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 7: [2022-11-28 22:57:09,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 22:57:09,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 22:57:09,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 2: [2022-11-28 22:57:09,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 22:57:09,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 22:57:09,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 7: [2022-11-28 22:57:09,654] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:57:09,654] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 22:57:09,654] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 7: [2022-11-28 22:57:09,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:57:09,655] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 22:57:09,655] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 7: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 22:57:09,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 22:57:09,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 22:57:09,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 22:57:09,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 22:57:09,663] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 7: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 7: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
7: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 7: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:57:09,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:57:09,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 22:57:09,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 22:57:09,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2022-11-28 22:57:09,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 22:57:09,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
1: [2022-11-28 22:57:09,664] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 22:57:09,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2022-11-28 22:57:09,664] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 5: [2022-11-28 22:57:09,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:57:09,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:57:09,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:57:09,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:57:09,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 22:57:09,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 22:57:09,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 5: [2022-11-28 22:57:09,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 4: [2022-11-28 22:57:09,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 22:57:09,665] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 22:57:09,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 5: [2022-11-28 22:57:09,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2022-11-28 22:57:09,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2022-11-28 22:57:09,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2022-11-28 22:57:09,665] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2022-11-28 22:57:09,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:57:09,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 22:57:09,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:57:09,669] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 22:57:09,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 22:57:09,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 22:57:09,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 22:57:09,669] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 22:57:09,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2022-11-28 22:57:09,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2022-11-28 22:57:09,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 4: [2022-11-28 22:57:09,669] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2022-11-28 22:57:09,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 22:57:09,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 22:57:09,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2022-11-28 22:57:09,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:57:09,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 22:57:09,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 1: [2022-11-28 22:57:09,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 22:57:09,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 22:57:09,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 0: [2022-11-28 22:57:09,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 22:57:09,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:57:09,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 22:57:09,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 22:57:09,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
3: [2022-11-28 22:57:09,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 22:57:09,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 22:57:09,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 22:57:09,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 22:57:09,722] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step83000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 3: [2022-11-28 22:57:09,722] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step83000 is ready now! 
0: successfully saved checkpoint at iteration 83000 to checkpoints_221m 7: time (ms) | save-checkpoint: 1016.01 7: iteration 83010/ 115203 | consumed samples: 21250560 | consumed tokens: 43521146880 | elapsed time per iteration (s): 0.54 | learning rate: 5.313E-05 | global batch size: 256 | lm loss: 2.229458E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 471.627 | TFLOPs: 24.75 | 7: iteration 83020/ 115203 | consumed samples: 21253120 | consumed tokens: 43526389760 | elapsed time per iteration (s): 0.44 | learning rate: 5.311E-05 | global batch size: 256 | lm loss: 2.230969E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.301 | TFLOPs: 30.76 | 7: iteration 83030/ 115203 | consumed samples: 21255680 | consumed tokens: 43531632640 | elapsed time per iteration (s): 0.44 | learning rate: 5.309E-05 | global batch size: 256 | lm loss: 2.224349E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.220 | TFLOPs: 30.60 | 7: iteration 83040/ 115203 | consumed samples: 21258240 | consumed tokens: 43536875520 | elapsed time per iteration (s): 0.61 | learning rate: 5.307E-05 | global batch size: 256 | lm loss: 2.224501E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 419.992 | TFLOPs: 22.04 | 7: iteration 83050/ 115203 | consumed samples: 21260800 | consumed tokens: 43542118400 | elapsed time per iteration (s): 0.45 | learning rate: 5.305E-05 | global batch size: 256 | lm loss: 2.263524E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.690 | TFLOPs: 29.84 | 7: iteration 83060/ 115203 | consumed samples: 21263360 | consumed tokens: 43547361280 | elapsed time per iteration (s): 0.44 | learning 
rate: 5.303E-05 | global batch size: 256 | lm loss: 2.241578E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.265 | TFLOPs: 30.66 | 7: iteration 83070/ 115203 | consumed samples: 21265920 | consumed tokens: 43552604160 | elapsed time per iteration (s): 0.43 | learning rate: 5.301E-05 | global batch size: 256 | lm loss: 2.257872E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.651 | TFLOPs: 31.52 | 7: iteration 83080/ 115203 | consumed samples: 21268480 | consumed tokens: 43557847040 | elapsed time per iteration (s): 0.44 | learning rate: 5.299E-05 | global batch size: 256 | lm loss: 2.261760E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.132 | TFLOPs: 30.60 | 7: iteration 83090/ 115203 | consumed samples: 21271040 | consumed tokens: 43563089920 | elapsed time per iteration (s): 0.43 | learning rate: 5.298E-05 | global batch size: 256 | lm loss: 2.225049E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.994 | TFLOPs: 30.90 | 7: iteration 83100/ 115203 | consumed samples: 21273600 | consumed tokens: 43568332800 | elapsed time per iteration (s): 0.44 | learning rate: 5.296E-05 | global batch size: 256 | lm loss: 2.257393E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.794 | TFLOPs: 30.68 | 7: iteration 83110/ 115203 | consumed samples: 21276160 | consumed tokens: 43573575680 | elapsed time per iteration (s): 0.44 | learning rate: 5.294E-05 | global batch size: 256 | lm loss: 2.225092E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.386 | TFLOPs: 30.40 | 7: iteration 83120/ 115203 | 
consumed samples: 21278720 | consumed tokens: 43578818560 | elapsed time per iteration (s): 0.43 | learning rate: 5.292E-05 | global batch size: 256 | lm loss: 2.255326E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.501 | TFLOPs: 31.51 | 7: iteration 83130/ 115203 | consumed samples: 21281280 | consumed tokens: 43584061440 | elapsed time per iteration (s): 0.44 | learning rate: 5.290E-05 | global batch size: 256 | lm loss: 2.255689E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.189 | TFLOPs: 30.86 | 7: iteration 83140/ 115203 | consumed samples: 21283840 | consumed tokens: 43589304320 | elapsed time per iteration (s): 0.43 | learning rate: 5.288E-05 | global batch size: 256 | lm loss: 2.198869E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.045 | TFLOPs: 30.91 | 7: iteration 83150/ 115203 | consumed samples: 21286400 | consumed tokens: 43594547200 | elapsed time per iteration (s): 0.43 | learning rate: 5.286E-05 | global batch size: 256 | lm loss: 2.234916E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.696 | TFLOPs: 31.15 | 7: iteration 83160/ 115203 | consumed samples: 21288960 | consumed tokens: 43599790080 | elapsed time per iteration (s): 0.42 | learning rate: 5.284E-05 | global batch size: 256 | lm loss: 2.232871E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.636 | TFLOPs: 31.62 | 7: iteration 83170/ 115203 | consumed samples: 21291520 | consumed tokens: 43605032960 | elapsed time per iteration (s): 0.44 | learning rate: 5.282E-05 | global batch size: 256 | lm loss: 2.285065E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 584.291 | TFLOPs: 30.66 | 7: iteration 83180/ 115203 | consumed samples: 21294080 | consumed tokens: 43610275840 | elapsed time per iteration (s): 0.44 | learning rate: 5.280E-05 | global batch size: 256 | lm loss: 2.261448E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.609 | TFLOPs: 30.78 | 7: iteration 83190/ 115203 | consumed samples: 21296640 | consumed tokens: 43615518720 | elapsed time per iteration (s): 0.43 | learning rate: 5.278E-05 | global batch size: 256 | lm loss: 2.212727E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.562 | TFLOPs: 31.14 | 7: iteration 83200/ 115203 | consumed samples: 21299200 | consumed tokens: 43620761600 | elapsed time per iteration (s): 0.43 | learning rate: 5.276E-05 | global batch size: 256 | lm loss: 2.228393E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.529 | TFLOPs: 30.93 | 7: iteration 83210/ 115203 | consumed samples: 21301760 | consumed tokens: 43626004480 | elapsed time per iteration (s): 0.43 | learning rate: 5.275E-05 | global batch size: 256 | lm loss: 2.266139E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.722 | TFLOPs: 31.15 | 7: iteration 83220/ 115203 | consumed samples: 21304320 | consumed tokens: 43631247360 | elapsed time per iteration (s): 0.45 | learning rate: 5.273E-05 | global batch size: 256 | lm loss: 2.216077E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.863 | TFLOPs: 29.95 | 7: iteration 83230/ 115203 | consumed samples: 21306880 | consumed tokens: 43636490240 | elapsed time per iteration (s): 0.44 | learning rate: 5.271E-05 | global batch size: 
256 | lm loss: 2.226312E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.206 | TFLOPs: 30.76 | 7: iteration 83240/ 115203 | consumed samples: 21309440 | consumed tokens: 43641733120 | elapsed time per iteration (s): 0.43 | learning rate: 5.269E-05 | global batch size: 256 | lm loss: 2.260670E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.980 | TFLOPs: 31.11 | 7: iteration 83250/ 115203 | consumed samples: 21312000 | consumed tokens: 43646976000 | elapsed time per iteration (s): 0.43 | learning rate: 5.267E-05 | global batch size: 256 | lm loss: 2.261496E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.225 | TFLOPs: 31.28 | 7: iteration 83260/ 115203 | consumed samples: 21314560 | consumed tokens: 43652218880 | elapsed time per iteration (s): 0.44 | learning rate: 5.265E-05 | global batch size: 256 | lm loss: 2.231812E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.332 | TFLOPs: 30.61 | 7: iteration 83270/ 115203 | consumed samples: 21317120 | consumed tokens: 43657461760 | elapsed time per iteration (s): 0.44 | learning rate: 5.263E-05 | global batch size: 256 | lm loss: 2.242276E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.030 | TFLOPs: 30.43 | 7: iteration 83280/ 115203 | consumed samples: 21319680 | consumed tokens: 43662704640 | elapsed time per iteration (s): 0.43 | learning rate: 5.261E-05 | global batch size: 256 | lm loss: 2.259421E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.734 | TFLOPs: 30.89 | 7: iteration 83290/ 115203 | consumed samples: 21322240 | consumed 
tokens: 43667947520 | elapsed time per iteration (s): 0.43 | learning rate: 5.259E-05 | global batch size: 256 | lm loss: 2.246320E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.319 | TFLOPs: 31.13 | 7: iteration 83300/ 115203 | consumed samples: 21324800 | consumed tokens: 43673190400 | elapsed time per iteration (s): 0.44 | learning rate: 5.257E-05 | global batch size: 256 | lm loss: 2.268900E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.264 | TFLOPs: 30.45 | 7: iteration 83310/ 115203 | consumed samples: 21327360 | consumed tokens: 43678433280 | elapsed time per iteration (s): 0.43 | learning rate: 5.255E-05 | global batch size: 256 | lm loss: 2.208186E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.965 | TFLOPs: 30.90 | 7: iteration 83320/ 115203 | consumed samples: 21329920 | consumed tokens: 43683676160 | elapsed time per iteration (s): 0.42 | learning rate: 5.254E-05 | global batch size: 256 | lm loss: 2.270379E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.591 | TFLOPs: 31.62 | 7: iteration 83330/ 115203 | consumed samples: 21332480 | consumed tokens: 43688919040 | elapsed time per iteration (s): 0.43 | learning rate: 5.252E-05 | global batch size: 256 | lm loss: 2.236535E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.213 | TFLOPs: 31.12 | 7: iteration 83340/ 115203 | consumed samples: 21335040 | consumed tokens: 43694161920 | elapsed time per iteration (s): 0.43 | learning rate: 5.250E-05 | global batch size: 256 | lm loss: 2.238789E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 596.341 | TFLOPs: 31.29 | 7: iteration 83350/ 115203 | consumed samples: 21337600 | consumed tokens: 43699404800 | elapsed time per iteration (s): 0.44 | learning rate: 5.248E-05 | global batch size: 256 | lm loss: 2.255584E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.000 | TFLOPs: 30.48 | 7: iteration 83360/ 115203 | consumed samples: 21340160 | consumed tokens: 43704647680 | elapsed time per iteration (s): 0.44 | learning rate: 5.246E-05 | global batch size: 256 | lm loss: 2.226593E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.471 | TFLOPs: 30.88 | 7: iteration 83370/ 115203 | consumed samples: 21342720 | consumed tokens: 43709890560 | elapsed time per iteration (s): 0.43 | learning rate: 5.244E-05 | global batch size: 256 | lm loss: 2.250877E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.506 | TFLOPs: 30.98 | 7: iteration 83380/ 115203 | consumed samples: 21345280 | consumed tokens: 43715133440 | elapsed time per iteration (s): 0.44 | learning rate: 5.242E-05 | global batch size: 256 | lm loss: 2.256209E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.676 | TFLOPs: 30.73 | 7: iteration 83390/ 115203 | consumed samples: 21347840 | consumed tokens: 43720376320 | elapsed time per iteration (s): 0.44 | learning rate: 5.240E-05 | global batch size: 256 | lm loss: 2.235077E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.888 | TFLOPs: 30.43 | 7: iteration 83400/ 115203 | consumed samples: 21350400 | consumed tokens: 43725619200 | elapsed time per iteration (s): 0.44 | learning rate: 5.238E-05 | global batch size: 256 | lm loss: 2.264929E+00 | grad norm: 
0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.611 | TFLOPs: 30.25 | 7: iteration 83410/ 115203 | consumed samples: 21352960 | consumed tokens: 43730862080 | elapsed time per iteration (s): 0.44 | learning rate: 5.236E-05 | global batch size: 256 | lm loss: 2.232008E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.023 | TFLOPs: 30.43 | 7: iteration 83420/ 115203 | consumed samples: 21355520 | consumed tokens: 43736104960 | elapsed time per iteration (s): 0.44 | learning rate: 5.234E-05 | global batch size: 256 | lm loss: 2.244493E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.587 | TFLOPs: 30.83 | 7: iteration 83430/ 115203 | consumed samples: 21358080 | consumed tokens: 43741347840 | elapsed time per iteration (s): 0.44 | learning rate: 5.233E-05 | global batch size: 256 | lm loss: 2.239404E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.173 | TFLOPs: 30.81 | 7: iteration 83440/ 115203 | consumed samples: 21360640 | consumed tokens: 43746590720 | elapsed time per iteration (s): 0.43 | learning rate: 5.231E-05 | global batch size: 256 | lm loss: 2.262656E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.047 | TFLOPs: 31.43 | 7: iteration 83450/ 115203 | consumed samples: 21363200 | consumed tokens: 43751833600 | elapsed time per iteration (s): 0.43 | learning rate: 5.229E-05 | global batch size: 256 | lm loss: 2.241175E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.713 | TFLOPs: 30.99 | 7: iteration 83460/ 115203 | consumed samples: 21365760 | consumed tokens: 43757076480 | elapsed time per 
iteration (s): 0.44 | learning rate: 5.227E-05 | global batch size: 256 | lm loss: 2.227004E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.008 | TFLOPs: 30.80 | 7: iteration 83470/ 115203 | consumed samples: 21368320 | consumed tokens: 43762319360 | elapsed time per iteration (s): 0.44 | learning rate: 5.225E-05 | global batch size: 256 | lm loss: 2.232182E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.660 | TFLOPs: 30.57 | 7: iteration 83480/ 115203 | consumed samples: 21370880 | consumed tokens: 43767562240 | elapsed time per iteration (s): 0.43 | learning rate: 5.223E-05 | global batch size: 256 | lm loss: 2.238857E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.358 | TFLOPs: 31.13 | 7: iteration 83490/ 115203 | consumed samples: 21373440 | consumed tokens: 43772805120 | elapsed time per iteration (s): 0.44 | learning rate: 5.221E-05 | global batch size: 256 | lm loss: 2.233507E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.457 | TFLOPs: 30.56 | 7: iteration 83500/ 115203 | consumed samples: 21376000 | consumed tokens: 43778048000 | elapsed time per iteration (s): 0.44 | learning rate: 5.219E-05 | global batch size: 256 | lm loss: 2.222475E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.810 | TFLOPs: 30.79 | 7: iteration 83510/ 115203 | consumed samples: 21378560 | consumed tokens: 43783290880 | elapsed time per iteration (s): 0.43 | learning rate: 5.217E-05 | global batch size: 256 | lm loss: 2.230676E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.759 | TFLOPs: 31.36 | 7: 
iteration 83520/ 115203 | consumed samples: 21381120 | consumed tokens: 43788533760 | elapsed time per iteration (s): 0.43 | learning rate: 5.215E-05 | global batch size: 256 | lm loss: 2.209035E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.552 | TFLOPs: 31.35 | 7: iteration 83530/ 115203 | consumed samples: 21383680 | consumed tokens: 43793776640 | elapsed time per iteration (s): 0.44 | learning rate: 5.214E-05 | global batch size: 256 | lm loss: 2.262675E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.545 | TFLOPs: 30.57 | 7: iteration 83540/ 115203 | consumed samples: 21386240 | consumed tokens: 43799019520 | elapsed time per iteration (s): 0.43 | learning rate: 5.212E-05 | global batch size: 256 | lm loss: 2.223218E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.266 | TFLOPs: 30.97 | 7: iteration 83550/ 115203 | consumed samples: 21388800 | consumed tokens: 43804262400 | elapsed time per iteration (s): 0.42 | learning rate: 5.210E-05 | global batch size: 256 | lm loss: 2.250503E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.885 | TFLOPs: 31.89 | 7: iteration 83560/ 115203 | consumed samples: 21391360 | consumed tokens: 43809505280 | elapsed time per iteration (s): 0.44 | learning rate: 5.208E-05 | global batch size: 256 | lm loss: 2.211910E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.993 | TFLOPs: 30.38 | 7: iteration 83570/ 115203 | consumed samples: 21393920 | consumed tokens: 43814748160 | elapsed time per iteration (s): 0.44 | learning rate: 5.206E-05 | global batch size: 256 | lm loss: 2.236573E+00 | grad norm: 0.221 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.350 | TFLOPs: 30.76 | 7: iteration 83580/ 115203 | consumed samples: 21396480 | consumed tokens: 43819991040 | elapsed time per iteration (s): 0.43 | learning rate: 5.204E-05 | global batch size: 256 | lm loss: 2.275754E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.219 | TFLOPs: 30.97 | 7: iteration 83590/ 115203 | consumed samples: 21399040 | consumed tokens: 43825233920 | elapsed time per iteration (s): 0.43 | learning rate: 5.202E-05 | global batch size: 256 | lm loss: 2.229852E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.494 | TFLOPs: 31.35 | 7: iteration 83600/ 115203 | consumed samples: 21401600 | consumed tokens: 43830476800 | elapsed time per iteration (s): 0.44 | learning rate: 5.200E-05 | global batch size: 256 | lm loss: 2.272241E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.406 | TFLOPs: 30.51 | 7: iteration 83610/ 115203 | consumed samples: 21404160 | consumed tokens: 43835719680 | elapsed time per iteration (s): 0.43 | learning rate: 5.198E-05 | global batch size: 256 | lm loss: 2.252016E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.719 | TFLOPs: 31.26 | 7: iteration 83620/ 115203 | consumed samples: 21406720 | consumed tokens: 43840962560 | elapsed time per iteration (s): 0.44 | learning rate: 5.196E-05 | global batch size: 256 | lm loss: 2.225951E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.482 | TFLOPs: 30.61 | 7: iteration 83630/ 115203 | consumed samples: 21409280 | consumed tokens: 43846205440 | elapsed time per iteration (s): 0.43 | learning rate: 
5.195E-05 | global batch size: 256 | lm loss: 2.240605E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.805 | TFLOPs: 31.21 | 7: iteration 83640/ 115203 | consumed samples: 21411840 | consumed tokens: 43851448320 | elapsed time per iteration (s): 0.43 | learning rate: 5.193E-05 | global batch size: 256 | lm loss: 2.224883E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.832 | TFLOPs: 31.26 | 7: iteration 83650/ 115203 | consumed samples: 21414400 | consumed tokens: 43856691200 | elapsed time per iteration (s): 0.44 | learning rate: 5.191E-05 | global batch size: 256 | lm loss: 2.220976E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.801 | TFLOPs: 30.42 | 7: iteration 83660/ 115203 | consumed samples: 21416960 | consumed tokens: 43861934080 | elapsed time per iteration (s): 0.44 | learning rate: 5.189E-05 | global batch size: 256 | lm loss: 2.259134E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.337 | TFLOPs: 30.87 | 7: iteration 83670/ 115203 | consumed samples: 21419520 | consumed tokens: 43867176960 | elapsed time per iteration (s): 0.43 | learning rate: 5.187E-05 | global batch size: 256 | lm loss: 2.222904E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.947 | TFLOPs: 31.32 | 7: iteration 83680/ 115203 | consumed samples: 21422080 | consumed tokens: 43872419840 | elapsed time per iteration (s): 0.44 | learning rate: 5.185E-05 | global batch size: 256 | lm loss: 2.242497E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.811 | TFLOPs: 30.58 | 7: iteration 83690/ 115203 | consumed 
samples: 21424640 | consumed tokens: 43877662720 | elapsed time per iteration (s): 0.43 | learning rate: 5.183E-05 | global batch size: 256 | lm loss: 2.216965E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.594 | TFLOPs: 31.09 | 7: iteration 83700/ 115203 | consumed samples: 21427200 | consumed tokens: 43882905600 | elapsed time per iteration (s): 0.43 | learning rate: 5.181E-05 | global batch size: 256 | lm loss: 2.239708E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.762 | TFLOPs: 31.57 | 7: iteration 83710/ 115203 | consumed samples: 21429760 | consumed tokens: 43888148480 | elapsed time per iteration (s): 0.42 | learning rate: 5.179E-05 | global batch size: 256 | lm loss: 2.231260E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.601 | TFLOPs: 31.67 | 7: iteration 83720/ 115203 | consumed samples: 21432320 | consumed tokens: 43893391360 | elapsed time per iteration (s): 0.44 | learning rate: 5.178E-05 | global batch size: 256 | lm loss: 2.240383E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.825 | TFLOPs: 30.74 | 7: iteration 83730/ 115203 | consumed samples: 21434880 | consumed tokens: 43898634240 | elapsed time per iteration (s): 0.43 | learning rate: 5.176E-05 | global batch size: 256 | lm loss: 2.225390E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.304 | TFLOPs: 31.44 | 7: iteration 83740/ 115203 | consumed samples: 21437440 | consumed tokens: 43903877120 | elapsed time per iteration (s): 0.45 | learning rate: 5.174E-05 | global batch size: 256 | lm loss: 2.225670E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 572.185 | TFLOPs: 30.02 | 7: iteration 83750/ 115203 | consumed samples: 21440000 | consumed tokens: 43909120000 | elapsed time per iteration (s): 0.44 | learning rate: 5.172E-05 | global batch size: 256 | lm loss: 2.239058E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.527 | TFLOPs: 30.67 | 7: iteration 83760/ 115203 | consumed samples: 21442560 | consumed tokens: 43914362880 | elapsed time per iteration (s): 0.43 | learning rate: 5.170E-05 | global batch size: 256 | lm loss: 2.245789E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.720 | TFLOPs: 31.10 | 7: iteration 83770/ 115203 | consumed samples: 21445120 | consumed tokens: 43919605760 | elapsed time per iteration (s): 0.42 | learning rate: 5.168E-05 | global batch size: 256 | lm loss: 2.253476E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.984 | TFLOPs: 31.80 | 7: iteration 83780/ 115203 | consumed samples: 21447680 | consumed tokens: 43924848640 | elapsed time per iteration (s): 0.44 | learning rate: 5.166E-05 | global batch size: 256 | lm loss: 2.248499E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.739 | TFLOPs: 30.84 | 7: iteration 83790/ 115203 | consumed samples: 21450240 | consumed tokens: 43930091520 | elapsed time per iteration (s): 0.44 | learning rate: 5.164E-05 | global batch size: 256 | lm loss: 2.191061E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.911 | TFLOPs: 30.85 | 7: iteration 83800/ 115203 | consumed samples: 21452800 | consumed tokens: 43935334400 | elapsed time per iteration (s): 0.43 | learning rate: 5.162E-05 | global batch size: 256 | lm 
loss: 2.226214E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.274 | TFLOPs: 31.60 | 7: iteration 83810/ 115203 | consumed samples: 21455360 | consumed tokens: 43940577280 | elapsed time per iteration (s): 0.44 | learning rate: 5.161E-05 | global batch size: 256 | lm loss: 2.255928E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.809 | TFLOPs: 30.37 | 7: iteration 83820/ 115203 | consumed samples: 21457920 | consumed tokens: 43945820160 | elapsed time per iteration (s): 0.43 | learning rate: 5.159E-05 | global batch size: 256 | lm loss: 2.246867E+00 | grad norm: 0.221 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.088 | TFLOPs: 31.22 | 7: iteration 83830/ 115203 | consumed samples: 21460480 | consumed tokens: 43951063040 | elapsed time per iteration (s): 0.43 | learning rate: 5.157E-05 | global batch size: 256 | lm loss: 2.230607E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.396 | TFLOPs: 31.29 | 7: iteration 83840/ 115203 | consumed samples: 21463040 | consumed tokens: 43956305920 | elapsed time per iteration (s): 0.43 | learning rate: 5.155E-05 | global batch size: 256 | lm loss: 2.260870E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.006 | TFLOPs: 31.43 | 7: iteration 83850/ 115203 | consumed samples: 21465600 | consumed tokens: 43961548800 | elapsed time per iteration (s): 0.44 | learning rate: 5.153E-05 | global batch size: 256 | lm loss: 2.220724E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.754 | TFLOPs: 30.47 | 7: iteration 83860/ 115203 | consumed samples: 21468160 | consumed tokens: 
43966791680 | elapsed time per iteration (s): 0.43 | learning rate: 5.151E-05 | global batch size: 256 | lm loss: 2.235671E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.569 | TFLOPs: 30.99 | 7: iteration 83870/ 115203 | consumed samples: 21470720 | consumed tokens: 43972034560 | elapsed time per iteration (s): 0.45 | learning rate: 5.149E-05 | global batch size: 256 | lm loss: 2.239584E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.768 | TFLOPs: 30.00 | 7: iteration 83880/ 115203 | consumed samples: 21473280 | consumed tokens: 43977277440 | elapsed time per iteration (s): 0.44 | learning rate: 5.147E-05 | global batch size: 256 | lm loss: 2.230964E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.678 | TFLOPs: 30.83 | 7: iteration 83890/ 115203 | consumed samples: 21475840 | consumed tokens: 43982520320 | elapsed time per iteration (s): 0.43 | learning rate: 5.145E-05 | global batch size: 256 | lm loss: 2.203562E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.831 | TFLOPs: 31.58 | 7: iteration 83900/ 115203 | consumed samples: 21478400 | consumed tokens: 43987763200 | elapsed time per iteration (s): 0.42 | learning rate: 5.144E-05 | global batch size: 256 | lm loss: 2.228537E+00 | grad norm: 0.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.840 | TFLOPs: 31.89 | 7: iteration 83910/ 115203 | consumed samples: 21480960 | consumed tokens: 43993006080 | elapsed time per iteration (s): 0.46 | learning rate: 5.142E-05 | global batch size: 256 | lm loss: 2.254796E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
550.944 | TFLOPs: 28.91 | 7: iteration 83920/ 115203 | consumed samples: 21483520 | consumed tokens: 43998248960 | elapsed time per iteration (s): 0.43 | learning rate: 5.140E-05 | global batch size: 256 | lm loss: 2.266415E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.805 | TFLOPs: 31.16 | 7: iteration 83930/ 115203 | consumed samples: 21486080 | consumed tokens: 44003491840 | elapsed time per iteration (s): 0.44 | learning rate: 5.138E-05 | global batch size: 256 | lm loss: 2.215309E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.917 | TFLOPs: 30.74 | 7: iteration 83940/ 115203 | consumed samples: 21488640 | consumed tokens: 44008734720 | elapsed time per iteration (s): 0.43 | learning rate: 5.136E-05 | global batch size: 256 | lm loss: 2.258984E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.007 | TFLOPs: 31.32 | 7: iteration 83950/ 115203 | consumed samples: 21491200 | consumed tokens: 44013977600 | elapsed time per iteration (s): 0.44 | learning rate: 5.134E-05 | global batch size: 256 | lm loss: 2.245122E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.890 | TFLOPs: 30.58 | 7: iteration 83960/ 115203 | consumed samples: 21493760 | consumed tokens: 44019220480 | elapsed time per iteration (s): 0.43 | learning rate: 5.132E-05 | global batch size: 256 | lm loss: 2.232011E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.566 | TFLOPs: 31.20 | 7: iteration 83970/ 115203 | consumed samples: 21496320 | consumed tokens: 44024463360 | elapsed time per iteration (s): 0.43 | learning rate: 5.130E-05 | global batch size: 256 | lm loss: 2.232058E+00 | grad norm: 0.233 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.035 | TFLOPs: 31.33 | 7: iteration 83980/ 115203 | consumed samples: 21498880 | consumed tokens: 44029706240 | elapsed time per iteration (s): 0.44 | learning rate: 5.129E-05 | global batch size: 256 | lm loss: 2.282032E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.158 | TFLOPs: 30.86 | 7: iteration 83990/ 115203 | consumed samples: 21501440 | consumed tokens: 44034949120 | elapsed time per iteration (s): 0.44 | learning rate: 5.127E-05 | global batch size: 256 | lm loss: 2.241154E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.283 | TFLOPs: 30.66 | 0: [2022-11-28 23:04:26,070] [INFO] [logging.py:68:log_dist] [Rank 0] step=84000, skipped=0, lr=[5.124789271253415e-05, 5.124789271253415e-05, 5.124789271253415e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 84000/ 115203 | consumed samples: 21504000 | consumed tokens: 44040192000 | elapsed time per iteration (s): 0.44 | learning rate: 5.125E-05 | global batch size: 256 | lm loss: 2.225533E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.199 | TFLOPs: 30.76 | 0: steps: 84000 loss: 2.2255 iter time (s): 0.434 samples/sec: 589.335 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 84000 | lm loss value: 2.310587E+00 | lm loss PPL: 1.008034E+01 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 84000 to checkpoints_221m 0: [2022-11-28 23:04:26,246] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step84000 is begin to save! 
0: [2022-11-28 23:04:26,271] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_01-model_00-model_states.pt... 0: [2022-11-28 23:04:26,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_01-model_00-model_states.pt. 0: [2022-11-28 23:04:26,382] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_03-model_00-model_states.pt... 0: [2022-11-28 23:04:26,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_03-model_00-model_states.pt. 0: [2022-11-28 23:04:26,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_04-model_00-model_states.pt... 0: [2022-11-28 23:04:26,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_04-model_00-model_states.pt. 0: [2022-11-28 23:04:26,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_05-model_00-model_states.pt... 0: [2022-11-28 23:04:26,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_05-model_00-model_states.pt. 0: [2022-11-28 23:04:26,454] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_06-model_00-model_states.pt... 0: [2022-11-28 23:04:26,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_06-model_00-model_states.pt. 0: [2022-11-28 23:04:26,480] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_07-model_00-model_states.pt... 0: [2022-11-28 23:04:26,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 23:04:26,502] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_08-model_00-model_states.pt... 0: [2022-11-28 23:04:26,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_08-model_00-model_states.pt. 0: [2022-11-28 23:04:26,526] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_09-model_00-model_states.pt... 0: [2022-11-28 23:04:26,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_09-model_00-model_states.pt. 0: [2022-11-28 23:04:26,550] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_10-model_00-model_states.pt... 0: [2022-11-28 23:04:26,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_10-model_00-model_states.pt. 0: [2022-11-28 23:04:26,574] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_11-model_00-model_states.pt... 0: [2022-11-28 23:04:26,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_11-model_00-model_states.pt. 0: [2022-11-28 23:04:26,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_12-model_00-model_states.pt... 0: [2022-11-28 23:04:26,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_12-model_00-model_states.pt. 0: [2022-11-28 23:04:26,624] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_13-model_00-model_states.pt... 0: [2022-11-28 23:04:26,647] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 23:04:26,647] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_14-model_00-model_states.pt... 0: [2022-11-28 23:04:26,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_14-model_00-model_states.pt. 0: [2022-11-28 23:04:26,671] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_15-model_00-model_states.pt... 0: [2022-11-28 23:04:26,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_15-model_00-model_states.pt. 0: [2022-11-28 23:04:26,696] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_16-model_00-model_states.pt... 0: [2022-11-28 23:04:26,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_16-model_00-model_states.pt. 0: [2022-11-28 23:04:26,718] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_17-model_00-model_states.pt... 0: [2022-11-28 23:04:26,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_17-model_00-model_states.pt. 0: [2022-11-28 23:04:26,742] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_18-model_00-model_states.pt... 0: [2022-11-28 23:04:26,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_18-model_00-model_states.pt. 0: [2022-11-28 23:04:26,767] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_19-model_00-model_states.pt... 0: [2022-11-28 23:04:26,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 23:04:26,791] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_20-model_00-model_states.pt... 0: [2022-11-28 23:04:26,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_20-model_00-model_states.pt. 0: [2022-11-28 23:04:26,815] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/layer_22-model_00-model_states.pt... 0: [2022-11-28 23:04:26,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/layer_22-model_00-model_states.pt. 0: [2022-11-28 23:04:26,820] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step84000/mp_rank_00_model_states.pt 0: [2022-11-28 23:04:26,820] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/mp_rank_00_model_states.pt... 0: [2022-11-28 23:04:26,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/mp_rank_00_model_states.pt. 0: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 
3: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
3: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:04:26,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step84000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:04:26,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:04:26,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 23:04:26,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2022-11-28 23:04:26,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:04:26,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:04:26,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 23:04:26,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 23:04:26,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2022-11-28 23:04:26,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2022-11-28 23:04:26,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:04:26,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 23:04:26,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2022-11-28 23:04:26,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:04:26,899] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 23:04:26,899] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2022-11-28 23:04:26,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:04:26,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 23:04:26,899] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2022-11-28 23:04:26,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:04:26,899] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2022-11-28 23:04:26,899] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2022-11-28 23:04:26,899] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 7: [2022-11-28 23:04:26,899] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2022-11-28 23:04:26,899] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2022-11-28 23:04:26,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:04:26,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:04:26,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 1: [2022-11-28 23:04:26,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 2: [2022-11-28 23:04:26,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 
1: [2022-11-28 23:04:26,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2022-11-28 23:04:26,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:04:26,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 23:04:26,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2022-11-28 23:04:26,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:04:26,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 23:04:26,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 2: [2022-11-28 23:04:26,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:04:26,900] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 23:04:26,900] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2022-11-28 23:04:26,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 23:04:26,903] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 23:04:26,903] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:04:26,903] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2022-11-28 23:04:26,903] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 23:04:26,903] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2022-11-28 23:04:26,903] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:04:26,903] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 23:04:26,903] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2022-11-28 23:04:26,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:04:26,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 23:04:26,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2022-11-28 23:04:26,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-28 23:04:26,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 23:04:26,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2022-11-28 23:04:26,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:04:26,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:04:26,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 23:04:26,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 23:04:26,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2022-11-28 23:04:26,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2022-11-28 23:04:26,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:04:26,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 23:04:26,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2022-11-28 23:04:26,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 23:04:26,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 23:04:26,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 7: [2022-11-28 23:04:26,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:04:26,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 1: [2022-11-28 23:04:26,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:04:26,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 1: [2022-11-28 23:04:26,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 23:04:26,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2022-11-28 23:04:26,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:04:26,912] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 23:04:26,912] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2022-11-28 23:04:26,913] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 23:04:26,913] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 23:04:26,913] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2022-11-28 23:04:26,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:04:26,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 23:04:26,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2022-11-28 23:04:26,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:04:26,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:04:26,919] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 23:04:26,919] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2022-11-28 23:04:26,919] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:04:26,920] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 23:04:26,920] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 
0: [2022-11-28 23:04:26,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:04:26,920] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 23:04:26,920] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2022-11-28 23:04:26,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:04:26,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:04:26,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:04:26,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 23:04:26,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 23:04:26,928] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 23:04:26,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2022-11-28 23:04:26,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2022-11-28 23:04:26,928] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 
5: [2022-11-28 23:04:26,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:04:26,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:04:26,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 23:04:26,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 23:04:26,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 5: [2022-11-28 23:04:26,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 5: [2022-11-28 23:04:26,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:04:26,904] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 23:04:26,904] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 5: [2022-11-28 23:04:26,905] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:04:26,905] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 23:04:26,905] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 
5: [2022-11-28 23:04:26,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:04:26,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 23:04:26,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 5: [2022-11-28 23:04:26,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:04:26,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 23:04:26,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 5: [2022-11-28 23:04:26,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:04:26,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 23:04:26,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 5: [2022-11-28 23:04:26,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:04:26,911] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 23:04:26,911] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 
0: [2022-11-28 23:04:26,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:04:26,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:04:26,940] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:04:26,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 23:04:26,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 23:04:26,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 23:04:26,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2022-11-28 23:04:26,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2022-11-28 23:04:26,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2022-11-28 23:04:26,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:04:26,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:04:26,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 23:04:26,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 23:04:26,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 23:04:26,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 23:04:26,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2022-11-28 23:04:26,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 4: [2022-11-28 23:04:26,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 0: [2022-11-28 23:04:26,959] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 23:04:26,959] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:04:26,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 23:04:26,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:04:26,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 23:04:26,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 
3: [2022-11-28 23:04:26,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 23:04:26,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 23:04:26,994] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2022-11-28 23:04:26,994] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 3: [2022-11-28 23:04:26,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:04:26,999] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 23:04:26,999] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2022-11-28 23:04:27,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:04:27,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:04:27,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:04:27,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:04:27,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:04:27,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:04:27,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:04:27,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:04:27,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 23:04:27,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 23:04:27,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 23:04:27,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 23:04:27,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 23:04:27,044] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 23:04:27,044] [INFO] 
[engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 23:04:27,045] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step84000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 23:04:27,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2022-11-28 23:04:27,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2022-11-28 23:04:27,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2022-11-28 23:04:27,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2022-11-28 23:04:27,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2022-11-28 23:04:27,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2022-11-28 23:04:27,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 6: [2022-11-28 23:04:27,045] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step84000 is ready now! 
0: successfully saved checkpoint at iteration 84000 to checkpoints_221m 7: time (ms) | save-checkpoint: 820.30 7: iteration 84010/ 115203 | consumed samples: 21506560 | consumed tokens: 44045434880 | elapsed time per iteration (s): 0.52 | learning rate: 5.123E-05 | global batch size: 256 | lm loss: 2.235383E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 489.184 | TFLOPs: 25.67 | 7: iteration 84020/ 115203 | consumed samples: 21509120 | consumed tokens: 44050677760 | elapsed time per iteration (s): 0.43 | learning rate: 5.121E-05 | global batch size: 256 | lm loss: 2.245235E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.335 | TFLOPs: 31.39 | 7: iteration 84030/ 115203 | consumed samples: 21511680 | consumed tokens: 44055920640 | elapsed time per iteration (s): 0.43 | learning rate: 5.119E-05 | global batch size: 256 | lm loss: 2.256852E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.493 | TFLOPs: 30.93 | 7: iteration 84040/ 115203 | consumed samples: 21514240 | consumed tokens: 44061163520 | elapsed time per iteration (s): 0.45 | learning rate: 5.117E-05 | global batch size: 256 | lm loss: 2.258918E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.938 | TFLOPs: 30.01 | 7: iteration 84050/ 115203 | consumed samples: 21516800 | consumed tokens: 44066406400 | elapsed time per iteration (s): 0.64 | learning rate: 5.115E-05 | global batch size: 256 | lm loss: 2.221981E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 400.580 | TFLOPs: 21.02 | 7: iteration 84060/ 115203 | consumed samples: 21519360 | consumed tokens: 44071649280 | elapsed time per iteration (s): 0.46 | learning 
rate: 5.114E-05 | global batch size: 256 | lm loss: 2.257214E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.110 | TFLOPs: 29.18 | 7: iteration 84070/ 115203 | consumed samples: 21521920 | consumed tokens: 44076892160 | elapsed time per iteration (s): 0.44 | learning rate: 5.112E-05 | global batch size: 256 | lm loss: 2.248913E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.367 | TFLOPs: 30.82 | 7: iteration 84080/ 115203 | consumed samples: 21524480 | consumed tokens: 44082135040 | elapsed time per iteration (s): 0.44 | learning rate: 5.110E-05 | global batch size: 256 | lm loss: 2.260108E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.042 | TFLOPs: 30.75 | 7: iteration 84090/ 115203 | consumed samples: 21527040 | consumed tokens: 44087377920 | elapsed time per iteration (s): 0.43 | learning rate: 5.108E-05 | global batch size: 256 | lm loss: 2.229135E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.171 | TFLOPs: 31.54 | 7: iteration 84100/ 115203 | consumed samples: 21529600 | consumed tokens: 44092620800 | elapsed time per iteration (s): 0.44 | learning rate: 5.106E-05 | global batch size: 256 | lm loss: 2.233077E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.579 | TFLOPs: 30.78 | 7: iteration 84110/ 115203 | consumed samples: 21532160 | consumed tokens: 44097863680 | elapsed time per iteration (s): 0.43 | learning rate: 5.104E-05 | global batch size: 256 | lm loss: 2.241816E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.471 | TFLOPs: 31.09 | 7: iteration 84120/ 115203 | 
consumed samples: 21534720 | consumed tokens: 44103106560 | elapsed time per iteration (s): 0.45 | learning rate: 5.102E-05 | global batch size: 256 | lm loss: 2.232118E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.798 | TFLOPs: 30.16 | 7: iteration 84130/ 115203 | consumed samples: 21537280 | consumed tokens: 44108349440 | elapsed time per iteration (s): 0.43 | learning rate: 5.100E-05 | global batch size: 256 | lm loss: 2.239169E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.235 | TFLOPs: 31.02 | 7: iteration 84140/ 115203 | consumed samples: 21539840 | consumed tokens: 44113592320 | elapsed time per iteration (s): 0.43 | learning rate: 5.099E-05 | global batch size: 256 | lm loss: 2.260109E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.725 | TFLOPs: 31.15 | 7: iteration 84150/ 115203 | consumed samples: 21542400 | consumed tokens: 44118835200 | elapsed time per iteration (s): 0.44 | learning rate: 5.097E-05 | global batch size: 256 | lm loss: 2.258624E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.492 | TFLOPs: 30.72 | 7: iteration 84160/ 115203 | consumed samples: 21544960 | consumed tokens: 44124078080 | elapsed time per iteration (s): 0.43 | learning rate: 5.095E-05 | global batch size: 256 | lm loss: 2.221924E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.938 | TFLOPs: 31.43 | 7: iteration 84170/ 115203 | consumed samples: 21547520 | consumed tokens: 44129320960 | elapsed time per iteration (s): 0.44 | learning rate: 5.093E-05 | global batch size: 256 | lm loss: 2.240041E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 582.359 | TFLOPs: 30.56 | 7: iteration 84180/ 115203 | consumed samples: 21550080 | consumed tokens: 44134563840 | elapsed time per iteration (s): 0.43 | learning rate: 5.091E-05 | global batch size: 256 | lm loss: 2.251697E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.460 | TFLOPs: 30.93 | 7: iteration 84190/ 115203 | consumed samples: 21552640 | consumed tokens: 44139806720 | elapsed time per iteration (s): 0.44 | learning rate: 5.089E-05 | global batch size: 256 | lm loss: 2.227945E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.911 | TFLOPs: 30.85 | 7: iteration 84200/ 115203 | consumed samples: 21555200 | consumed tokens: 44145049600 | elapsed time per iteration (s): 0.44 | learning rate: 5.087E-05 | global batch size: 256 | lm loss: 2.267995E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.692 | TFLOPs: 30.68 | 7: iteration 84210/ 115203 | consumed samples: 21557760 | consumed tokens: 44150292480 | elapsed time per iteration (s): 0.43 | learning rate: 5.085E-05 | global batch size: 256 | lm loss: 2.238181E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.720 | TFLOPs: 31.41 | 7: iteration 84220/ 115203 | consumed samples: 21560320 | consumed tokens: 44155535360 | elapsed time per iteration (s): 0.42 | learning rate: 5.084E-05 | global batch size: 256 | lm loss: 2.224961E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.105 | TFLOPs: 31.85 | 7: iteration 84230/ 115203 | consumed samples: 21562880 | consumed tokens: 44160778240 | elapsed time per iteration (s): 0.44 | learning rate: 5.082E-05 | global batch size: 
256 | lm loss: 2.274173E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.740 | TFLOPs: 30.52 | 7: iteration 84240/ 115203 | consumed samples: 21565440 | consumed tokens: 44166021120 | elapsed time per iteration (s): 0.44 | learning rate: 5.080E-05 | global batch size: 256 | lm loss: 2.246582E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.815 | TFLOPs: 30.84 | 7: iteration 84250/ 115203 | consumed samples: 21568000 | consumed tokens: 44171264000 | elapsed time per iteration (s): 0.44 | learning rate: 5.078E-05 | global batch size: 256 | lm loss: 2.227519E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.849 | TFLOPs: 30.48 | 7: iteration 84260/ 115203 | consumed samples: 21570560 | consumed tokens: 44176506880 | elapsed time per iteration (s): 0.44 | learning rate: 5.076E-05 | global batch size: 256 | lm loss: 2.250959E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.971 | TFLOPs: 30.48 | 7: iteration 84270/ 115203 | consumed samples: 21573120 | consumed tokens: 44181749760 | elapsed time per iteration (s): 0.43 | learning rate: 5.074E-05 | global batch size: 256 | lm loss: 2.249979E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.337 | TFLOPs: 31.24 | 7: iteration 84280/ 115203 | consumed samples: 21575680 | consumed tokens: 44186992640 | elapsed time per iteration (s): 0.43 | learning rate: 5.072E-05 | global batch size: 256 | lm loss: 2.256210E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.152 | TFLOPs: 31.02 | 7: iteration 84290/ 115203 | consumed samples: 21578240 | consumed 
tokens: 44192235520 | elapsed time per iteration (s): 0.43 | learning rate: 5.071E-05 | global batch size: 256 | lm loss: 2.264009E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.655 | TFLOPs: 31.20 | 7: iteration 84300/ 115203 | consumed samples: 21580800 | consumed tokens: 44197478400 | elapsed time per iteration (s): 0.43 | learning rate: 5.069E-05 | global batch size: 256 | lm loss: 2.265013E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.163 | TFLOPs: 31.02 | 7: iteration 84310/ 115203 | consumed samples: 21583360 | consumed tokens: 44202721280 | elapsed time per iteration (s): 0.44 | learning rate: 5.067E-05 | global batch size: 256 | lm loss: 2.257117E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.262 | TFLOPs: 30.55 | 7: iteration 84320/ 115203 | consumed samples: 21585920 | consumed tokens: 44207964160 | elapsed time per iteration (s): 0.44 | learning rate: 5.065E-05 | global batch size: 256 | lm loss: 2.249759E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.743 | TFLOPs: 30.63 | 7: iteration 84330/ 115203 | consumed samples: 21588480 | consumed tokens: 44213207040 | elapsed time per iteration (s): 0.43 | learning rate: 5.063E-05 | global batch size: 256 | lm loss: 2.225369E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.320 | TFLOPs: 30.92 | 7: iteration 84340/ 115203 | consumed samples: 21591040 | consumed tokens: 44218449920 | elapsed time per iteration (s): 0.44 | learning rate: 5.061E-05 | global batch size: 256 | lm loss: 2.252147E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 581.451 | TFLOPs: 30.51 | 7: iteration 84350/ 115203 | consumed samples: 21593600 | consumed tokens: 44223692800 | elapsed time per iteration (s): 0.44 | learning rate: 5.059E-05 | global batch size: 256 | lm loss: 2.221232E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.042 | TFLOPs: 30.43 | 7: iteration 84360/ 115203 | consumed samples: 21596160 | consumed tokens: 44228935680 | elapsed time per iteration (s): 0.44 | learning rate: 5.057E-05 | global batch size: 256 | lm loss: 2.235539E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.787 | TFLOPs: 30.32 | 7: iteration 84370/ 115203 | consumed samples: 21598720 | consumed tokens: 44234178560 | elapsed time per iteration (s): 0.43 | learning rate: 5.056E-05 | global batch size: 256 | lm loss: 2.262648E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.821 | TFLOPs: 31.05 | 7: iteration 84380/ 115203 | consumed samples: 21601280 | consumed tokens: 44239421440 | elapsed time per iteration (s): 0.43 | learning rate: 5.054E-05 | global batch size: 256 | lm loss: 2.271715E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.241 | TFLOPs: 31.07 | 7: iteration 84390/ 115203 | consumed samples: 21603840 | consumed tokens: 44244664320 | elapsed time per iteration (s): 0.44 | learning rate: 5.052E-05 | global batch size: 256 | lm loss: 2.260159E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.827 | TFLOPs: 30.74 | 7: iteration 84400/ 115203 | consumed samples: 21606400 | consumed tokens: 44249907200 | elapsed time per iteration (s): 0.44 | learning rate: 5.050E-05 | global batch size: 256 | lm loss: 2.252891E+00 | grad norm: 
0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.033 | TFLOPs: 30.28 | 7: iteration 84410/ 115203 | consumed samples: 21608960 | consumed tokens: 44255150080 | elapsed time per iteration (s): 0.45 | learning rate: 5.048E-05 | global batch size: 256 | lm loss: 2.238466E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.490 | TFLOPs: 30.04 | 7: iteration 84420/ 115203 | consumed samples: 21611520 | consumed tokens: 44260392960 | elapsed time per iteration (s): 0.47 | learning rate: 5.046E-05 | global batch size: 256 | lm loss: 2.238937E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 550.027 | TFLOPs: 28.86 | 7: iteration 84430/ 115203 | consumed samples: 21614080 | consumed tokens: 44265635840 | elapsed time per iteration (s): 0.43 | learning rate: 5.044E-05 | global batch size: 256 | lm loss: 2.264274E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.697 | TFLOPs: 31.15 | 7: iteration 84440/ 115203 | consumed samples: 21616640 | consumed tokens: 44270878720 | elapsed time per iteration (s): 0.43 | learning rate: 5.043E-05 | global batch size: 256 | lm loss: 2.222794E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.095 | TFLOPs: 30.91 | 7: iteration 84450/ 115203 | consumed samples: 21619200 | consumed tokens: 44276121600 | elapsed time per iteration (s): 0.44 | learning rate: 5.041E-05 | global batch size: 256 | lm loss: 2.245837E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.994 | TFLOPs: 30.38 | 7: iteration 84460/ 115203 | consumed samples: 21621760 | consumed tokens: 44281364480 | elapsed time per 
iteration (s): 0.43 | learning rate: 5.039E-05 | global batch size: 256 | lm loss: 2.245791E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.786 | TFLOPs: 31.31 | 7: iteration 84470/ 115203 | consumed samples: 21624320 | consumed tokens: 44286607360 | elapsed time per iteration (s): 0.43 | learning rate: 5.037E-05 | global batch size: 256 | lm loss: 2.246590E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.011 | TFLOPs: 31.17 | 7: iteration 84480/ 115203 | consumed samples: 21626880 | consumed tokens: 44291850240 | elapsed time per iteration (s): 0.43 | learning rate: 5.035E-05 | global batch size: 256 | lm loss: 2.215186E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.838 | TFLOPs: 31.00 | 7: iteration 84490/ 115203 | consumed samples: 21629440 | consumed tokens: 44297093120 | elapsed time per iteration (s): 0.43 | learning rate: 5.033E-05 | global batch size: 256 | lm loss: 2.246688E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.454 | TFLOPs: 30.98 | 7: iteration 84500/ 115203 | consumed samples: 21632000 | consumed tokens: 44302336000 | elapsed time per iteration (s): 0.43 | learning rate: 5.031E-05 | global batch size: 256 | lm loss: 2.215348E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.141 | TFLOPs: 31.38 | 7: iteration 84510/ 115203 | consumed samples: 21634560 | consumed tokens: 44307578880 | elapsed time per iteration (s): 0.44 | learning rate: 5.030E-05 | global batch size: 256 | lm loss: 2.229245E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.602 | TFLOPs: 30.73 | 7: 
iteration 84520/ 115203 | consumed samples: 21637120 | consumed tokens: 44312821760 | elapsed time per iteration (s): 0.43 | learning rate: 5.028E-05 | global batch size: 256 | lm loss: 2.282437E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.794 | TFLOPs: 31.26 | 7: iteration 84530/ 115203 | consumed samples: 21639680 | consumed tokens: 44318064640 | elapsed time per iteration (s): 0.43 | learning rate: 5.026E-05 | global batch size: 256 | lm loss: 2.266348E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.139 | TFLOPs: 31.02 | 7: iteration 84540/ 115203 | consumed samples: 21642240 | consumed tokens: 44323307520 | elapsed time per iteration (s): 0.43 | learning rate: 5.024E-05 | global batch size: 256 | lm loss: 2.237427E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.911 | TFLOPs: 31.21 | 7: iteration 84550/ 115203 | consumed samples: 21644800 | consumed tokens: 44328550400 | elapsed time per iteration (s): 0.44 | learning rate: 5.022E-05 | global batch size: 256 | lm loss: 2.224268E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.739 | TFLOPs: 30.37 | 7: iteration 84560/ 115203 | consumed samples: 21647360 | consumed tokens: 44333793280 | elapsed time per iteration (s): 0.43 | learning rate: 5.020E-05 | global batch size: 256 | lm loss: 2.210462E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.546 | TFLOPs: 31.56 | 7: iteration 84570/ 115203 | consumed samples: 21649920 | consumed tokens: 44339036160 | elapsed time per iteration (s): 0.43 | learning rate: 5.018E-05 | global batch size: 256 | lm loss: 2.223546E+00 | grad norm: 0.232 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.159 | TFLOPs: 31.12 | 7: iteration 84580/ 115203 | consumed samples: 21652480 | consumed tokens: 44344279040 | elapsed time per iteration (s): 0.43 | learning rate: 5.017E-05 | global batch size: 256 | lm loss: 2.228820E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.925 | TFLOPs: 31.11 | 7: iteration 84590/ 115203 | consumed samples: 21655040 | consumed tokens: 44349521920 | elapsed time per iteration (s): 0.43 | learning rate: 5.015E-05 | global batch size: 256 | lm loss: 2.242048E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.525 | TFLOPs: 31.04 | 7: iteration 84600/ 115203 | consumed samples: 21657600 | consumed tokens: 44354764800 | elapsed time per iteration (s): 0.44 | learning rate: 5.013E-05 | global batch size: 256 | lm loss: 2.231622E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.171 | TFLOPs: 30.39 | 7: iteration 84610/ 115203 | consumed samples: 21660160 | consumed tokens: 44360007680 | elapsed time per iteration (s): 0.45 | learning rate: 5.011E-05 | global batch size: 256 | lm loss: 2.223550E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.244 | TFLOPs: 30.08 | 7: iteration 84620/ 115203 | consumed samples: 21662720 | consumed tokens: 44365250560 | elapsed time per iteration (s): 0.44 | learning rate: 5.009E-05 | global batch size: 256 | lm loss: 2.243203E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.745 | TFLOPs: 30.58 | 7: iteration 84630/ 115203 | consumed samples: 21665280 | consumed tokens: 44370493440 | elapsed time per iteration (s): 0.44 | learning rate: 
5.007E-05 | global batch size: 256 | lm loss: 2.223190E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.498 | TFLOPs: 30.83 | 7: iteration 84640/ 115203 | consumed samples: 21667840 | consumed tokens: 44375736320 | elapsed time per iteration (s): 0.43 | learning rate: 5.006E-05 | global batch size: 256 | lm loss: 2.228686E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.398 | TFLOPs: 31.13 | 7: iteration 84650/ 115203 | consumed samples: 21670400 | consumed tokens: 44380979200 | elapsed time per iteration (s): 0.43 | learning rate: 5.004E-05 | global batch size: 256 | lm loss: 2.232561E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.785 | TFLOPs: 30.89 | 7: iteration 84660/ 115203 | consumed samples: 21672960 | consumed tokens: 44386222080 | elapsed time per iteration (s): 0.43 | learning rate: 5.002E-05 | global batch size: 256 | lm loss: 2.234219E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.979 | TFLOPs: 31.22 | 7: iteration 84670/ 115203 | consumed samples: 21675520 | consumed tokens: 44391464960 | elapsed time per iteration (s): 0.43 | learning rate: 5.000E-05 | global batch size: 256 | lm loss: 2.247195E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.568 | TFLOPs: 30.93 | 7: iteration 84680/ 115203 | consumed samples: 21678080 | consumed tokens: 44396707840 | elapsed time per iteration (s): 0.43 | learning rate: 4.998E-05 | global batch size: 256 | lm loss: 2.229629E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.543 | TFLOPs: 31.40 | 7: iteration 84690/ 115203 | consumed 
samples: 21680640 | consumed tokens: 44401950720 | elapsed time per iteration (s): 0.43 | learning rate: 4.996E-05 | global batch size: 256 | lm loss: 2.237295E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.783 | TFLOPs: 31.47 | 7: iteration 84700/ 115203 | consumed samples: 21683200 | consumed tokens: 44407193600 | elapsed time per iteration (s): 0.43 | learning rate: 4.994E-05 | global batch size: 256 | lm loss: 2.255452E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.575 | TFLOPs: 30.93 | 7: iteration 84710/ 115203 | consumed samples: 21685760 | consumed tokens: 44412436480 | elapsed time per iteration (s): 0.43 | learning rate: 4.993E-05 | global batch size: 256 | lm loss: 2.258928E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.872 | TFLOPs: 31.21 | 7: iteration 84720/ 115203 | consumed samples: 21688320 | consumed tokens: 44417679360 | elapsed time per iteration (s): 0.43 | learning rate: 4.991E-05 | global batch size: 256 | lm loss: 2.227054E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.294 | TFLOPs: 31.39 | 7: iteration 84730/ 115203 | consumed samples: 21690880 | consumed tokens: 44422922240 | elapsed time per iteration (s): 0.44 | learning rate: 4.989E-05 | global batch size: 256 | lm loss: 2.227399E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.148 | TFLOPs: 30.65 | 7: iteration 84740/ 115203 | consumed samples: 21693440 | consumed tokens: 44428165120 | elapsed time per iteration (s): 0.44 | learning rate: 4.987E-05 | global batch size: 256 | lm loss: 2.230686E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 584.936 | TFLOPs: 30.69 | 7: iteration 84750/ 115203 | consumed samples: 21696000 | consumed tokens: 44433408000 | elapsed time per iteration (s): 0.43 | learning rate: 4.985E-05 | global batch size: 256 | lm loss: 2.246426E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.107 | TFLOPs: 31.01 | 7: iteration 84760/ 115203 | consumed samples: 21698560 | consumed tokens: 44438650880 | elapsed time per iteration (s): 0.44 | learning rate: 4.983E-05 | global batch size: 256 | lm loss: 2.248177E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.954 | TFLOPs: 30.27 | 7: iteration 84770/ 115203 | consumed samples: 21701120 | consumed tokens: 44443893760 | elapsed time per iteration (s): 0.44 | learning rate: 4.982E-05 | global batch size: 256 | lm loss: 2.250521E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.525 | TFLOPs: 30.51 | 7: iteration 84780/ 115203 | consumed samples: 21703680 | consumed tokens: 44449136640 | elapsed time per iteration (s): 0.44 | learning rate: 4.980E-05 | global batch size: 256 | lm loss: 2.253410E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.989 | TFLOPs: 30.80 | 7: iteration 84790/ 115203 | consumed samples: 21706240 | consumed tokens: 44454379520 | elapsed time per iteration (s): 0.43 | learning rate: 4.978E-05 | global batch size: 256 | lm loss: 2.285842E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.816 | TFLOPs: 30.95 | 7: iteration 84800/ 115203 | consumed samples: 21708800 | consumed tokens: 44459622400 | elapsed time per iteration (s): 0.42 | learning rate: 4.976E-05 | global batch size: 256 | lm 
loss: 2.227674E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.922 | TFLOPs: 31.69 | 7: iteration 84810/ 115203 | consumed samples: 21711360 | consumed tokens: 44464865280 | elapsed time per iteration (s): 0.44 | learning rate: 4.974E-05 | global batch size: 256 | lm loss: 2.244163E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.911 | TFLOPs: 30.43 | 7: iteration 84820/ 115203 | consumed samples: 21713920 | consumed tokens: 44470108160 | elapsed time per iteration (s): 0.44 | learning rate: 4.972E-05 | global batch size: 256 | lm loss: 2.222115E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.698 | TFLOPs: 30.68 | 7: iteration 84830/ 115203 | consumed samples: 21716480 | consumed tokens: 44475351040 | elapsed time per iteration (s): 0.44 | learning rate: 4.970E-05 | global batch size: 256 | lm loss: 2.251277E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.847 | TFLOPs: 30.53 | 7: iteration 84840/ 115203 | consumed samples: 21719040 | consumed tokens: 44480593920 | elapsed time per iteration (s): 0.44 | learning rate: 4.969E-05 | global batch size: 256 | lm loss: 2.228862E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.154 | TFLOPs: 30.54 | 7: iteration 84850/ 115203 | consumed samples: 21721600 | consumed tokens: 44485836800 | elapsed time per iteration (s): 0.43 | learning rate: 4.967E-05 | global batch size: 256 | lm loss: 2.247366E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.115 | TFLOPs: 31.01 | 7: iteration 84860/ 115203 | consumed samples: 21724160 | consumed tokens: 
44491079680 | elapsed time per iteration (s): 0.44 | learning rate: 4.965E-05 | global batch size: 256 | lm loss: 2.251979E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.665 | TFLOPs: 30.73 | 7: iteration 84870/ 115203 | consumed samples: 21726720 | consumed tokens: 44496322560 | elapsed time per iteration (s): 0.43 | learning rate: 4.963E-05 | global batch size: 256 | lm loss: 2.236393E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.030 | TFLOPs: 30.91 | 7: iteration 84880/ 115203 | consumed samples: 21729280 | consumed tokens: 44501565440 | elapsed time per iteration (s): 0.43 | learning rate: 4.961E-05 | global batch size: 256 | lm loss: 2.249372E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.953 | TFLOPs: 30.95 | 7: iteration 84890/ 115203 | consumed samples: 21731840 | consumed tokens: 44506808320 | elapsed time per iteration (s): 0.44 | learning rate: 4.959E-05 | global batch size: 256 | lm loss: 2.234315E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.308 | TFLOPs: 30.61 | 7: iteration 84900/ 115203 | consumed samples: 21734400 | consumed tokens: 44512051200 | elapsed time per iteration (s): 0.44 | learning rate: 4.958E-05 | global batch size: 256 | lm loss: 2.247196E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.899 | TFLOPs: 30.79 | 7: iteration 84910/ 115203 | consumed samples: 21736960 | consumed tokens: 44517294080 | elapsed time per iteration (s): 0.44 | learning rate: 4.956E-05 | global batch size: 256 | lm loss: 2.243575E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
581.210 | TFLOPs: 30.50 | 7: iteration 84920/ 115203 | consumed samples: 21739520 | consumed tokens: 44522536960 | elapsed time per iteration (s): 0.44 | learning rate: 4.954E-05 | global batch size: 256 | lm loss: 2.227078E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.228 | TFLOPs: 30.29 | 7: iteration 84930/ 115203 | consumed samples: 21742080 | consumed tokens: 44527779840 | elapsed time per iteration (s): 0.43 | learning rate: 4.952E-05 | global batch size: 256 | lm loss: 2.257600E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.717 | TFLOPs: 30.94 | 7: iteration 84940/ 115203 | consumed samples: 21744640 | consumed tokens: 44533022720 | elapsed time per iteration (s): 0.44 | learning rate: 4.950E-05 | global batch size: 256 | lm loss: 2.256790E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.857 | TFLOPs: 30.58 | 7: iteration 84950/ 115203 | consumed samples: 21747200 | consumed tokens: 44538265600 | elapsed time per iteration (s): 0.44 | learning rate: 4.948E-05 | global batch size: 256 | lm loss: 2.260225E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.933 | TFLOPs: 30.64 | 7: iteration 84960/ 115203 | consumed samples: 21749760 | consumed tokens: 44543508480 | elapsed time per iteration (s): 0.43 | learning rate: 4.947E-05 | global batch size: 256 | lm loss: 2.207372E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.997 | TFLOPs: 31.48 | 7: iteration 84970/ 115203 | consumed samples: 21752320 | consumed tokens: 44548751360 | elapsed time per iteration (s): 0.43 | learning rate: 4.945E-05 | global batch size: 256 | lm loss: 2.243943E+00 | grad norm: 0.233 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.919 | TFLOPs: 31.32 | 7: iteration 84980/ 115203 | consumed samples: 21754880 | consumed tokens: 44553994240 | elapsed time per iteration (s): 0.43 | learning rate: 4.943E-05 | global batch size: 256 | lm loss: 2.257504E+00 | grad norm: 0.222 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.279 | TFLOPs: 31.23 | 7: iteration 84990/ 115203 | consumed samples: 21757440 | consumed tokens: 44559237120 | elapsed time per iteration (s): 0.43 | learning rate: 4.941E-05 | global batch size: 256 | lm loss: 2.232543E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.854 | TFLOPs: 31.16 | 7: iteration 85000/ 115203 | consumed samples: 21760000 | consumed tokens: 44564480000 | elapsed time per iteration (s): 0.44 | learning rate: 4.939E-05 | global batch size: 256 | lm loss: 2.269350E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.566 | TFLOPs: 30.78 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 85000 | lm loss value: 2.171110E+00 | lm loss PPL: 8.768015E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 85000 to checkpoints_221m 0: [2022-11-28 23:11:44,826] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step85000 is begin to save! 0: [2022-11-28 23:11:44,852] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_01-model_00-model_states.pt... 0: [2022-11-28 23:11:44,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 23:11:44,964] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_03-model_00-model_states.pt... 0: [2022-11-28 23:11:44,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_03-model_00-model_states.pt. 0: [2022-11-28 23:11:44,985] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_04-model_00-model_states.pt... 0: [2022-11-28 23:11:45,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_04-model_00-model_states.pt. 0: [2022-11-28 23:11:45,009] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_05-model_00-model_states.pt... 0: [2022-11-28 23:11:45,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_05-model_00-model_states.pt. 0: [2022-11-28 23:11:45,032] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_06-model_00-model_states.pt... 0: [2022-11-28 23:11:45,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_06-model_00-model_states.pt. 0: [2022-11-28 23:11:45,056] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_07-model_00-model_states.pt... 0: [2022-11-28 23:11:45,080] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_07-model_00-model_states.pt. 0: [2022-11-28 23:11:45,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_08-model_00-model_states.pt... 0: [2022-11-28 23:11:45,103] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 23:11:45,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_09-model_00-model_states.pt... 0: [2022-11-28 23:11:45,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_09-model_00-model_states.pt. 0: [2022-11-28 23:11:45,128] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_10-model_00-model_states.pt... 0: [2022-11-28 23:11:45,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_10-model_00-model_states.pt. 0: [2022-11-28 23:11:45,152] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_11-model_00-model_states.pt... 0: [2022-11-28 23:11:45,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_11-model_00-model_states.pt. 0: [2022-11-28 23:11:45,176] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_12-model_00-model_states.pt... 0: [2022-11-28 23:11:45,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_12-model_00-model_states.pt. 0: [2022-11-28 23:11:45,200] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_13-model_00-model_states.pt... 0: [2022-11-28 23:11:45,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_13-model_00-model_states.pt. 0: [2022-11-28 23:11:45,224] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_14-model_00-model_states.pt... 0: [2022-11-28 23:11:45,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 23:11:45,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_15-model_00-model_states.pt... 0: [2022-11-28 23:11:45,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_15-model_00-model_states.pt. 0: [2022-11-28 23:11:45,270] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_16-model_00-model_states.pt... 0: [2022-11-28 23:11:45,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_16-model_00-model_states.pt. 0: [2022-11-28 23:11:45,294] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_17-model_00-model_states.pt... 0: [2022-11-28 23:11:45,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_17-model_00-model_states.pt. 0: [2022-11-28 23:11:45,316] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_18-model_00-model_states.pt... 0: [2022-11-28 23:11:45,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_18-model_00-model_states.pt. 0: [2022-11-28 23:11:45,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_19-model_00-model_states.pt... 0: [2022-11-28 23:11:45,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_19-model_00-model_states.pt. 0: [2022-11-28 23:11:45,364] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_20-model_00-model_states.pt... 0: [2022-11-28 23:11:45,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 23:11:45,387] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/layer_22-model_00-model_states.pt... 0: [2022-11-28 23:11:45,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/layer_22-model_00-model_states.pt. 0: [2022-11-28 23:11:45,391] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step85000/mp_rank_00_model_states.pt 0: [2022-11-28 23:11:45,391] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/mp_rank_00_model_states.pt... 0: [2022-11-28 23:11:45,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/mp_rank_00_model_states.pt. 0: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
0: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
1: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:11:45,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step85000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:11:45,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:11:45,461] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 23:11:45,461] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 1: [2022-11-28 23:11:45,463] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:11:45,463] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 23:11:45,463] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 6: [2022-11-28 23:11:45,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:11:45,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:11:45,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 23:11:45,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 23:11:45,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 6: [2022-11-28 23:11:45,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 1: [2022-11-28 23:11:45,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:11:45,465] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 23:11:45,465] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 6: [2022-11-28 23:11:45,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:11:45,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:11:45,465] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 23:11:45,465] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 
1: [2022-11-28 23:11:45,465] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 23:11:45,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 6: [2022-11-28 23:11:45,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:11:45,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:11:45,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 6: [2022-11-28 23:11:45,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 4: [2022-11-28 23:11:45,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 6: [2022-11-28 23:11:45,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 4: [2022-11-28 23:11:45,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:11:45,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 23:11:45,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 7: [2022-11-28 23:11:45,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-28 23:11:45,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 23:11:45,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 7: [2022-11-28 23:11:45,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:11:45,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 7: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 4: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:11:45,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 1: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-28 23:11:45,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 4: [2022-11-28 23:11:45,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:11:45,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 23:11:45,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 4: [2022-11-28 23:11:45,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:11:45,470] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 23:11:45,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 1: [2022-11-28 23:11:45,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:11:45,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 23:11:45,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 1: [2022-11-28 23:11:45,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 23:11:45,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 23:11:45,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 4: [2022-11-28 23:11:45,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:11:45,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 23:11:45,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 4: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:11:45,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 4: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:11:45,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 7: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:11:45,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:11:45,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 23:11:45,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 23:11:45,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 23:11:45,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 7: [2022-11-28 23:11:45,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 7: [2022-11-28 23:11:45,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 
7: [2022-11-28 23:11:45,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 23:11:45,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 7: [2022-11-28 23:11:45,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 0: [2022-11-28 23:11:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:11:45,477] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 23:11:45,477] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 0: [2022-11-28 23:11:45,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:11:45,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:11:45,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 23:11:45,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 0: [2022-11-28 23:11:45,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2022-11-28 23:11:45,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 23:11:45,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 6: [2022-11-28 23:11:45,478] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:11:45,478] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 23:11:45,478] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 1: [2022-11-28 23:11:45,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:11:45,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 23:11:45,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 6: [2022-11-28 23:11:45,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:11:45,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:11:45,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:11:45,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 23:11:45,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 23:11:45,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 23:11:45,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 6: [2022-11-28 23:11:45,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 6: [2022-11-28 23:11:45,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 5: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:11:45,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:11:45,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 23:11:45,486] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 3: [2022-11-28 23:11:45,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 
5: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:11:45,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 5: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:11:45,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 23:11:45,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 5: [2022-11-28 23:11:45,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:11:45,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 23:11:45,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 5: [2022-11-28 23:11:45,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:11:45,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 
5: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:11:45,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 5: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:11:45,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 23:11:45,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 0: [2022-11-28 23:11:45,488] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:11:45,488] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 23:11:45,488] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 0: [2022-11-28 23:11:45,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:11:45,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 23:11:45,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 
0: [2022-11-28 23:11:45,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:11:45,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 23:11:45,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 0: [2022-11-28 23:11:45,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:11:45,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 23:11:45,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 5: [2022-11-28 23:11:45,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:11:45,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:11:45,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 23:11:45,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 23:11:45,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 3: [2022-11-28 23:11:45,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 
3: [2022-11-28 23:11:45,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:11:45,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:11:45,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:11:45,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 23:11:45,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 3: [2022-11-28 23:11:45,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 23:11:45,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 23:11:45,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 3: [2022-11-28 23:11:45,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 3: [2022-11-28 23:11:45,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:11:45,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2022-11-28 23:11:45,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 23:11:45,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 23:11:45,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 3: [2022-11-28 23:11:45,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 3: [2022-11-28 23:11:45,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:11:45,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 23:11:45,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 0: [2022-11-28 23:11:45,527] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 23:11:45,527] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 2: [2022-11-28 23:11:45,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:11:45,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:11:45,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:11:45,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:11:45,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:11:45,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:11:45,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:11:45,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 23:11:45,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 23:11:45,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 23:11:45,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 23:11:45,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 23:11:45,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 23:11:45,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 23:11:45,710] 
[INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 2: [2022-11-28 23:11:45,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 2: [2022-11-28 23:11:45,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 2: [2022-11-28 23:11:45,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 2: [2022-11-28 23:11:45,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 2: [2022-11-28 23:11:45,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 2: [2022-11-28 23:11:45,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 2: [2022-11-28 23:11:45,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:11:45,710] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step85000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 23:11:45,710] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step85000 is ready now! 
0: successfully saved checkpoint at iteration 85000 to checkpoints_221m 7: time (ms) | save-checkpoint: 906.14 7: iteration 85010/ 115203 | consumed samples: 21762560 | consumed tokens: 44569722880 | elapsed time per iteration (s): 0.55 | learning rate: 4.937E-05 | global batch size: 256 | lm loss: 2.255418E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 469.630 | TFLOPs: 24.64 | 7: iteration 85020/ 115203 | consumed samples: 21765120 | consumed tokens: 44574965760 | elapsed time per iteration (s): 0.43 | learning rate: 4.936E-05 | global batch size: 256 | lm loss: 2.243149E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.589 | TFLOPs: 30.93 | 7: iteration 85030/ 115203 | consumed samples: 21767680 | consumed tokens: 44580208640 | elapsed time per iteration (s): 0.43 | learning rate: 4.934E-05 | global batch size: 256 | lm loss: 2.256687E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.982 | TFLOPs: 31.11 | 7: iteration 85040/ 115203 | consumed samples: 21770240 | consumed tokens: 44585451520 | elapsed time per iteration (s): 0.45 | learning rate: 4.932E-05 | global batch size: 256 | lm loss: 2.249997E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.354 | TFLOPs: 30.14 | 7: iteration 85050/ 115203 | consumed samples: 21772800 | consumed tokens: 44590694400 | elapsed time per iteration (s): 0.64 | learning rate: 4.930E-05 | global batch size: 256 | lm loss: 2.180896E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 402.640 | TFLOPs: 21.13 | 7: iteration 85060/ 115203 | consumed samples: 21775360 | consumed tokens: 44595937280 | elapsed time per iteration (s): 0.44 | learning 
rate: 4.928E-05 | global batch size: 256 | lm loss: 2.236417E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.856 | TFLOPs: 30.69 | 7: iteration 85070/ 115203 | consumed samples: 21777920 | consumed tokens: 44601180160 | elapsed time per iteration (s): 0.43 | learning rate: 4.926E-05 | global batch size: 256 | lm loss: 2.247272E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.302 | TFLOPs: 31.39 | 7: iteration 85080/ 115203 | consumed samples: 21780480 | consumed tokens: 44606423040 | elapsed time per iteration (s): 0.44 | learning rate: 4.925E-05 | global batch size: 256 | lm loss: 2.196152E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.685 | TFLOPs: 30.47 | 7: iteration 85090/ 115203 | consumed samples: 21783040 | consumed tokens: 44611665920 | elapsed time per iteration (s): 0.43 | learning rate: 4.923E-05 | global batch size: 256 | lm loss: 2.244057E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.002 | TFLOPs: 30.90 | 7: iteration 85100/ 115203 | consumed samples: 21785600 | consumed tokens: 44616908800 | elapsed time per iteration (s): 0.44 | learning rate: 4.921E-05 | global batch size: 256 | lm loss: 2.237212E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.628 | TFLOPs: 30.31 | 7: iteration 85110/ 115203 | consumed samples: 21788160 | consumed tokens: 44622151680 | elapsed time per iteration (s): 0.44 | learning rate: 4.919E-05 | global batch size: 256 | lm loss: 2.223628E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.910 | TFLOPs: 30.58 | 7: iteration 85120/ 115203 | 
consumed samples: 21790720 | consumed tokens: 44627394560 | elapsed time per iteration (s): 0.43 | learning rate: 4.917E-05 | global batch size: 256 | lm loss: 2.224012E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.493 | TFLOPs: 31.30 | 7: iteration 85130/ 115203 | consumed samples: 21793280 | consumed tokens: 44632637440 | elapsed time per iteration (s): 0.43 | learning rate: 4.915E-05 | global batch size: 256 | lm loss: 2.248293E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.789 | TFLOPs: 31.00 | 7: iteration 85140/ 115203 | consumed samples: 21795840 | consumed tokens: 44637880320 | elapsed time per iteration (s): 0.43 | learning rate: 4.914E-05 | global batch size: 256 | lm loss: 2.229071E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.912 | TFLOPs: 31.06 | 7: iteration 85150/ 115203 | consumed samples: 21798400 | consumed tokens: 44643123200 | elapsed time per iteration (s): 0.45 | learning rate: 4.912E-05 | global batch size: 256 | lm loss: 2.242936E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.221 | TFLOPs: 30.18 | 7: iteration 85160/ 115203 | consumed samples: 21800960 | consumed tokens: 44648366080 | elapsed time per iteration (s): 0.44 | learning rate: 4.910E-05 | global batch size: 256 | lm loss: 2.256745E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.252 | TFLOPs: 30.60 | 7: iteration 85170/ 115203 | consumed samples: 21803520 | consumed tokens: 44653608960 | elapsed time per iteration (s): 0.43 | learning rate: 4.908E-05 | global batch size: 256 | lm loss: 2.221572E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 600.986 | TFLOPs: 31.53 | 7: iteration 85180/ 115203 | consumed samples: 21806080 | consumed tokens: 44658851840 | elapsed time per iteration (s): 0.43 | learning rate: 4.906E-05 | global batch size: 256 | lm loss: 2.225125E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.404 | TFLOPs: 31.55 | 7: iteration 85190/ 115203 | consumed samples: 21808640 | consumed tokens: 44664094720 | elapsed time per iteration (s): 0.43 | learning rate: 4.905E-05 | global batch size: 256 | lm loss: 2.225617E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.498 | TFLOPs: 31.19 | 7: iteration 85200/ 115203 | consumed samples: 21811200 | consumed tokens: 44669337600 | elapsed time per iteration (s): 0.44 | learning rate: 4.903E-05 | global batch size: 256 | lm loss: 2.262819E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.426 | TFLOPs: 30.87 | 7: iteration 85210/ 115203 | consumed samples: 21813760 | consumed tokens: 44674580480 | elapsed time per iteration (s): 0.43 | learning rate: 4.901E-05 | global batch size: 256 | lm loss: 2.240277E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.445 | TFLOPs: 31.08 | 7: iteration 85220/ 115203 | consumed samples: 21816320 | consumed tokens: 44679823360 | elapsed time per iteration (s): 0.43 | learning rate: 4.899E-05 | global batch size: 256 | lm loss: 2.269743E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.025 | TFLOPs: 31.32 | 7: iteration 85230/ 115203 | consumed samples: 21818880 | consumed tokens: 44685066240 | elapsed time per iteration (s): 0.43 | learning rate: 4.897E-05 | global batch size: 
256 | lm loss: 2.245368E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.416 | TFLOPs: 31.24 | 7: iteration 85240/ 115203 | consumed samples: 21821440 | consumed tokens: 44690309120 | elapsed time per iteration (s): 0.45 | learning rate: 4.895E-05 | global batch size: 256 | lm loss: 2.245371E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.769 | TFLOPs: 29.95 | 7: iteration 85250/ 115203 | consumed samples: 21824000 | consumed tokens: 44695552000 | elapsed time per iteration (s): 0.44 | learning rate: 4.894E-05 | global batch size: 256 | lm loss: 2.232703E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.005 | TFLOPs: 30.85 | 7: iteration 85260/ 115203 | consumed samples: 21826560 | consumed tokens: 44700794880 | elapsed time per iteration (s): 0.43 | learning rate: 4.892E-05 | global batch size: 256 | lm loss: 2.262992E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.855 | TFLOPs: 31.26 | 7: iteration 85270/ 115203 | consumed samples: 21829120 | consumed tokens: 44706037760 | elapsed time per iteration (s): 0.43 | learning rate: 4.890E-05 | global batch size: 256 | lm loss: 2.236382E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.656 | TFLOPs: 31.57 | 7: iteration 85280/ 115203 | consumed samples: 21831680 | consumed tokens: 44711280640 | elapsed time per iteration (s): 0.45 | learning rate: 4.888E-05 | global batch size: 256 | lm loss: 2.226039E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.784 | TFLOPs: 29.58 | 7: iteration 85290/ 115203 | consumed samples: 21834240 | consumed 
tokens: 44716523520 | elapsed time per iteration (s): 0.43 | learning rate: 4.886E-05 | global batch size: 256 | lm loss: 2.247954E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.474 | TFLOPs: 31.19 | 7: iteration 85300/ 115203 | consumed samples: 21836800 | consumed tokens: 44721766400 | elapsed time per iteration (s): 0.44 | learning rate: 4.884E-05 | global batch size: 256 | lm loss: 2.239303E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.611 | TFLOPs: 30.57 | 7: iteration 85310/ 115203 | consumed samples: 21839360 | consumed tokens: 44727009280 | elapsed time per iteration (s): 0.43 | learning rate: 4.883E-05 | global batch size: 256 | lm loss: 2.238989E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.801 | TFLOPs: 31.21 | 7: iteration 85320/ 115203 | consumed samples: 21841920 | consumed tokens: 44732252160 | elapsed time per iteration (s): 0.43 | learning rate: 4.881E-05 | global batch size: 256 | lm loss: 2.250889E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.172 | TFLOPs: 31.18 | 7: iteration 85330/ 115203 | consumed samples: 21844480 | consumed tokens: 44737495040 | elapsed time per iteration (s): 0.43 | learning rate: 4.879E-05 | global batch size: 256 | lm loss: 2.243842E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.784 | TFLOPs: 31.05 | 7: iteration 85340/ 115203 | consumed samples: 21847040 | consumed tokens: 44742737920 | elapsed time per iteration (s): 0.43 | learning rate: 4.877E-05 | global batch size: 256 | lm loss: 2.251773E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 591.128 | TFLOPs: 31.02 | 7: iteration 85350/ 115203 | consumed samples: 21849600 | consumed tokens: 44747980800 | elapsed time per iteration (s): 0.43 | learning rate: 4.875E-05 | global batch size: 256 | lm loss: 2.261909E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.220 | TFLOPs: 31.44 | 7: iteration 85360/ 115203 | consumed samples: 21852160 | consumed tokens: 44753223680 | elapsed time per iteration (s): 0.43 | learning rate: 4.874E-05 | global batch size: 256 | lm loss: 2.203750E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.638 | TFLOPs: 31.41 | 7: iteration 85370/ 115203 | consumed samples: 21854720 | consumed tokens: 44758466560 | elapsed time per iteration (s): 0.42 | learning rate: 4.872E-05 | global batch size: 256 | lm loss: 2.234576E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.136 | TFLOPs: 31.70 | 7: iteration 85380/ 115203 | consumed samples: 21857280 | consumed tokens: 44763709440 | elapsed time per iteration (s): 0.42 | learning rate: 4.870E-05 | global batch size: 256 | lm loss: 2.246423E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.076 | TFLOPs: 31.85 | 7: iteration 85390/ 115203 | consumed samples: 21859840 | consumed tokens: 44768952320 | elapsed time per iteration (s): 0.43 | learning rate: 4.868E-05 | global batch size: 256 | lm loss: 2.257793E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.749 | TFLOPs: 31.47 | 7: iteration 85400/ 115203 | consumed samples: 21862400 | consumed tokens: 44774195200 | elapsed time per iteration (s): 0.42 | learning rate: 4.866E-05 | global batch size: 256 | lm loss: 2.227349E+00 | grad norm: 
0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.170 | TFLOPs: 31.86 | 7: iteration 85410/ 115203 | consumed samples: 21864960 | consumed tokens: 44779438080 | elapsed time per iteration (s): 0.42 | learning rate: 4.864E-05 | global batch size: 256 | lm loss: 2.216138E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.199 | TFLOPs: 31.86 | 7: iteration 85420/ 115203 | consumed samples: 21867520 | consumed tokens: 44784680960 | elapsed time per iteration (s): 0.43 | learning rate: 4.863E-05 | global batch size: 256 | lm loss: 2.236328E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.491 | TFLOPs: 31.40 | 7: iteration 85430/ 115203 | consumed samples: 21870080 | consumed tokens: 44789923840 | elapsed time per iteration (s): 0.44 | learning rate: 4.861E-05 | global batch size: 256 | lm loss: 2.245910E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.157 | TFLOPs: 30.86 | 7: iteration 85440/ 115203 | consumed samples: 21872640 | consumed tokens: 44795166720 | elapsed time per iteration (s): 0.43 | learning rate: 4.859E-05 | global batch size: 256 | lm loss: 2.236831E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.227 | TFLOPs: 31.07 | 7: iteration 85450/ 115203 | consumed samples: 21875200 | consumed tokens: 44800409600 | elapsed time per iteration (s): 0.42 | learning rate: 4.857E-05 | global batch size: 256 | lm loss: 2.234282E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.513 | TFLOPs: 31.72 | 7: iteration 85460/ 115203 | consumed samples: 21877760 | consumed tokens: 44805652480 | elapsed time per 
iteration (s): 0.43 | learning rate: 4.855E-05 | global batch size: 256 | lm loss: 2.263363E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.108 | TFLOPs: 31.17 | 7: iteration 85470/ 115203 | consumed samples: 21880320 | consumed tokens: 44810895360 | elapsed time per iteration (s): 0.43 | learning rate: 4.854E-05 | global batch size: 256 | lm loss: 2.255242E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.331 | TFLOPs: 31.03 | 7: iteration 85480/ 115203 | consumed samples: 21882880 | consumed tokens: 44816138240 | elapsed time per iteration (s): 0.43 | learning rate: 4.852E-05 | global batch size: 256 | lm loss: 2.238262E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.810 | TFLOPs: 31.10 | 7: iteration 85490/ 115203 | consumed samples: 21885440 | consumed tokens: 44821381120 | elapsed time per iteration (s): 0.45 | learning rate: 4.850E-05 | global batch size: 256 | lm loss: 2.245666E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.726 | TFLOPs: 30.05 | 7: iteration 85500/ 115203 | consumed samples: 21888000 | consumed tokens: 44826624000 | elapsed time per iteration (s): 0.44 | learning rate: 4.848E-05 | global batch size: 256 | lm loss: 2.234295E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.003 | TFLOPs: 30.69 | 7: iteration 85510/ 115203 | consumed samples: 21890560 | consumed tokens: 44831866880 | elapsed time per iteration (s): 0.44 | learning rate: 4.846E-05 | global batch size: 256 | lm loss: 2.252140E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.035 | TFLOPs: 30.49 | 7: 
iteration 85520/ 115203 | consumed samples: 21893120 | consumed tokens: 44837109760 | elapsed time per iteration (s): 0.44 | learning rate: 4.845E-05 | global batch size: 256 | lm loss: 2.272785E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.832 | TFLOPs: 30.27 | 7: iteration 85530/ 115203 | consumed samples: 21895680 | consumed tokens: 44842352640 | elapsed time per iteration (s): 0.43 | learning rate: 4.843E-05 | global batch size: 256 | lm loss: 2.200171E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.748 | TFLOPs: 30.94 | 7: iteration 85540/ 115203 | consumed samples: 21898240 | consumed tokens: 44847595520 | elapsed time per iteration (s): 0.44 | learning rate: 4.841E-05 | global batch size: 256 | lm loss: 2.258731E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.344 | TFLOPs: 30.40 | 7: iteration 85550/ 115203 | consumed samples: 21900800 | consumed tokens: 44852838400 | elapsed time per iteration (s): 0.43 | learning rate: 4.839E-05 | global batch size: 256 | lm loss: 2.270692E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.827 | TFLOPs: 31.05 | 7: iteration 85560/ 115203 | consumed samples: 21903360 | consumed tokens: 44858081280 | elapsed time per iteration (s): 0.44 | learning rate: 4.837E-05 | global batch size: 256 | lm loss: 2.257669E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.218 | TFLOPs: 30.76 | 7: iteration 85570/ 115203 | consumed samples: 21905920 | consumed tokens: 44863324160 | elapsed time per iteration (s): 0.42 | learning rate: 4.836E-05 | global batch size: 256 | lm loss: 2.251978E+00 | grad norm: 0.243 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.192 | TFLOPs: 31.75 | 7: iteration 85580/ 115203 | consumed samples: 21908480 | consumed tokens: 44868567040 | elapsed time per iteration (s): 0.44 | learning rate: 4.834E-05 | global batch size: 256 | lm loss: 2.272530E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.378 | TFLOPs: 30.56 | 7: iteration 85590/ 115203 | consumed samples: 21911040 | consumed tokens: 44873809920 | elapsed time per iteration (s): 0.44 | learning rate: 4.832E-05 | global batch size: 256 | lm loss: 2.256606E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.558 | TFLOPs: 30.78 | 7: iteration 85600/ 115203 | consumed samples: 21913600 | consumed tokens: 44879052800 | elapsed time per iteration (s): 0.42 | learning rate: 4.830E-05 | global batch size: 256 | lm loss: 2.293697E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.131 | TFLOPs: 31.80 | 7: iteration 85610/ 115203 | consumed samples: 21916160 | consumed tokens: 44884295680 | elapsed time per iteration (s): 0.44 | learning rate: 4.828E-05 | global batch size: 256 | lm loss: 2.264820E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.880 | TFLOPs: 30.85 | 7: iteration 85620/ 115203 | consumed samples: 21918720 | consumed tokens: 44889538560 | elapsed time per iteration (s): 0.43 | learning rate: 4.827E-05 | global batch size: 256 | lm loss: 2.234157E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.346 | TFLOPs: 31.24 | 7: iteration 85630/ 115203 | consumed samples: 21921280 | consumed tokens: 44894781440 | elapsed time per iteration (s): 0.45 | learning rate: 
4.825E-05 | global batch size: 256 | lm loss: 2.291595E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.589 | TFLOPs: 29.83 | 7: iteration 85640/ 115203 | consumed samples: 21923840 | consumed tokens: 44900024320 | elapsed time per iteration (s): 0.44 | learning rate: 4.823E-05 | global batch size: 256 | lm loss: 2.266180E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.514 | TFLOPs: 30.83 | 7: iteration 85650/ 115203 | consumed samples: 21926400 | consumed tokens: 44905267200 | elapsed time per iteration (s): 0.44 | learning rate: 4.821E-05 | global batch size: 256 | lm loss: 2.224153E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.711 | TFLOPs: 30.52 | 7: iteration 85660/ 115203 | consumed samples: 21928960 | consumed tokens: 44910510080 | elapsed time per iteration (s): 0.43 | learning rate: 4.819E-05 | global batch size: 256 | lm loss: 2.242611E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.830 | TFLOPs: 31.05 | 7: iteration 85670/ 115203 | consumed samples: 21931520 | consumed tokens: 44915752960 | elapsed time per iteration (s): 0.43 | learning rate: 4.817E-05 | global batch size: 256 | lm loss: 2.228119E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.753 | TFLOPs: 31.36 | 7: iteration 85680/ 115203 | consumed samples: 21934080 | consumed tokens: 44920995840 | elapsed time per iteration (s): 0.44 | learning rate: 4.816E-05 | global batch size: 256 | lm loss: 2.240883E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.661 | TFLOPs: 30.62 | 7: iteration 85690/ 115203 | consumed 
samples: 21936640 | consumed tokens: 44926238720 | elapsed time per iteration (s): 0.42 | learning rate: 4.814E-05 | global batch size: 256 | lm loss: 2.268113E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.264 | TFLOPs: 31.65 | 7: iteration 85700/ 115203 | consumed samples: 21939200 | consumed tokens: 44931481600 | elapsed time per iteration (s): 0.43 | learning rate: 4.812E-05 | global batch size: 256 | lm loss: 2.194443E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.949 | TFLOPs: 30.90 | 7: iteration 85710/ 115203 | consumed samples: 21941760 | consumed tokens: 44936724480 | elapsed time per iteration (s): 0.43 | learning rate: 4.810E-05 | global batch size: 256 | lm loss: 2.199230E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.348 | TFLOPs: 31.34 | 7: iteration 85720/ 115203 | consumed samples: 21944320 | consumed tokens: 44941967360 | elapsed time per iteration (s): 0.43 | learning rate: 4.808E-05 | global batch size: 256 | lm loss: 2.240390E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.627 | TFLOPs: 31.30 | 7: iteration 85730/ 115203 | consumed samples: 21946880 | consumed tokens: 44947210240 | elapsed time per iteration (s): 0.43 | learning rate: 4.807E-05 | global batch size: 256 | lm loss: 2.251558E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.745 | TFLOPs: 31.15 | 7: iteration 85740/ 115203 | consumed samples: 21949440 | consumed tokens: 44952453120 | elapsed time per iteration (s): 0.43 | learning rate: 4.805E-05 | global batch size: 256 | lm loss: 2.249347E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 591.764 | TFLOPs: 31.05 | 7: iteration 85750/ 115203 | consumed samples: 21952000 | consumed tokens: 44957696000 | elapsed time per iteration (s): 0.44 | learning rate: 4.803E-05 | global batch size: 256 | lm loss: 2.251211E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.969 | TFLOPs: 30.80 | 7: iteration 85760/ 115203 | consumed samples: 21954560 | consumed tokens: 44962938880 | elapsed time per iteration (s): 0.43 | learning rate: 4.801E-05 | global batch size: 256 | lm loss: 2.242834E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.474 | TFLOPs: 31.45 | 7: iteration 85770/ 115203 | consumed samples: 21957120 | consumed tokens: 44968181760 | elapsed time per iteration (s): 0.42 | learning rate: 4.800E-05 | global batch size: 256 | lm loss: 2.226052E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.939 | TFLOPs: 31.79 | 7: iteration 85780/ 115203 | consumed samples: 21959680 | consumed tokens: 44973424640 | elapsed time per iteration (s): 0.43 | learning rate: 4.798E-05 | global batch size: 256 | lm loss: 2.260393E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.028 | TFLOPs: 31.27 | 7: iteration 85790/ 115203 | consumed samples: 21962240 | consumed tokens: 44978667520 | elapsed time per iteration (s): 0.43 | learning rate: 4.796E-05 | global batch size: 256 | lm loss: 2.232850E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.524 | TFLOPs: 31.56 | 7: iteration 85800/ 115203 | consumed samples: 21964800 | consumed tokens: 44983910400 | elapsed time per iteration (s): 0.43 | learning rate: 4.794E-05 | global batch size: 256 | lm 
loss: 2.211652E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.496 | TFLOPs: 31.56 | 7: iteration 85810/ 115203 | consumed samples: 21967360 | consumed tokens: 44989153280 | elapsed time per iteration (s): 0.43 | learning rate: 4.792E-05 | global batch size: 256 | lm loss: 2.277584E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.939 | TFLOPs: 30.95 | 7: iteration 85820/ 115203 | consumed samples: 21969920 | consumed tokens: 44994396160 | elapsed time per iteration (s): 0.43 | learning rate: 4.791E-05 | global batch size: 256 | lm loss: 2.248015E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.083 | TFLOPs: 31.49 | 7: iteration 85830/ 115203 | consumed samples: 21972480 | consumed tokens: 44999639040 | elapsed time per iteration (s): 0.43 | learning rate: 4.789E-05 | global batch size: 256 | lm loss: 2.241488E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.025 | TFLOPs: 31.01 | 7: iteration 85840/ 115203 | consumed samples: 21975040 | consumed tokens: 45004881920 | elapsed time per iteration (s): 0.44 | learning rate: 4.787E-05 | global batch size: 256 | lm loss: 2.261337E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.157 | TFLOPs: 30.44 | 7: iteration 85850/ 115203 | consumed samples: 21977600 | consumed tokens: 45010124800 | elapsed time per iteration (s): 0.44 | learning rate: 4.785E-05 | global batch size: 256 | lm loss: 2.255727E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.363 | TFLOPs: 30.56 | 7: iteration 85860/ 115203 | consumed samples: 21980160 | consumed tokens: 
45015367680 | elapsed time per iteration (s): 0.44 | learning rate: 4.783E-05 | global batch size: 256 | lm loss: 2.286897E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.896 | TFLOPs: 30.53 | 7: iteration 85870/ 115203 | consumed samples: 21982720 | consumed tokens: 45020610560 | elapsed time per iteration (s): 0.43 | learning rate: 4.782E-05 | global batch size: 256 | lm loss: 2.269286E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.094 | TFLOPs: 31.17 | 7: iteration 85880/ 115203 | consumed samples: 21985280 | consumed tokens: 45025853440 | elapsed time per iteration (s): 0.44 | learning rate: 4.780E-05 | global batch size: 256 | lm loss: 2.225734E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.580 | TFLOPs: 30.72 | 7: iteration 85890/ 115203 | consumed samples: 21987840 | consumed tokens: 45031096320 | elapsed time per iteration (s): 0.43 | learning rate: 4.778E-05 | global batch size: 256 | lm loss: 2.225518E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.880 | TFLOPs: 31.47 | 7: iteration 85900/ 115203 | consumed samples: 21990400 | consumed tokens: 45036339200 | elapsed time per iteration (s): 0.42 | learning rate: 4.776E-05 | global batch size: 256 | lm loss: 2.250482E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.654 | TFLOPs: 31.67 | 7: iteration 85910/ 115203 | consumed samples: 21992960 | consumed tokens: 45041582080 | elapsed time per iteration (s): 0.44 | learning rate: 4.774E-05 | global batch size: 256 | lm loss: 2.223513E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
585.105 | TFLOPs: 30.70 | 7: iteration 85920/ 115203 | consumed samples: 21995520 | consumed tokens: 45046824960 | elapsed time per iteration (s): 0.43 | learning rate: 4.773E-05 | global batch size: 256 | lm loss: 2.257657E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.937 | TFLOPs: 31.48 | 7: iteration 85930/ 115203 | consumed samples: 21998080 | consumed tokens: 45052067840 | elapsed time per iteration (s): 0.45 | learning rate: 4.771E-05 | global batch size: 256 | lm loss: 2.244297E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.567 | TFLOPs: 29.83 | 7: iteration 85940/ 115203 | consumed samples: 22000640 | consumed tokens: 45057310720 | elapsed time per iteration (s): 0.44 | learning rate: 4.769E-05 | global batch size: 256 | lm loss: 2.225144E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.426 | TFLOPs: 30.82 | 7: iteration 85950/ 115203 | consumed samples: 22003200 | consumed tokens: 45062553600 | elapsed time per iteration (s): 0.49 | learning rate: 4.767E-05 | global batch size: 256 | lm loss: 2.240531E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 527.140 | TFLOPs: 27.66 | 7: iteration 85960/ 115203 | consumed samples: 22005760 | consumed tokens: 45067796480 | elapsed time per iteration (s): 0.43 | learning rate: 4.765E-05 | global batch size: 256 | lm loss: 2.222586E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.673 | TFLOPs: 31.52 | 7: iteration 85970/ 115203 | consumed samples: 22008320 | consumed tokens: 45073039360 | elapsed time per iteration (s): 0.43 | learning rate: 4.764E-05 | global batch size: 256 | lm loss: 2.257144E+00 | grad norm: 0.241 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.979 | TFLOPs: 31.01 | 7: iteration 85980/ 115203 | consumed samples: 22010880 | consumed tokens: 45078282240 | elapsed time per iteration (s): 0.43 | learning rate: 4.762E-05 | global batch size: 256 | lm loss: 2.213934E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.243 | TFLOPs: 30.97 | 7: iteration 85990/ 115203 | consumed samples: 22013440 | consumed tokens: 45083525120 | elapsed time per iteration (s): 0.43 | learning rate: 4.760E-05 | global batch size: 256 | lm loss: 2.196633E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.336 | TFLOPs: 31.08 | 0: [2022-11-28 23:19:01,516] [INFO] [logging.py:68:log_dist] [Rank 0] step=86000, skipped=0, lr=[4.7582977310170454e-05, 4.7582977310170454e-05, 4.7582977310170454e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 86000/ 115203 | consumed samples: 22016000 | consumed tokens: 45088768000 | elapsed time per iteration (s): 0.44 | learning rate: 4.758E-05 | global batch size: 256 | lm loss: 2.216845E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.085 | TFLOPs: 30.28 | 0: steps: 86000 loss: 2.2436 iter time (s): 0.435 samples/sec: 588.541 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 86000 | lm loss value: 2.193690E+00 | lm loss PPL: 8.968241E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 86000 to checkpoints_221m 0: [2022-11-28 23:19:01,738] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step86000 is begin to save! 
0: [2022-11-28 23:19:01,767] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_01-model_00-model_states.pt... 0: [2022-11-28 23:19:01,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_01-model_00-model_states.pt. 0: [2022-11-28 23:19:01,890] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_03-model_00-model_states.pt... 0: [2022-11-28 23:19:01,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_03-model_00-model_states.pt. 0: [2022-11-28 23:19:01,912] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_04-model_00-model_states.pt... 0: [2022-11-28 23:19:01,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_04-model_00-model_states.pt. 0: [2022-11-28 23:19:01,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_05-model_00-model_states.pt... 0: [2022-11-28 23:19:01,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_05-model_00-model_states.pt. 0: [2022-11-28 23:19:01,960] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_06-model_00-model_states.pt... 0: [2022-11-28 23:19:01,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_06-model_00-model_states.pt. 0: [2022-11-28 23:19:01,983] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_07-model_00-model_states.pt... 0: [2022-11-28 23:19:02,006] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 23:19:02,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_08-model_00-model_states.pt... 0: [2022-11-28 23:19:02,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_08-model_00-model_states.pt. 0: [2022-11-28 23:19:02,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_09-model_00-model_states.pt... 0: [2022-11-28 23:19:02,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_09-model_00-model_states.pt. 0: [2022-11-28 23:19:02,053] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_10-model_00-model_states.pt... 0: [2022-11-28 23:19:02,080] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_10-model_00-model_states.pt. 0: [2022-11-28 23:19:02,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_11-model_00-model_states.pt... 0: [2022-11-28 23:19:02,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_11-model_00-model_states.pt. 0: [2022-11-28 23:19:02,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_12-model_00-model_states.pt... 0: [2022-11-28 23:19:02,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_12-model_00-model_states.pt. 0: [2022-11-28 23:19:02,124] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_13-model_00-model_states.pt... 0: [2022-11-28 23:19:02,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 23:19:02,147] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_14-model_00-model_states.pt... 0: [2022-11-28 23:19:02,171] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_14-model_00-model_states.pt. 0: [2022-11-28 23:19:02,171] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_15-model_00-model_states.pt... 0: [2022-11-28 23:19:02,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_15-model_00-model_states.pt. 0: [2022-11-28 23:19:02,194] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_16-model_00-model_states.pt... 0: [2022-11-28 23:19:02,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_16-model_00-model_states.pt. 0: [2022-11-28 23:19:02,218] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_17-model_00-model_states.pt... 0: [2022-11-28 23:19:02,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_17-model_00-model_states.pt. 0: [2022-11-28 23:19:02,240] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_18-model_00-model_states.pt... 0: [2022-11-28 23:19:02,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_18-model_00-model_states.pt. 0: [2022-11-28 23:19:02,264] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_19-model_00-model_states.pt... 0: [2022-11-28 23:19:02,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 23:19:02,288] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_20-model_00-model_states.pt... 0: [2022-11-28 23:19:02,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_20-model_00-model_states.pt. 0: [2022-11-28 23:19:02,311] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/layer_22-model_00-model_states.pt... 0: [2022-11-28 23:19:02,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/layer_22-model_00-model_states.pt. 0: [2022-11-28 23:19:02,316] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step86000/mp_rank_00_model_states.pt 0: [2022-11-28 23:19:02,316] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/mp_rank_00_model_states.pt... 0: [2022-11-28 23:19:02,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/mp_rank_00_model_states.pt. 0: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
7: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
0: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
5: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:19:02,337] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step86000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:19:02,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:19:02,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:19:02,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 23:19:02,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 23:19:02,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2022-11-28 23:19:02,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2022-11-28 23:19:02,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:19:02,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 23:19:02,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2022-11-28 23:19:02,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:19:02,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 23:19:02,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2022-11-28 23:19:02,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:19:02,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 23:19:02,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:19:02,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 23:19:02,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 2: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:19:02,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2022-11-28 23:19:02,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:19:02,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 23:19:02,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:19:02,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:19:02,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2022-11-28 23:19:02,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2022-11-28 23:19:02,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 23:19:02,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 23:19:02,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2022-11-28 23:19:02,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2022-11-28 23:19:02,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 23:19:02,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:19:02,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2022-11-28 23:19:02,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 23:19:02,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2022-11-28 23:19:02,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-28 23:19:02,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 23:19:02,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:19:02,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2022-11-28 23:19:02,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 23:19:02,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:19:02,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 7: [2022-11-28 23:19:02,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 23:19:02,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2022-11-28 23:19:02,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:19:02,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 23:19:02,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2022-11-28 23:19:02,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 23:19:02,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 23:19:02,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2022-11-28 23:19:02,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:19:02,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 23:19:02,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2022-11-28 23:19:02,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:19:02,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 23:19:02,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2022-11-28 23:19:02,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:19:02,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 23:19:02,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2022-11-28 23:19:02,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:19:02,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 23:19:02,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2022-11-28 23:19:02,407] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:19:02,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 23:19:02,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2022-11-28 23:19:02,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:19:02,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 23:19:02,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2022-11-28 23:19:02,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:19:02,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 23:19:02,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2022-11-28 23:19:02,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2022-11-28 23:19:02,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:19:02,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 23:19:02,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2022-11-28 23:19:02,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:19:02,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 23:19:02,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2022-11-28 23:19:02,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:19:02,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 23:19:02,409] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:19:02,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2022-11-28 23:19:02,409] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 23:19:02,409] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 
5: [2022-11-28 23:19:02,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:19:02,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:19:02,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 
3: [2022-11-28 23:19:02,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 23:19:02,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 23:19:02,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 23:19:02,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 23:19:02,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:19:02,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 23:19:02,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 5: [2022-11-28 23:19:02,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 
5: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 5: [2022-11-28 23:19:02,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 23:19:02,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 23:19:02,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 23:19:02,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 23:19:02,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 5: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 
5: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 5: [2022-11-28 23:19:02,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2022-11-28 23:19:02,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:19:02,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:19:02,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:19:02,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 23:19:02,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 23:19:02,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2022-11-28 23:19:02,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 23:19:02,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:19:02,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 
3: [2022-11-28 23:19:02,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 23:19:02,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 23:19:02,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 5: [2022-11-28 23:19:02,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 3: [2022-11-28 23:19:02,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2022-11-28 23:19:02,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:19:02,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 23:19:02,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2022-11-28 23:19:02,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:19:02,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 23:19:02,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2022-11-28 23:19:02,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:19:02,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 23:19:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 6: [2022-11-28 23:19:02,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:19:02,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 23:19:02,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2022-11-28 23:19:02,425] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 23:19:02,425] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2022-11-28 23:19:02,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:19:02,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 23:19:02,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2022-11-28 23:19:02,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-28 23:19:02,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 23:19:02,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 4: [2022-11-28 23:19:02,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:19:02,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 23:19:02,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2022-11-28 23:19:02,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:19:02,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:19:02,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:19:02,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 23:19:02,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 23:19:02,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 23:19:02,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 
0: [2022-11-28 23:19:02,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: [2022-11-28 23:19:02,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2022-11-28 23:19:02,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:19:02,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 23:19:02,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2022-11-28 23:19:02,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:19:02,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 23:19:02,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2022-11-28 23:19:02,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:19:02,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 23:19:02,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2022-11-28 23:19:02,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-28 23:19:02,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 23:19:02,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2022-11-28 23:19:02,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:19:02,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:19:02,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:19:02,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:19:02,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 23:19:02,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 23:19:02,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 23:19:02,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step86000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 23:19:02,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2022-11-28 23:19:02,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 
1: [2022-11-28 23:19:02,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 1: [2022-11-28 23:19:02,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step86000 is ready now! 0: successfully saved checkpoint at iteration 86000 to checkpoints_221m 7: time (ms) | save-checkpoint: 947.01 7: iteration 86010/ 115203 | consumed samples: 22018560 | consumed tokens: 45094010880 | elapsed time per iteration (s): 0.54 | learning rate: 4.757E-05 | global batch size: 256 | lm loss: 2.238497E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 471.319 | TFLOPs: 24.73 | 7: iteration 86020/ 115203 | consumed samples: 22021120 | consumed tokens: 45099253760 | elapsed time per iteration (s): 0.43 | learning rate: 4.755E-05 | global batch size: 256 | lm loss: 2.211368E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.111 | TFLOPs: 31.12 | 7: iteration 86030/ 115203 | consumed samples: 22023680 | consumed tokens: 45104496640 | elapsed time per iteration (s): 0.44 | learning rate: 4.753E-05 | global batch size: 256 | lm loss: 2.223217E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.809 | TFLOPs: 30.74 | 7: iteration 86040/ 115203 | consumed samples: 22026240 | consumed tokens: 45109739520 | elapsed time per iteration (s): 0.45 | learning rate: 4.751E-05 | global batch size: 256 | lm loss: 2.229713E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.921 | TFLOPs: 30.17 | 7: iteration 86050/ 115203 | consumed samples: 22028800 | consumed tokens: 45114982400 | elapsed time per iteration (s): 0.44 | learning rate: 4.749E-05 | global batch size: 256 | lm loss: 2.231018E+00 | grad norm: 0.244 | num zeros: 0.0 
| number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.858 | TFLOPs: 30.84 | 7: iteration 86060/ 115203 | consumed samples: 22031360 | consumed tokens: 45120225280 | elapsed time per iteration (s): 0.43 | learning rate: 4.748E-05 | global batch size: 256 | lm loss: 2.225186E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.274 | TFLOPs: 31.39 | 7: iteration 86070/ 115203 | consumed samples: 22033920 | consumed tokens: 45125468160 | elapsed time per iteration (s): 0.44 | learning rate: 4.746E-05 | global batch size: 256 | lm loss: 2.257284E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.191 | TFLOPs: 30.81 | 7: iteration 86080/ 115203 | consumed samples: 22036480 | consumed tokens: 45130711040 | elapsed time per iteration (s): 0.43 | learning rate: 4.744E-05 | global batch size: 256 | lm loss: 2.202429E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.118 | TFLOPs: 31.17 | 7: iteration 86090/ 115203 | consumed samples: 22039040 | consumed tokens: 45135953920 | elapsed time per iteration (s): 0.42 | learning rate: 4.742E-05 | global batch size: 256 | lm loss: 2.212372E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.297 | TFLOPs: 31.65 | 7: iteration 86100/ 115203 | consumed samples: 22041600 | consumed tokens: 45141196800 | elapsed time per iteration (s): 0.44 | learning rate: 4.740E-05 | global batch size: 256 | lm loss: 2.251323E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.599 | TFLOPs: 30.73 | 7: iteration 86110/ 115203 | consumed samples: 22044160 | consumed tokens: 45146439680 | elapsed time per iteration (s): 0.43 | learning 
rate: 4.739E-05 | global batch size: 256 | lm loss: 2.245263E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.658 | TFLOPs: 31.10 | 7: iteration 86120/ 115203 | consumed samples: 22046720 | consumed tokens: 45151682560 | elapsed time per iteration (s): 0.43 | learning rate: 4.737E-05 | global batch size: 256 | lm loss: 2.221272E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.836 | TFLOPs: 31.00 | 7: iteration 86130/ 115203 | consumed samples: 22049280 | consumed tokens: 45156925440 | elapsed time per iteration (s): 0.45 | learning rate: 4.735E-05 | global batch size: 256 | lm loss: 2.254620E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.396 | TFLOPs: 30.03 | 7: iteration 86140/ 115203 | consumed samples: 22051840 | consumed tokens: 45162168320 | elapsed time per iteration (s): 0.44 | learning rate: 4.733E-05 | global batch size: 256 | lm loss: 2.262082E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.151 | TFLOPs: 30.28 | 7: iteration 86150/ 115203 | consumed samples: 22054400 | consumed tokens: 45167411200 | elapsed time per iteration (s): 0.45 | learning rate: 4.732E-05 | global batch size: 256 | lm loss: 2.218085E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.276 | TFLOPs: 30.13 | 7: iteration 86160/ 115203 | consumed samples: 22056960 | consumed tokens: 45172654080 | elapsed time per iteration (s): 0.43 | learning rate: 4.730E-05 | global batch size: 256 | lm loss: 2.238958E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.882 | TFLOPs: 31.11 | 7: iteration 86170/ 115203 | 
consumed samples: 22059520 | consumed tokens: 45177896960 | elapsed time per iteration (s): 0.43 | learning rate: 4.728E-05 | global batch size: 256 | lm loss: 2.228457E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.560 | TFLOPs: 30.99 | 7: iteration 86180/ 115203 | consumed samples: 22062080 | consumed tokens: 45183139840 | elapsed time per iteration (s): 0.43 | learning rate: 4.726E-05 | global batch size: 256 | lm loss: 2.247040E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.403 | TFLOPs: 31.08 | 7: iteration 86190/ 115203 | consumed samples: 22064640 | consumed tokens: 45188382720 | elapsed time per iteration (s): 0.43 | learning rate: 4.724E-05 | global batch size: 256 | lm loss: 2.240738E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.482 | TFLOPs: 31.40 | 7: iteration 86200/ 115203 | consumed samples: 22067200 | consumed tokens: 45193625600 | elapsed time per iteration (s): 0.43 | learning rate: 4.723E-05 | global batch size: 256 | lm loss: 2.240335E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.666 | TFLOPs: 31.10 | 7: iteration 86210/ 115203 | consumed samples: 22069760 | consumed tokens: 45198868480 | elapsed time per iteration (s): 0.43 | learning rate: 4.721E-05 | global batch size: 256 | lm loss: 2.241377E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.143 | TFLOPs: 31.38 | 7: iteration 86220/ 115203 | consumed samples: 22072320 | consumed tokens: 45204111360 | elapsed time per iteration (s): 0.43 | learning rate: 4.719E-05 | global batch size: 256 | lm loss: 2.226103E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 600.611 | TFLOPs: 31.51 | 7: iteration 86230/ 115203 | consumed samples: 22074880 | consumed tokens: 45209354240 | elapsed time per iteration (s): 0.42 | learning rate: 4.717E-05 | global batch size: 256 | lm loss: 2.216866E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.138 | TFLOPs: 31.75 | 7: iteration 86240/ 115203 | consumed samples: 22077440 | consumed tokens: 45214597120 | elapsed time per iteration (s): 0.44 | learning rate: 4.716E-05 | global batch size: 256 | lm loss: 2.217630E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.311 | TFLOPs: 30.71 | 7: iteration 86250/ 115203 | consumed samples: 22080000 | consumed tokens: 45219840000 | elapsed time per iteration (s): 0.45 | learning rate: 4.714E-05 | global batch size: 256 | lm loss: 2.262067E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.125 | TFLOPs: 30.07 | 7: iteration 86260/ 115203 | consumed samples: 22082560 | consumed tokens: 45225082880 | elapsed time per iteration (s): 0.43 | learning rate: 4.712E-05 | global batch size: 256 | lm loss: 2.215065E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.301 | TFLOPs: 31.18 | 7: iteration 86270/ 115203 | consumed samples: 22085120 | consumed tokens: 45230325760 | elapsed time per iteration (s): 0.42 | learning rate: 4.710E-05 | global batch size: 256 | lm loss: 2.252383E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.605 | TFLOPs: 31.62 | 7: iteration 86280/ 115203 | consumed samples: 22087680 | consumed tokens: 45235568640 | elapsed time per iteration (s): 0.44 | learning rate: 4.708E-05 | global batch size: 
256 | lm loss: 2.263935E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.480 | TFLOPs: 30.46 | 7: iteration 86290/ 115203 | consumed samples: 22090240 | consumed tokens: 45240811520 | elapsed time per iteration (s): 0.43 | learning rate: 4.707E-05 | global batch size: 256 | lm loss: 2.275613E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.783 | TFLOPs: 31.52 | 7: iteration 86300/ 115203 | consumed samples: 22092800 | consumed tokens: 45246054400 | elapsed time per iteration (s): 0.43 | learning rate: 4.705E-05 | global batch size: 256 | lm loss: 2.205926E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.540 | TFLOPs: 31.30 | 7: iteration 86310/ 115203 | consumed samples: 22095360 | consumed tokens: 45251297280 | elapsed time per iteration (s): 0.43 | learning rate: 4.703E-05 | global batch size: 256 | lm loss: 2.249093E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.933 | TFLOPs: 31.22 | 7: iteration 86320/ 115203 | consumed samples: 22097920 | consumed tokens: 45256540160 | elapsed time per iteration (s): 0.42 | learning rate: 4.701E-05 | global batch size: 256 | lm loss: 2.223951E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.670 | TFLOPs: 31.78 | 7: iteration 86330/ 115203 | consumed samples: 22100480 | consumed tokens: 45261783040 | elapsed time per iteration (s): 0.43 | learning rate: 4.700E-05 | global batch size: 256 | lm loss: 2.250664E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.728 | TFLOPs: 31.36 | 7: iteration 86340/ 115203 | consumed samples: 22103040 | consumed 
tokens: 45267025920 | elapsed time per iteration (s): 0.43 | learning rate: 4.698E-05 | global batch size: 256 | lm loss: 2.239920E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.197 | TFLOPs: 31.07 | 7: iteration 86350/ 115203 | consumed samples: 22105600 | consumed tokens: 45272268800 | elapsed time per iteration (s): 0.44 | learning rate: 4.696E-05 | global batch size: 256 | lm loss: 2.222639E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.585 | TFLOPs: 30.46 | 7: iteration 86360/ 115203 | consumed samples: 22108160 | consumed tokens: 45277511680 | elapsed time per iteration (s): 0.43 | learning rate: 4.694E-05 | global batch size: 256 | lm loss: 2.235720E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.624 | TFLOPs: 31.04 | 7: iteration 86370/ 115203 | consumed samples: 22110720 | consumed tokens: 45282754560 | elapsed time per iteration (s): 0.43 | learning rate: 4.693E-05 | global batch size: 256 | lm loss: 2.239786E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.016 | TFLOPs: 31.32 | 7: iteration 86380/ 115203 | consumed samples: 22113280 | consumed tokens: 45287997440 | elapsed time per iteration (s): 0.43 | learning rate: 4.691E-05 | global batch size: 256 | lm loss: 2.229418E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.317 | TFLOPs: 31.50 | 7: iteration 86390/ 115203 | consumed samples: 22115840 | consumed tokens: 45293240320 | elapsed time per iteration (s): 0.43 | learning rate: 4.689E-05 | global batch size: 256 | lm loss: 2.223824E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 595.572 | TFLOPs: 31.25 | 7: iteration 86400/ 115203 | consumed samples: 22118400 | consumed tokens: 45298483200 | elapsed time per iteration (s): 0.42 | learning rate: 4.687E-05 | global batch size: 256 | lm loss: 2.267468E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.535 | TFLOPs: 31.67 | 7: iteration 86410/ 115203 | consumed samples: 22120960 | consumed tokens: 45303726080 | elapsed time per iteration (s): 0.44 | learning rate: 4.685E-05 | global batch size: 256 | lm loss: 2.260212E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.234 | TFLOPs: 30.81 | 7: iteration 86420/ 115203 | consumed samples: 22123520 | consumed tokens: 45308968960 | elapsed time per iteration (s): 0.43 | learning rate: 4.684E-05 | global batch size: 256 | lm loss: 2.236282E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.008 | TFLOPs: 31.27 | 7: iteration 86430/ 115203 | consumed samples: 22126080 | consumed tokens: 45314211840 | elapsed time per iteration (s): 0.43 | learning rate: 4.682E-05 | global batch size: 256 | lm loss: 2.252378E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.875 | TFLOPs: 31.26 | 7: iteration 86440/ 115203 | consumed samples: 22128640 | consumed tokens: 45319454720 | elapsed time per iteration (s): 0.44 | learning rate: 4.680E-05 | global batch size: 256 | lm loss: 2.261874E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.154 | TFLOPs: 30.70 | 7: iteration 86450/ 115203 | consumed samples: 22131200 | consumed tokens: 45324697600 | elapsed time per iteration (s): 0.44 | learning rate: 4.678E-05 | global batch size: 256 | lm loss: 2.191366E+00 | grad norm: 
0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.212 | TFLOPs: 30.65 | 7: iteration 86460/ 115203 | consumed samples: 22133760 | consumed tokens: 45329940480 | elapsed time per iteration (s): 0.43 | learning rate: 4.677E-05 | global batch size: 256 | lm loss: 2.215539E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.101 | TFLOPs: 31.07 | 7: iteration 86470/ 115203 | consumed samples: 22136320 | consumed tokens: 45335183360 | elapsed time per iteration (s): 0.43 | learning rate: 4.675E-05 | global batch size: 256 | lm loss: 2.219009E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.175 | TFLOPs: 31.60 | 7: iteration 86480/ 115203 | consumed samples: 22138880 | consumed tokens: 45340426240 | elapsed time per iteration (s): 0.44 | learning rate: 4.673E-05 | global batch size: 256 | lm loss: 2.258850E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.368 | TFLOPs: 30.19 | 7: iteration 86490/ 115203 | consumed samples: 22141440 | consumed tokens: 45345669120 | elapsed time per iteration (s): 0.44 | learning rate: 4.671E-05 | global batch size: 256 | lm loss: 2.240348E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.244 | TFLOPs: 30.29 | 7: iteration 86500/ 115203 | consumed samples: 22144000 | consumed tokens: 45350912000 | elapsed time per iteration (s): 0.43 | learning rate: 4.670E-05 | global batch size: 256 | lm loss: 2.245573E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.169 | TFLOPs: 31.23 | 7: iteration 86510/ 115203 | consumed samples: 22146560 | consumed tokens: 45356154880 | elapsed time per 
iteration (s): 0.42 | learning rate: 4.668E-05 | global batch size: 256 | lm loss: 2.231462E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.406 | TFLOPs: 31.82 | 7: iteration 86520/ 115203 | consumed samples: 22149120 | consumed tokens: 45361397760 | elapsed time per iteration (s): 0.43 | learning rate: 4.666E-05 | global batch size: 256 | lm loss: 2.248477E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.127 | TFLOPs: 31.12 | 7: iteration 86530/ 115203 | consumed samples: 22151680 | consumed tokens: 45366640640 | elapsed time per iteration (s): 0.44 | learning rate: 4.664E-05 | global batch size: 256 | lm loss: 2.237205E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.429 | TFLOPs: 30.66 | 7: iteration 86540/ 115203 | consumed samples: 22154240 | consumed tokens: 45371883520 | elapsed time per iteration (s): 0.42 | learning rate: 4.663E-05 | global batch size: 256 | lm loss: 2.231954E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.188 | TFLOPs: 31.81 | 7: iteration 86550/ 115203 | consumed samples: 22156800 | consumed tokens: 45377126400 | elapsed time per iteration (s): 0.43 | learning rate: 4.661E-05 | global batch size: 256 | lm loss: 2.247227E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.576 | TFLOPs: 30.93 | 7: iteration 86560/ 115203 | consumed samples: 22159360 | consumed tokens: 45382369280 | elapsed time per iteration (s): 0.43 | learning rate: 4.659E-05 | global batch size: 256 | lm loss: 2.244336E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.776 | TFLOPs: 31.00 | 7: 
iteration 86570/ 115203 | consumed samples: 22161920 | consumed tokens: 45387612160 | elapsed time per iteration (s): 0.43 | learning rate: 4.657E-05 | global batch size: 256 | lm loss: 2.240782E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.695 | TFLOPs: 31.15 | 7: iteration 86580/ 115203 | consumed samples: 22164480 | consumed tokens: 45392855040 | elapsed time per iteration (s): 0.43 | learning rate: 4.656E-05 | global batch size: 256 | lm loss: 2.217003E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.035 | TFLOPs: 31.01 | 7: iteration 86590/ 115203 | consumed samples: 22167040 | consumed tokens: 45398097920 | elapsed time per iteration (s): 0.43 | learning rate: 4.654E-05 | global batch size: 256 | lm loss: 2.235435E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.957 | TFLOPs: 31.16 | 7: iteration 86600/ 115203 | consumed samples: 22169600 | consumed tokens: 45403340800 | elapsed time per iteration (s): 0.43 | learning rate: 4.652E-05 | global batch size: 256 | lm loss: 2.263311E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.690 | TFLOPs: 31.15 | 7: iteration 86610/ 115203 | consumed samples: 22172160 | consumed tokens: 45408583680 | elapsed time per iteration (s): 0.43 | learning rate: 4.650E-05 | global batch size: 256 | lm loss: 2.228284E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.514 | TFLOPs: 30.88 | 7: iteration 86620/ 115203 | consumed samples: 22174720 | consumed tokens: 45413826560 | elapsed time per iteration (s): 0.43 | learning rate: 4.648E-05 | global batch size: 256 | lm loss: 2.263149E+00 | grad norm: 0.250 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.640 | TFLOPs: 30.94 | 7: iteration 86630/ 115203 | consumed samples: 22177280 | consumed tokens: 45419069440 | elapsed time per iteration (s): 0.44 | learning rate: 4.647E-05 | global batch size: 256 | lm loss: 2.235235E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.814 | TFLOPs: 30.32 | 7: iteration 86640/ 115203 | consumed samples: 22179840 | consumed tokens: 45424312320 | elapsed time per iteration (s): 0.43 | learning rate: 4.645E-05 | global batch size: 256 | lm loss: 2.224957E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.415 | TFLOPs: 31.35 | 7: iteration 86650/ 115203 | consumed samples: 22182400 | consumed tokens: 45429555200 | elapsed time per iteration (s): 0.43 | learning rate: 4.643E-05 | global batch size: 256 | lm loss: 2.238429E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.218 | TFLOPs: 31.07 | 7: iteration 86660/ 115203 | consumed samples: 22184960 | consumed tokens: 45434798080 | elapsed time per iteration (s): 0.43 | learning rate: 4.641E-05 | global batch size: 256 | lm loss: 2.250080E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.622 | TFLOPs: 31.57 | 7: iteration 86670/ 115203 | consumed samples: 22187520 | consumed tokens: 45440040960 | elapsed time per iteration (s): 0.45 | learning rate: 4.640E-05 | global batch size: 256 | lm loss: 2.240279E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.011 | TFLOPs: 30.12 | 7: iteration 86680/ 115203 | consumed samples: 22190080 | consumed tokens: 45445283840 | elapsed time per iteration (s): 0.43 | learning rate: 
4.638E-05 | global batch size: 256 | lm loss: 2.207066E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.121 | TFLOPs: 31.49 | 7: iteration 86690/ 115203 | consumed samples: 22192640 | consumed tokens: 45450526720 | elapsed time per iteration (s): 0.43 | learning rate: 4.636E-05 | global batch size: 256 | lm loss: 2.253386E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.965 | TFLOPs: 31.06 | 7: iteration 86700/ 115203 | consumed samples: 22195200 | consumed tokens: 45455769600 | elapsed time per iteration (s): 0.42 | learning rate: 4.634E-05 | global batch size: 256 | lm loss: 2.265790E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.533 | TFLOPs: 31.61 | 7: iteration 86710/ 115203 | consumed samples: 22197760 | consumed tokens: 45461012480 | elapsed time per iteration (s): 0.45 | learning rate: 4.633E-05 | global batch size: 256 | lm loss: 2.260040E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.862 | TFLOPs: 29.79 | 7: iteration 86720/ 115203 | consumed samples: 22200320 | consumed tokens: 45466255360 | elapsed time per iteration (s): 0.44 | learning rate: 4.631E-05 | global batch size: 256 | lm loss: 2.247569E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.437 | TFLOPs: 30.66 | 7: iteration 86730/ 115203 | consumed samples: 22202880 | consumed tokens: 45471498240 | elapsed time per iteration (s): 0.43 | learning rate: 4.629E-05 | global batch size: 256 | lm loss: 2.238408E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.295 | TFLOPs: 31.13 | 7: iteration 86740/ 115203 | consumed 
samples: 22205440 | consumed tokens: 45476741120 | elapsed time per iteration (s): 0.42 | learning rate: 4.627E-05 | global batch size: 256 | lm loss: 2.247248E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.094 | TFLOPs: 31.75 | 7: iteration 86750/ 115203 | consumed samples: 22208000 | consumed tokens: 45481984000 | elapsed time per iteration (s): 0.43 | learning rate: 4.626E-05 | global batch size: 256 | lm loss: 2.202657E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.620 | TFLOPs: 31.25 | 7: iteration 86760/ 115203 | consumed samples: 22210560 | consumed tokens: 45487226880 | elapsed time per iteration (s): 0.42 | learning rate: 4.624E-05 | global batch size: 256 | lm loss: 2.242580E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.405 | TFLOPs: 31.92 | 7: iteration 86770/ 115203 | consumed samples: 22213120 | consumed tokens: 45492469760 | elapsed time per iteration (s): 0.42 | learning rate: 4.622E-05 | global batch size: 256 | lm loss: 2.231825E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.772 | TFLOPs: 31.63 | 7: iteration 86780/ 115203 | consumed samples: 22215680 | consumed tokens: 45497712640 | elapsed time per iteration (s): 0.45 | learning rate: 4.620E-05 | global batch size: 256 | lm loss: 2.275491E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.033 | TFLOPs: 30.07 | 7: iteration 86790/ 115203 | consumed samples: 22218240 | consumed tokens: 45502955520 | elapsed time per iteration (s): 0.43 | learning rate: 4.619E-05 | global batch size: 256 | lm loss: 2.247566E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 599.451 | TFLOPs: 31.45 | 7: iteration 86800/ 115203 | consumed samples: 22220800 | consumed tokens: 45508198400 | elapsed time per iteration (s): 0.43 | learning rate: 4.617E-05 | global batch size: 256 | lm loss: 2.212045E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.557 | TFLOPs: 31.35 | 7: iteration 86810/ 115203 | consumed samples: 22223360 | consumed tokens: 45513441280 | elapsed time per iteration (s): 0.43 | learning rate: 4.615E-05 | global batch size: 256 | lm loss: 2.252675E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.422 | TFLOPs: 31.40 | 7: iteration 86820/ 115203 | consumed samples: 22225920 | consumed tokens: 45518684160 | elapsed time per iteration (s): 0.43 | learning rate: 4.613E-05 | global batch size: 256 | lm loss: 2.244408E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.031 | TFLOPs: 31.27 | 7: iteration 86830/ 115203 | consumed samples: 22228480 | consumed tokens: 45523927040 | elapsed time per iteration (s): 0.42 | learning rate: 4.612E-05 | global batch size: 256 | lm loss: 2.239272E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.371 | TFLOPs: 31.97 | 7: iteration 86840/ 115203 | consumed samples: 22231040 | consumed tokens: 45529169920 | elapsed time per iteration (s): 0.43 | learning rate: 4.610E-05 | global batch size: 256 | lm loss: 2.210885E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.521 | TFLOPs: 31.09 | 7: iteration 86850/ 115203 | consumed samples: 22233600 | consumed tokens: 45534412800 | elapsed time per iteration (s): 0.43 | learning rate: 4.608E-05 | global batch size: 256 | lm 
loss: 2.242933E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.120 | TFLOPs: 31.59 | 7: iteration 86860/ 115203 | consumed samples: 22236160 | consumed tokens: 45539655680 | elapsed time per iteration (s): 0.43 | learning rate: 4.606E-05 | global batch size: 256 | lm loss: 2.235767E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.216 | TFLOPs: 31.23 | 7: iteration 86870/ 115203 | consumed samples: 22238720 | consumed tokens: 45544898560 | elapsed time per iteration (s): 0.44 | learning rate: 4.605E-05 | global batch size: 256 | lm loss: 2.240697E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.020 | TFLOPs: 30.75 | 7: iteration 86880/ 115203 | consumed samples: 22241280 | consumed tokens: 45550141440 | elapsed time per iteration (s): 0.43 | learning rate: 4.603E-05 | global batch size: 256 | lm loss: 2.208245E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.424 | TFLOPs: 31.03 | 7: iteration 86890/ 115203 | consumed samples: 22243840 | consumed tokens: 45555384320 | elapsed time per iteration (s): 0.43 | learning rate: 4.601E-05 | global batch size: 256 | lm loss: 2.236831E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.119 | TFLOPs: 31.33 | 7: iteration 86900/ 115203 | consumed samples: 22246400 | consumed tokens: 45560627200 | elapsed time per iteration (s): 0.43 | learning rate: 4.599E-05 | global batch size: 256 | lm loss: 2.239270E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.755 | TFLOPs: 31.31 | 7: iteration 86910/ 115203 | consumed samples: 22248960 | consumed tokens: 
45565870080 | elapsed time per iteration (s): 0.42 | learning rate: 4.598E-05 | global batch size: 256 | lm loss: 2.248493E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 612.848 | TFLOPs: 32.16 | 7: iteration 86920/ 115203 | consumed samples: 22251520 | consumed tokens: 45571112960 | elapsed time per iteration (s): 0.43 | learning rate: 4.596E-05 | global batch size: 256 | lm loss: 2.238053E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.006 | TFLOPs: 31.43 | 7: iteration 86930/ 115203 | consumed samples: 22254080 | consumed tokens: 45576355840 | elapsed time per iteration (s): 0.46 | learning rate: 4.594E-05 | global batch size: 256 | lm loss: 2.231633E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.563 | TFLOPs: 29.36 | 7: iteration 86940/ 115203 | consumed samples: 22256640 | consumed tokens: 45581598720 | elapsed time per iteration (s): 0.43 | learning rate: 4.593E-05 | global batch size: 256 | lm loss: 2.222629E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.436 | TFLOPs: 31.50 | 7: iteration 86950/ 115203 | consumed samples: 22259200 | consumed tokens: 45586841600 | elapsed time per iteration (s): 0.43 | learning rate: 4.591E-05 | global batch size: 256 | lm loss: 2.226697E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.318 | TFLOPs: 31.39 | 7: iteration 86960/ 115203 | consumed samples: 22261760 | consumed tokens: 45592084480 | elapsed time per iteration (s): 0.43 | learning rate: 4.589E-05 | global batch size: 256 | lm loss: 2.213659E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
594.919 | TFLOPs: 31.21 | 7: iteration 86970/ 115203 | consumed samples: 22264320 | consumed tokens: 45597327360 | elapsed time per iteration (s): 0.44 | learning rate: 4.587E-05 | global batch size: 256 | lm loss: 2.261825E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.944 | TFLOPs: 30.64 | 7: iteration 86980/ 115203 | consumed samples: 22266880 | consumed tokens: 45602570240 | elapsed time per iteration (s): 0.44 | learning rate: 4.586E-05 | global batch size: 256 | lm loss: 2.249661E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.360 | TFLOPs: 30.77 | 7: iteration 86990/ 115203 | consumed samples: 22269440 | consumed tokens: 45607813120 | elapsed time per iteration (s): 0.43 | learning rate: 4.584E-05 | global batch size: 256 | lm loss: 2.239700E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.069 | TFLOPs: 31.33 | 7: iteration 87000/ 115203 | consumed samples: 22272000 | consumed tokens: 45613056000 | elapsed time per iteration (s): 0.43 | learning rate: 4.582E-05 | global batch size: 256 | lm loss: 2.234410E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.332 | TFLOPs: 31.29 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 87000 | lm loss value: 2.149493E+00 | lm loss PPL: 8.580507E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 87000 to checkpoints_221m 0: [2022-11-28 23:26:15,022] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step87000 is begin to save! 
0: [2022-11-28 23:26:15,027] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_01-model_00-model_states.pt... 0: [2022-11-28 23:26:15,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_01-model_00-model_states.pt. 0: [2022-11-28 23:26:15,136] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_03-model_00-model_states.pt... 0: [2022-11-28 23:26:15,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_03-model_00-model_states.pt. 0: [2022-11-28 23:26:15,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_04-model_00-model_states.pt... 0: [2022-11-28 23:26:15,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_04-model_00-model_states.pt. 0: [2022-11-28 23:26:15,183] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_05-model_00-model_states.pt... 0: [2022-11-28 23:26:15,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_05-model_00-model_states.pt. 0: [2022-11-28 23:26:15,206] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_06-model_00-model_states.pt... 0: [2022-11-28 23:26:15,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_06-model_00-model_states.pt. 0: [2022-11-28 23:26:15,228] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_07-model_00-model_states.pt... 0: [2022-11-28 23:26:15,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 23:26:15,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_08-model_00-model_states.pt... 0: [2022-11-28 23:26:15,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_08-model_00-model_states.pt. 0: [2022-11-28 23:26:15,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_09-model_00-model_states.pt... 0: [2022-11-28 23:26:15,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_09-model_00-model_states.pt. 0: [2022-11-28 23:26:15,298] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_10-model_00-model_states.pt... 0: [2022-11-28 23:26:15,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_10-model_00-model_states.pt. 0: [2022-11-28 23:26:15,320] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_11-model_00-model_states.pt... 0: [2022-11-28 23:26:15,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_11-model_00-model_states.pt. 0: [2022-11-28 23:26:15,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_12-model_00-model_states.pt... 0: [2022-11-28 23:26:15,367] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_12-model_00-model_states.pt. 0: [2022-11-28 23:26:15,367] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_13-model_00-model_states.pt... 0: [2022-11-28 23:26:15,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 23:26:15,391] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_14-model_00-model_states.pt... 0: [2022-11-28 23:26:15,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_14-model_00-model_states.pt. 0: [2022-11-28 23:26:15,414] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_15-model_00-model_states.pt... 0: [2022-11-28 23:26:15,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_15-model_00-model_states.pt. 0: [2022-11-28 23:26:15,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_16-model_00-model_states.pt... 0: [2022-11-28 23:26:15,460] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_16-model_00-model_states.pt. 0: [2022-11-28 23:26:15,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_17-model_00-model_states.pt... 0: [2022-11-28 23:26:15,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_17-model_00-model_states.pt. 0: [2022-11-28 23:26:15,484] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_18-model_00-model_states.pt... 0: [2022-11-28 23:26:15,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_18-model_00-model_states.pt. 0: [2022-11-28 23:26:15,508] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_19-model_00-model_states.pt... 0: [2022-11-28 23:26:15,531] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 23:26:15,531] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_20-model_00-model_states.pt... 0: [2022-11-28 23:26:15,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_20-model_00-model_states.pt. 0: [2022-11-28 23:26:15,555] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/layer_22-model_00-model_states.pt... 0: [2022-11-28 23:26:15,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/layer_22-model_00-model_states.pt. 0: [2022-11-28 23:26:15,560] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step87000/mp_rank_00_model_states.pt 0: [2022-11-28 23:26:15,560] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/mp_rank_00_model_states.pt... 0: [2022-11-28 23:26:15,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/mp_rank_00_model_states.pt. 0: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:26:15,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step87000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:26:15,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:26:15,626] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 23:26:15,626] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2022-11-28 23:26:15,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:26:15,628] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 23:26:15,628] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2022-11-28 23:26:15,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:26:15,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 23:26:15,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2022-11-28 23:26:15,632] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:26:15,632] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 23:26:15,632] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2022-11-28 23:26:15,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:26:15,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 23:26:15,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2022-11-28 23:26:15,633] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:26:15,633] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 23:26:15,633] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2022-11-28 23:26:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:26:15,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 23:26:15,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2022-11-28 23:26:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:26:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:26:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:26:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:26:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 23:26:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 23:26:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 23:26:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 23:26:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 23:26:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 23:26:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2022-11-28 23:26:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2022-11-28 23:26:15,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:26:15,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 23:26:15,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2022-11-28 23:26:15,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:26:15,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 23:26:15,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2022-11-28 23:26:15,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:26:15,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 23:26:15,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2022-11-28 23:26:15,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:26:15,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 23:26:15,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2022-11-28 23:26:15,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:26:15,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:26:15,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:26:15,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 23:26:15,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 23:26:15,637] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 23:26:15,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 
4: [2022-11-28 23:26:15,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2022-11-28 23:26:15,637] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2022-11-28 23:26:15,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:26:15,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:26:15,638] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:26:15,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 23:26:15,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 23:26:15,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2022-11-28 23:26:15,638] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 23:26:15,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2022-11-28 23:26:15,638] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 2: [2022-11-28 23:26:15,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:26:15,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 23:26:15,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2022-11-28 23:26:15,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:26:15,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:26:15,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 23:26:15,640] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 23:26:15,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 4: [2022-11-28 23:26:15,640] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2022-11-28 23:26:15,641] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:26:15,641] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 23:26:15,641] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2022-11-28 23:26:15,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:26:15,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:26:15,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 23:26:15,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 23:26:15,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2022-11-28 23:26:15,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2022-11-28 23:26:15,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:26:15,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 23:26:15,645] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:26:15,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2022-11-28 23:26:15,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 23:26:15,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2022-11-28 23:26:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
3: [2022-11-28 23:26:15,631] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:26:15,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 23:26:15,631] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2022-11-28 23:26:15,631] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:26:15,634] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:26:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 23:26:15,634] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2022-11-28 23:26:15,634] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
3: [2022-11-28 23:26:15,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:26:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 23:26:15,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2022-11-28 23:26:15,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:26:15,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:26:15,635] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 23:26:15,639] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 23:26:15,635] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2022-11-28 23:26:15,639] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2022-11-28 23:26:15,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
3: [2022-11-28 23:26:15,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:26:15,636] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:26:15,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 23:26:15,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 23:26:15,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2022-11-28 23:26:15,636] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 23:26:15,642] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:26:15,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2022-11-28 23:26:15,642] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 23:26:15,636] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2022-11-28 23:26:15,642] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2022-11-28 23:26:15,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
3: [2022-11-28 23:26:15,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:26:15,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 23:26:15,643] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 23:26:15,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2022-11-28 23:26:15,643] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 5: [2022-11-28 23:26:15,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:26:15,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:26:15,645] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 23:26:15,644] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 23:26:15,645] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 3: [2022-11-28 23:26:15,644] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2022-11-28 23:26:15,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 23:26:15,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:26:15,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:26:15,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:26:15,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 23:26:15,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 23:26:15,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 23:26:15,648] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 23:26:15,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2022-11-28 23:26:15,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2022-11-28 23:26:15,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2022-11-28 23:26:15,648] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2022-11-28 23:26:15,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:26:15,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:26:15,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:26:15,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:26:15,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 23:26:15,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 23:26:15,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 23:26:15,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 23:26:15,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2022-11-28 23:26:15,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2022-11-28 23:26:15,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 6: [2022-11-28 23:26:15,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2022-11-28 23:26:15,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-28 23:26:15,656] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:26:15,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 23:26:15,656] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 23:26:15,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 1: [2022-11-28 23:26:15,656] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2022-11-28 23:26:15,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:26:15,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:26:15,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 23:26:15,657] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 23:26:15,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 7: [2022-11-28 23:26:15,657] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2022-11-28 23:26:15,660] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-28 23:26:15,660] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 23:26:15,661] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2022-11-28 23:26:15,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:26:15,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:26:15,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:26:15,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 23:26:15,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 23:26:15,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2022-11-28 23:26:15,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 0: [2022-11-28 23:26:15,700] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step87000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 23:26:15,700] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step87000 is ready now! 
0: successfully saved checkpoint at iteration 87000 to checkpoints_221m 7: time (ms) | save-checkpoint: 682.75 7: iteration 87010/ 115203 | consumed samples: 22274560 | consumed tokens: 45618298880 | elapsed time per iteration (s): 0.51 | learning rate: 4.580E-05 | global batch size: 256 | lm loss: 2.223834E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 499.626 | TFLOPs: 26.21 | 7: iteration 87020/ 115203 | consumed samples: 22277120 | consumed tokens: 45623541760 | elapsed time per iteration (s): 0.44 | learning rate: 4.579E-05 | global batch size: 256 | lm loss: 2.237997E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.872 | TFLOPs: 30.27 | 7: iteration 87030/ 115203 | consumed samples: 22279680 | consumed tokens: 45628784640 | elapsed time per iteration (s): 0.43 | learning rate: 4.577E-05 | global batch size: 256 | lm loss: 2.252358E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.986 | TFLOPs: 31.48 | 7: iteration 87040/ 115203 | consumed samples: 22282240 | consumed tokens: 45634027520 | elapsed time per iteration (s): 0.44 | learning rate: 4.575E-05 | global batch size: 256 | lm loss: 2.243201E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.373 | TFLOPs: 30.29 | 7: iteration 87050/ 115203 | consumed samples: 22284800 | consumed tokens: 45639270400 | elapsed time per iteration (s): 0.44 | learning rate: 4.573E-05 | global batch size: 256 | lm loss: 2.280880E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.523 | TFLOPs: 30.41 | 7: iteration 87060/ 115203 | consumed samples: 22287360 | consumed tokens: 45644513280 | elapsed time per iteration (s): 0.42 | learning 
rate: 4.572E-05 | global batch size: 256 | lm loss: 2.225381E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.954 | TFLOPs: 31.64 | 7: iteration 87070/ 115203 | consumed samples: 22289920 | consumed tokens: 45649756160 | elapsed time per iteration (s): 0.43 | learning rate: 4.570E-05 | global batch size: 256 | lm loss: 2.219424E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.285 | TFLOPs: 31.08 | 7: iteration 87080/ 115203 | consumed samples: 22292480 | consumed tokens: 45654999040 | elapsed time per iteration (s): 0.43 | learning rate: 4.568E-05 | global batch size: 256 | lm loss: 2.261370E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.295 | TFLOPs: 31.18 | 7: iteration 87090/ 115203 | consumed samples: 22295040 | consumed tokens: 45660241920 | elapsed time per iteration (s): 0.43 | learning rate: 4.566E-05 | global batch size: 256 | lm loss: 2.214978E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.538 | TFLOPs: 30.88 | 7: iteration 87100/ 115203 | consumed samples: 22297600 | consumed tokens: 45665484800 | elapsed time per iteration (s): 0.44 | learning rate: 4.565E-05 | global batch size: 256 | lm loss: 2.251999E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.704 | TFLOPs: 30.63 | 7: iteration 87110/ 115203 | consumed samples: 22300160 | consumed tokens: 45670727680 | elapsed time per iteration (s): 0.43 | learning rate: 4.563E-05 | global batch size: 256 | lm loss: 2.222675E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.883 | TFLOPs: 30.95 | 7: iteration 87120/ 115203 | 
consumed samples: 22302720 | consumed tokens: 45675970560 | elapsed time per iteration (s): 0.42 | learning rate: 4.561E-05 | global batch size: 256 | lm loss: 2.243549E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.885 | TFLOPs: 32.00 | 7: iteration 87130/ 115203 | consumed samples: 22305280 | consumed tokens: 45681213440 | elapsed time per iteration (s): 0.42 | learning rate: 4.560E-05 | global batch size: 256 | lm loss: 2.224931E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.593 | TFLOPs: 32.04 | 7: iteration 87140/ 115203 | consumed samples: 22307840 | consumed tokens: 45686456320 | elapsed time per iteration (s): 0.43 | learning rate: 4.558E-05 | global batch size: 256 | lm loss: 2.232733E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.513 | TFLOPs: 31.19 | 7: iteration 87150/ 115203 | consumed samples: 22310400 | consumed tokens: 45691699200 | elapsed time per iteration (s): 0.43 | learning rate: 4.556E-05 | global batch size: 256 | lm loss: 2.226462E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.626 | TFLOPs: 31.46 | 7: iteration 87160/ 115203 | consumed samples: 22312960 | consumed tokens: 45696942080 | elapsed time per iteration (s): 0.43 | learning rate: 4.554E-05 | global batch size: 256 | lm loss: 2.231937E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.506 | TFLOPs: 31.46 | 7: iteration 87170/ 115203 | consumed samples: 22315520 | consumed tokens: 45702184960 | elapsed time per iteration (s): 0.43 | learning rate: 4.553E-05 | global batch size: 256 | lm loss: 2.232439E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 598.219 | TFLOPs: 31.39 | 7: iteration 87180/ 115203 | consumed samples: 22318080 | consumed tokens: 45707427840 | elapsed time per iteration (s): 0.44 | learning rate: 4.551E-05 | global batch size: 256 | lm loss: 2.243031E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.037 | TFLOPs: 30.70 | 7: iteration 87190/ 115203 | consumed samples: 22320640 | consumed tokens: 45712670720 | elapsed time per iteration (s): 0.42 | learning rate: 4.549E-05 | global batch size: 256 | lm loss: 2.252126E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.991 | TFLOPs: 31.80 | 7: iteration 87200/ 115203 | consumed samples: 22323200 | consumed tokens: 45717913600 | elapsed time per iteration (s): 0.43 | learning rate: 4.547E-05 | global batch size: 256 | lm loss: 2.214961E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.400 | TFLOPs: 31.55 | 7: iteration 87210/ 115203 | consumed samples: 22325760 | consumed tokens: 45723156480 | elapsed time per iteration (s): 0.42 | learning rate: 4.546E-05 | global batch size: 256 | lm loss: 2.240898E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.872 | TFLOPs: 31.84 | 7: iteration 87220/ 115203 | consumed samples: 22328320 | consumed tokens: 45728399360 | elapsed time per iteration (s): 0.42 | learning rate: 4.544E-05 | global batch size: 256 | lm loss: 2.252660E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.683 | TFLOPs: 31.62 | 7: iteration 87230/ 115203 | consumed samples: 22330880 | consumed tokens: 45733642240 | elapsed time per iteration (s): 0.43 | learning rate: 4.542E-05 | global batch size: 
256 | lm loss: 2.264085E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.754 | TFLOPs: 31.05 | 7: iteration 87240/ 115203 | consumed samples: 22333440 | consumed tokens: 45738885120 | elapsed time per iteration (s): 0.43 | learning rate: 4.541E-05 | global batch size: 256 | lm loss: 2.216021E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.019 | TFLOPs: 31.22 | 7: iteration 87250/ 115203 | consumed samples: 22336000 | consumed tokens: 45744128000 | elapsed time per iteration (s): 0.43 | learning rate: 4.539E-05 | global batch size: 256 | lm loss: 2.239597E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.510 | TFLOPs: 31.14 | 7: iteration 87260/ 115203 | consumed samples: 22338560 | consumed tokens: 45749370880 | elapsed time per iteration (s): 0.42 | learning rate: 4.537E-05 | global batch size: 256 | lm loss: 2.245065E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.823 | TFLOPs: 31.73 | 7: iteration 87270/ 115203 | consumed samples: 22341120 | consumed tokens: 45754613760 | elapsed time per iteration (s): 0.42 | learning rate: 4.535E-05 | global batch size: 256 | lm loss: 2.234553E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.400 | TFLOPs: 31.92 | 7: iteration 87280/ 115203 | consumed samples: 22343680 | consumed tokens: 45759856640 | elapsed time per iteration (s): 0.43 | learning rate: 4.534E-05 | global batch size: 256 | lm loss: 2.250629E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.571 | TFLOPs: 30.93 | 7: iteration 87290/ 115203 | consumed samples: 22346240 | consumed 
tokens: 45765099520 | elapsed time per iteration (s): 0.43 | learning rate: 4.532E-05 | global batch size: 256 | lm loss: 2.208942E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.004 | TFLOPs: 31.01 | 7: iteration 87300/ 115203 | consumed samples: 22348800 | consumed tokens: 45770342400 | elapsed time per iteration (s): 0.42 | learning rate: 4.530E-05 | global batch size: 256 | lm loss: 2.231701E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.865 | TFLOPs: 31.68 | 7: iteration 87310/ 115203 | consumed samples: 22351360 | consumed tokens: 45775585280 | elapsed time per iteration (s): 0.43 | learning rate: 4.528E-05 | global batch size: 256 | lm loss: 2.233232E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.871 | TFLOPs: 31.00 | 7: iteration 87320/ 115203 | consumed samples: 22353920 | consumed tokens: 45780828160 | elapsed time per iteration (s): 0.44 | learning rate: 4.527E-05 | global batch size: 256 | lm loss: 2.258652E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.537 | TFLOPs: 30.25 | 7: iteration 87330/ 115203 | consumed samples: 22356480 | consumed tokens: 45786071040 | elapsed time per iteration (s): 0.43 | learning rate: 4.525E-05 | global batch size: 256 | lm loss: 2.216037E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.439 | TFLOPs: 31.03 | 7: iteration 87340/ 115203 | consumed samples: 22359040 | consumed tokens: 45791313920 | elapsed time per iteration (s): 0.44 | learning rate: 4.523E-05 | global batch size: 256 | lm loss: 2.208923E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 588.224 | TFLOPs: 30.86 | 7: iteration 87350/ 115203 | consumed samples: 22361600 | consumed tokens: 45796556800 | elapsed time per iteration (s): 0.44 | learning rate: 4.522E-05 | global batch size: 256 | lm loss: 2.251528E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.087 | TFLOPs: 30.33 | 7: iteration 87360/ 115203 | consumed samples: 22364160 | consumed tokens: 45801799680 | elapsed time per iteration (s): 0.43 | learning rate: 4.520E-05 | global batch size: 256 | lm loss: 2.228546E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.108 | TFLOPs: 31.28 | 7: iteration 87370/ 115203 | consumed samples: 22366720 | consumed tokens: 45807042560 | elapsed time per iteration (s): 0.43 | learning rate: 4.518E-05 | global batch size: 256 | lm loss: 2.238311E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.068 | TFLOPs: 31.17 | 7: iteration 87380/ 115203 | consumed samples: 22369280 | consumed tokens: 45812285440 | elapsed time per iteration (s): 0.42 | learning rate: 4.516E-05 | global batch size: 256 | lm loss: 2.206663E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.061 | TFLOPs: 31.69 | 7: iteration 87390/ 115203 | consumed samples: 22371840 | consumed tokens: 45817528320 | elapsed time per iteration (s): 0.43 | learning rate: 4.515E-05 | global batch size: 256 | lm loss: 2.214932E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.168 | TFLOPs: 31.54 | 7: iteration 87400/ 115203 | consumed samples: 22374400 | consumed tokens: 45822771200 | elapsed time per iteration (s): 0.45 | learning rate: 4.513E-05 | global batch size: 256 | lm loss: 2.205347E+00 | grad norm: 
0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.868 | TFLOPs: 29.95 | 7: iteration 87410/ 115203 | consumed samples: 22376960 | consumed tokens: 45828014080 | elapsed time per iteration (s): 0.43 | learning rate: 4.511E-05 | global batch size: 256 | lm loss: 2.261602E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.677 | TFLOPs: 30.94 | 7: iteration 87420/ 115203 | consumed samples: 22379520 | consumed tokens: 45833256960 | elapsed time per iteration (s): 0.43 | learning rate: 4.510E-05 | global batch size: 256 | lm loss: 2.257068E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.696 | TFLOPs: 31.10 | 7: iteration 87430/ 115203 | consumed samples: 22382080 | consumed tokens: 45838499840 | elapsed time per iteration (s): 0.43 | learning rate: 4.508E-05 | global batch size: 256 | lm loss: 2.241191E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.107 | TFLOPs: 31.59 | 7: iteration 87440/ 115203 | consumed samples: 22384640 | consumed tokens: 45843742720 | elapsed time per iteration (s): 0.43 | learning rate: 4.506E-05 | global batch size: 256 | lm loss: 2.219680E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.513 | TFLOPs: 31.30 | 7: iteration 87450/ 115203 | consumed samples: 22387200 | consumed tokens: 45848985600 | elapsed time per iteration (s): 0.43 | learning rate: 4.504E-05 | global batch size: 256 | lm loss: 2.228922E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.011 | TFLOPs: 31.48 | 7: iteration 87460/ 115203 | consumed samples: 22389760 | consumed tokens: 45854228480 | elapsed time per 
iteration (s): 0.43 | learning rate: 4.503E-05 | global batch size: 256 | lm loss: 2.241566E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.790 | TFLOPs: 31.37 | 7: iteration 87470/ 115203 | consumed samples: 22392320 | consumed tokens: 45859471360 | elapsed time per iteration (s): 0.42 | learning rate: 4.501E-05 | global batch size: 256 | lm loss: 2.243688E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.335 | TFLOPs: 31.87 | 7: iteration 87480/ 115203 | consumed samples: 22394880 | consumed tokens: 45864714240 | elapsed time per iteration (s): 0.45 | learning rate: 4.499E-05 | global batch size: 256 | lm loss: 2.221438E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.840 | TFLOPs: 30.06 | 7: iteration 87490/ 115203 | consumed samples: 22397440 | consumed tokens: 45869957120 | elapsed time per iteration (s): 0.43 | learning rate: 4.498E-05 | global batch size: 256 | lm loss: 2.273880E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.981 | TFLOPs: 31.48 | 7: iteration 87500/ 115203 | consumed samples: 22400000 | consumed tokens: 45875200000 | elapsed time per iteration (s): 0.43 | learning rate: 4.496E-05 | global batch size: 256 | lm loss: 2.276000E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.862 | TFLOPs: 31.00 | 7: iteration 87510/ 115203 | consumed samples: 22402560 | consumed tokens: 45880442880 | elapsed time per iteration (s): 0.42 | learning rate: 4.494E-05 | global batch size: 256 | lm loss: 2.253602E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.121 | TFLOPs: 31.70 | 7: 
iteration 87520/ 115203 | consumed samples: 22405120 | consumed tokens: 45885685760 | elapsed time per iteration (s): 0.44 | learning rate: 4.492E-05 | global batch size: 256 | lm loss: 2.232256E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.252 | TFLOPs: 30.81 | 7: iteration 87530/ 115203 | consumed samples: 22407680 | consumed tokens: 45890928640 | elapsed time per iteration (s): 0.43 | learning rate: 4.491E-05 | global batch size: 256 | lm loss: 2.250467E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.816 | TFLOPs: 31.42 | 7: iteration 87540/ 115203 | consumed samples: 22410240 | consumed tokens: 45896171520 | elapsed time per iteration (s): 0.45 | learning rate: 4.489E-05 | global batch size: 256 | lm loss: 2.269971E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.309 | TFLOPs: 29.71 | 7: iteration 87550/ 115203 | consumed samples: 22412800 | consumed tokens: 45901414400 | elapsed time per iteration (s): 0.43 | learning rate: 4.487E-05 | global batch size: 256 | lm loss: 2.222912E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.147 | TFLOPs: 31.28 | 7: iteration 87560/ 115203 | consumed samples: 22415360 | consumed tokens: 45906657280 | elapsed time per iteration (s): 0.43 | learning rate: 4.486E-05 | global batch size: 256 | lm loss: 2.224960E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.121 | TFLOPs: 31.33 | 7: iteration 87570/ 115203 | consumed samples: 22417920 | consumed tokens: 45911900160 | elapsed time per iteration (s): 0.42 | learning rate: 4.484E-05 | global batch size: 256 | lm loss: 2.261652E+00 | grad norm: 0.258 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.457 | TFLOPs: 31.98 | 7: iteration 87580/ 115203 | consumed samples: 22420480 | consumed tokens: 45917143040 | elapsed time per iteration (s): 0.43 | learning rate: 4.482E-05 | global batch size: 256 | lm loss: 2.226502E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.852 | TFLOPs: 31.32 | 7: iteration 87590/ 115203 | consumed samples: 22423040 | consumed tokens: 45922385920 | elapsed time per iteration (s): 0.42 | learning rate: 4.480E-05 | global batch size: 256 | lm loss: 2.247850E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.244 | TFLOPs: 31.76 | 7: iteration 87600/ 115203 | consumed samples: 22425600 | consumed tokens: 45927628800 | elapsed time per iteration (s): 0.43 | learning rate: 4.479E-05 | global batch size: 256 | lm loss: 2.257957E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.228 | TFLOPs: 31.49 | 7: iteration 87610/ 115203 | consumed samples: 22428160 | consumed tokens: 45932871680 | elapsed time per iteration (s): 0.42 | learning rate: 4.477E-05 | global batch size: 256 | lm loss: 2.240215E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.564 | TFLOPs: 31.67 | 7: iteration 87620/ 115203 | consumed samples: 22430720 | consumed tokens: 45938114560 | elapsed time per iteration (s): 0.43 | learning rate: 4.475E-05 | global batch size: 256 | lm loss: 2.211955E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.509 | TFLOPs: 31.14 | 7: iteration 87630/ 115203 | consumed samples: 22433280 | consumed tokens: 45943357440 | elapsed time per iteration (s): 0.43 | learning rate: 
4.474E-05 | global batch size: 256 | lm loss: 2.247297E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.877 | TFLOPs: 31.21 | 7: iteration 87640/ 115203 | consumed samples: 22435840 | consumed tokens: 45948600320 | elapsed time per iteration (s): 0.43 | learning rate: 4.472E-05 | global batch size: 256 | lm loss: 2.257985E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.659 | TFLOPs: 31.04 | 7: iteration 87650/ 115203 | consumed samples: 22438400 | consumed tokens: 45953843200 | elapsed time per iteration (s): 0.43 | learning rate: 4.470E-05 | global batch size: 256 | lm loss: 2.255984E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.449 | TFLOPs: 31.50 | 7: iteration 87660/ 115203 | consumed samples: 22440960 | consumed tokens: 45959086080 | elapsed time per iteration (s): 0.43 | learning rate: 4.468E-05 | global batch size: 256 | lm loss: 2.233777E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.479 | TFLOPs: 31.35 | 7: iteration 87670/ 115203 | consumed samples: 22443520 | consumed tokens: 45964328960 | elapsed time per iteration (s): 0.43 | learning rate: 4.467E-05 | global batch size: 256 | lm loss: 2.257289E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.644 | TFLOPs: 31.46 | 7: iteration 87680/ 115203 | consumed samples: 22446080 | consumed tokens: 45969571840 | elapsed time per iteration (s): 0.44 | learning rate: 4.465E-05 | global batch size: 256 | lm loss: 2.229007E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.277 | TFLOPs: 30.66 | 7: iteration 87690/ 115203 | consumed 
samples: 22448640 | consumed tokens: 45974814720 | elapsed time per iteration (s): 0.43 | learning rate: 4.463E-05 | global batch size: 256 | lm loss: 2.247346E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.514 | TFLOPs: 31.30 | 7: iteration 87700/ 115203 | consumed samples: 22451200 | consumed tokens: 45980057600 | elapsed time per iteration (s): 0.43 | learning rate: 4.462E-05 | global batch size: 256 | lm loss: 2.227739E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.839 | TFLOPs: 31.37 | 7: iteration 87710/ 115203 | consumed samples: 22453760 | consumed tokens: 45985300480 | elapsed time per iteration (s): 0.44 | learning rate: 4.460E-05 | global batch size: 256 | lm loss: 2.239075E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.073 | TFLOPs: 30.59 | 7: iteration 87720/ 115203 | consumed samples: 22456320 | consumed tokens: 45990543360 | elapsed time per iteration (s): 0.42 | learning rate: 4.458E-05 | global batch size: 256 | lm loss: 2.215636E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.279 | TFLOPs: 31.92 | 7: iteration 87730/ 115203 | consumed samples: 22458880 | consumed tokens: 45995786240 | elapsed time per iteration (s): 0.43 | learning rate: 4.457E-05 | global batch size: 256 | lm loss: 2.232981E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.311 | TFLOPs: 31.60 | 7: iteration 87740/ 115203 | consumed samples: 22461440 | consumed tokens: 46001029120 | elapsed time per iteration (s): 0.43 | learning rate: 4.455E-05 | global batch size: 256 | lm loss: 2.249727E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 597.242 | TFLOPs: 31.34 | 7: iteration 87750/ 115203 | consumed samples: 22464000 | consumed tokens: 46006272000 | elapsed time per iteration (s): 0.45 | learning rate: 4.453E-05 | global batch size: 256 | lm loss: 2.220621E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.655 | TFLOPs: 29.89 | 7: iteration 87760/ 115203 | consumed samples: 22466560 | consumed tokens: 46011514880 | elapsed time per iteration (s): 0.43 | learning rate: 4.451E-05 | global batch size: 256 | lm loss: 2.257643E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.310 | TFLOPs: 31.39 | 7: iteration 87770/ 115203 | consumed samples: 22469120 | consumed tokens: 46016757760 | elapsed time per iteration (s): 0.43 | learning rate: 4.450E-05 | global batch size: 256 | lm loss: 2.236543E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.513 | TFLOPs: 31.09 | 7: iteration 87780/ 115203 | consumed samples: 22471680 | consumed tokens: 46022000640 | elapsed time per iteration (s): 0.43 | learning rate: 4.448E-05 | global batch size: 256 | lm loss: 2.265944E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.711 | TFLOPs: 30.89 | 7: iteration 87790/ 115203 | consumed samples: 22474240 | consumed tokens: 46027243520 | elapsed time per iteration (s): 0.43 | learning rate: 4.446E-05 | global batch size: 256 | lm loss: 2.239718E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.849 | TFLOPs: 31.21 | 7: iteration 87800/ 115203 | consumed samples: 22476800 | consumed tokens: 46032486400 | elapsed time per iteration (s): 0.44 | learning rate: 4.445E-05 | global batch size: 256 | lm 
loss: 2.257138E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.291 | TFLOPs: 30.66 | 7: iteration 87810/ 115203 | consumed samples: 22479360 | consumed tokens: 46037729280 | elapsed time per iteration (s): 0.43 | learning rate: 4.443E-05 | global batch size: 256 | lm loss: 2.256193E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.037 | TFLOPs: 31.48 | 7: iteration 87820/ 115203 | consumed samples: 22481920 | consumed tokens: 46042972160 | elapsed time per iteration (s): 0.44 | learning rate: 4.441E-05 | global batch size: 256 | lm loss: 2.239926E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.947 | TFLOPs: 30.69 | 7: iteration 87830/ 115203 | consumed samples: 22484480 | consumed tokens: 46048215040 | elapsed time per iteration (s): 0.43 | learning rate: 4.440E-05 | global batch size: 256 | lm loss: 2.239371E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.345 | TFLOPs: 31.08 | 7: iteration 87840/ 115203 | consumed samples: 22487040 | consumed tokens: 46053457920 | elapsed time per iteration (s): 0.43 | learning rate: 4.438E-05 | global batch size: 256 | lm loss: 2.202360E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.158 | TFLOPs: 31.12 | 7: iteration 87850/ 115203 | consumed samples: 22489600 | consumed tokens: 46058700800 | elapsed time per iteration (s): 0.43 | learning rate: 4.436E-05 | global batch size: 256 | lm loss: 2.237358E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.461 | TFLOPs: 31.35 | 7: iteration 87860/ 115203 | consumed samples: 22492160 | consumed tokens: 
46063943680 | elapsed time per iteration (s): 0.43 | learning rate: 4.434E-05 | global batch size: 256 | lm loss: 2.231112E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.262 | TFLOPs: 31.49 | 7: iteration 87870/ 115203 | consumed samples: 22494720 | consumed tokens: 46069186560 | elapsed time per iteration (s): 0.44 | learning rate: 4.433E-05 | global batch size: 256 | lm loss: 2.236526E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.520 | TFLOPs: 30.83 | 7: iteration 87880/ 115203 | consumed samples: 22497280 | consumed tokens: 46074429440 | elapsed time per iteration (s): 0.44 | learning rate: 4.431E-05 | global batch size: 256 | lm loss: 2.224095E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.432 | TFLOPs: 30.40 | 7: iteration 87890/ 115203 | consumed samples: 22499840 | consumed tokens: 46079672320 | elapsed time per iteration (s): 0.46 | learning rate: 4.429E-05 | global batch size: 256 | lm loss: 2.214752E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 555.230 | TFLOPs: 29.13 | 7: iteration 87900/ 115203 | consumed samples: 22502400 | consumed tokens: 46084915200 | elapsed time per iteration (s): 0.53 | learning rate: 4.428E-05 | global batch size: 256 | lm loss: 2.213980E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 483.666 | TFLOPs: 25.38 | 7: iteration 87910/ 115203 | consumed samples: 22504960 | consumed tokens: 46090158080 | elapsed time per iteration (s): 0.43 | learning rate: 4.426E-05 | global batch size: 256 | lm loss: 2.251122E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
598.082 | TFLOPs: 31.38 | 7: iteration 87920/ 115203 | consumed samples: 22507520 | consumed tokens: 46095400960 | elapsed time per iteration (s): 0.42 | learning rate: 4.424E-05 | global batch size: 256 | lm loss: 2.254595E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.191 | TFLOPs: 31.65 | 7: iteration 87930/ 115203 | consumed samples: 22510080 | consumed tokens: 46100643840 | elapsed time per iteration (s): 0.42 | learning rate: 4.423E-05 | global batch size: 256 | lm loss: 2.238640E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.140 | TFLOPs: 31.70 | 7: iteration 87940/ 115203 | consumed samples: 22512640 | consumed tokens: 46105886720 | elapsed time per iteration (s): 0.42 | learning rate: 4.421E-05 | global batch size: 256 | lm loss: 2.205272E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.065 | TFLOPs: 31.64 | 7: iteration 87950/ 115203 | consumed samples: 22515200 | consumed tokens: 46111129600 | elapsed time per iteration (s): 0.43 | learning rate: 4.419E-05 | global batch size: 256 | lm loss: 2.209375E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.602 | TFLOPs: 31.41 | 7: iteration 87960/ 115203 | consumed samples: 22517760 | consumed tokens: 46116372480 | elapsed time per iteration (s): 0.43 | learning rate: 4.418E-05 | global batch size: 256 | lm loss: 2.218342E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.523 | TFLOPs: 31.25 | 7: iteration 87970/ 115203 | consumed samples: 22520320 | consumed tokens: 46121615360 | elapsed time per iteration (s): 0.43 | learning rate: 4.416E-05 | global batch size: 256 | lm loss: 2.250645E+00 | grad norm: 0.248 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.605 | TFLOPs: 31.51 | 7: iteration 87980/ 115203 | consumed samples: 22522880 | consumed tokens: 46126858240 | elapsed time per iteration (s): 0.43 | learning rate: 4.414E-05 | global batch size: 256 | lm loss: 2.247502E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.660 | TFLOPs: 30.99 | 7: iteration 87990/ 115203 | consumed samples: 22525440 | consumed tokens: 46132101120 | elapsed time per iteration (s): 0.43 | learning rate: 4.412E-05 | global batch size: 256 | lm loss: 2.233934E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.682 | TFLOPs: 31.57 | 0: [2022-11-28 23:33:27,670] [INFO] [logging.py:68:log_dist] [Rank 0] step=88000, skipped=0, lr=[4.410744818232367e-05, 4.410744818232367e-05, 4.410744818232367e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 88000/ 115203 | consumed samples: 22528000 | consumed tokens: 46137344000 | elapsed time per iteration (s): 0.44 | learning rate: 4.411E-05 | global batch size: 256 | lm loss: 2.247201E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.381 | TFLOPs: 30.45 | 0: steps: 88000 loss: 2.2375 iter time (s): 0.430 samples/sec: 594.828 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 88000 | lm loss value: 2.197365E+00 | lm loss PPL: 9.001262E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 88000 to checkpoints_221m 0: [2022-11-28 23:33:27,907] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step88000 is begin to save! 
0: [2022-11-28 23:33:27,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_01-model_00-model_states.pt... 0: [2022-11-28 23:33:28,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_01-model_00-model_states.pt. 0: [2022-11-28 23:33:28,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_03-model_00-model_states.pt... 0: [2022-11-28 23:33:28,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_03-model_00-model_states.pt. 0: [2022-11-28 23:33:28,123] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_04-model_00-model_states.pt... 0: [2022-11-28 23:33:28,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_04-model_00-model_states.pt. 0: [2022-11-28 23:33:28,156] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_05-model_00-model_states.pt... 0: [2022-11-28 23:33:28,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_05-model_00-model_states.pt. 0: [2022-11-28 23:33:28,188] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_06-model_00-model_states.pt... 0: [2022-11-28 23:33:28,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_06-model_00-model_states.pt. 0: [2022-11-28 23:33:28,221] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_07-model_00-model_states.pt... 0: [2022-11-28 23:33:28,255] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_07-model_00-model_states.pt. 
0: [2022-11-28 23:33:28,255] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_08-model_00-model_states.pt... 0: [2022-11-28 23:33:28,287] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_08-model_00-model_states.pt. 0: [2022-11-28 23:33:28,288] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_09-model_00-model_states.pt... 0: [2022-11-28 23:33:28,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_09-model_00-model_states.pt. 0: [2022-11-28 23:33:28,321] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_10-model_00-model_states.pt... 0: [2022-11-28 23:33:28,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_10-model_00-model_states.pt. 0: [2022-11-28 23:33:28,353] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_11-model_00-model_states.pt... 0: [2022-11-28 23:33:28,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_11-model_00-model_states.pt. 0: [2022-11-28 23:33:28,386] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_12-model_00-model_states.pt... 0: [2022-11-28 23:33:28,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_12-model_00-model_states.pt. 0: [2022-11-28 23:33:28,418] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_13-model_00-model_states.pt... 0: [2022-11-28 23:33:28,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_13-model_00-model_states.pt. 
0: [2022-11-28 23:33:28,449] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_14-model_00-model_states.pt... 0: [2022-11-28 23:33:28,481] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_14-model_00-model_states.pt. 0: [2022-11-28 23:33:28,482] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_15-model_00-model_states.pt... 0: [2022-11-28 23:33:28,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_15-model_00-model_states.pt. 0: [2022-11-28 23:33:28,515] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_16-model_00-model_states.pt... 0: [2022-11-28 23:33:28,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_16-model_00-model_states.pt. 0: [2022-11-28 23:33:28,548] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_17-model_00-model_states.pt... 0: [2022-11-28 23:33:28,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_17-model_00-model_states.pt. 0: [2022-11-28 23:33:28,580] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_18-model_00-model_states.pt... 0: [2022-11-28 23:33:28,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_18-model_00-model_states.pt. 0: [2022-11-28 23:33:28,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_19-model_00-model_states.pt... 0: [2022-11-28 23:33:28,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_19-model_00-model_states.pt. 
0: [2022-11-28 23:33:28,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_20-model_00-model_states.pt... 0: [2022-11-28 23:33:28,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_20-model_00-model_states.pt. 0: [2022-11-28 23:33:28,677] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/layer_22-model_00-model_states.pt... 0: [2022-11-28 23:33:28,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/layer_22-model_00-model_states.pt. 0: [2022-11-28 23:33:28,682] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step88000/mp_rank_00_model_states.pt 0: [2022-11-28 23:33:28,682] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/mp_rank_00_model_states.pt... 0: [2022-11-28 23:33:28,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/mp_rank_00_model_states.pt. 0: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
7: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
1: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:33:28,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step88000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:33:28,756] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:33:28,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 23:33:28,757] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2022-11-28 23:33:28,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:33:28,757] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 23:33:28,757] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2022-11-28 23:33:28,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-28 23:33:28,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 23:33:28,761] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2022-11-28 23:33:28,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:33:28,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 23:33:28,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2022-11-28 23:33:28,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:33:28,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 23:33:28,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2022-11-28 23:33:28,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:33:28,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 23:33:28,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 23:33:28,767] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 23:33:28,767] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2022-11-28 23:33:28,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2022-11-28 23:33:28,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:33:28,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 23:33:28,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:33:28,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 23:33:28,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
7: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2022-11-28 23:33:28,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:33:28,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:33:28,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 23:33:28,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 23:33:28,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2022-11-28 23:33:28,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2022-11-28 23:33:28,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:33:28,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 23:33:28,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 6: [2022-11-28 23:33:28,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:33:28,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:33:28,774] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:33:28,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 23:33:28,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 23:33:28,774] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 23:33:28,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 6: [2022-11-28 23:33:28,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 6: [2022-11-28 23:33:28,774] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 6: [2022-11-28 23:33:28,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:33:28,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 23:33:28,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 6: [2022-11-28 23:33:28,777] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:33:28,777] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:33:28,777] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:33:28,777] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:33:28,777] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 23:33:28,777] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 23:33:28,777] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 23:33:28,777] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 23:33:28,777] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 6: [2022-11-28 23:33:28,777] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 6: [2022-11-28 23:33:28,777] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 6: [2022-11-28 23:33:28,777] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2022-11-28 23:33:28,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:33:28,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-28 23:33:28,779] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:33:28,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 23:33:28,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 23:33:28,779] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 23:33:28,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2022-11-28 23:33:28,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 4: [2022-11-28 23:33:28,779] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2022-11-28 23:33:28,782] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:33:28,782] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 23:33:28,782] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2022-11-28 23:33:28,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:33:28,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 23:33:28,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2022-11-28 23:33:28,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:33:28,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:33:28,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 23:33:28,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 23:33:28,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2022-11-28 23:33:28,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2022-11-28 23:33:28,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:33:28,783] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 23:33:28,783] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2022-11-28 23:33:28,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
3: [2022-11-28 23:33:28,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:33:28,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:33:28,758] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-28 23:33:28,758] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2022-11-28 23:33:28,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:33:28,758] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 23:33:28,759] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2022-11-28 23:33:28,762] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 23:33:28,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:33:28,762] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 23:33:28,765] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 23:33:28,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
5: [2022-11-28 23:33:28,765] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2022-11-28 23:33:28,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2022-11-28 23:33:28,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:33:28,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:33:28,766] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 23:33:28,768] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 23:33:28,766] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2022-11-28 23:33:28,768] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:33:28,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
5: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2022-11-28 23:33:28,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 23:33:28,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:33:28,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 23:33:28,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2022-11-28 23:33:28,771] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:33:28,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 5: [2022-11-28 23:33:28,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
5: [2022-11-28 23:33:28,771] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 23:33:28,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 23:33:28,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:33:28,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2022-11-28 23:33:28,769] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 23:33:28,771] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2022-11-28 23:33:28,769] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 3: [2022-11-28 23:33:28,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:33:28,773] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 23:33:28,773] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2022-11-28 23:33:28,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2022-11-28 23:33:28,785] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:33:28,785] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 23:33:28,785] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2022-11-28 23:33:28,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:33:28,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:33:28,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 23:33:28,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 23:33:28,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2022-11-28 23:33:28,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:33:28,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 23:33:28,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2022-11-28 23:33:28,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
1: [2022-11-28 23:33:28,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:33:28,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:33:28,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:33:28,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 23:33:28,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 23:33:28,786] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 23:33:28,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 1: [2022-11-28 23:33:28,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 1: [2022-11-28 23:33:28,786] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2022-11-28 23:33:28,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:33:28,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:33:28,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:33:28,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 23:33:28,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 23:33:28,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 23:33:28,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2022-11-28 23:33:28,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 2: [2022-11-28 23:33:28,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 1: [2022-11-28 23:33:28,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:33:28,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 23:33:28,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 1: [2022-11-28 23:33:28,789] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:33:28,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 23:33:28,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
1: [2022-11-28 23:33:28,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:33:28,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 23:33:28,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 1: [2022-11-28 23:33:28,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:33:28,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 23:33:28,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 1: [2022-11-28 23:33:28,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:33:28,790] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 23:33:28,790] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2022-11-28 23:33:28,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:33:28,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:33:28,794] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-28 23:33:28,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 23:33:28,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 23:33:28,794] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 23:33:28,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2022-11-28 23:33:28,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 7: [2022-11-28 23:33:28,794] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 0: [2022-11-28 23:33:28,839] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step88000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 23:33:28,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step88000 is ready now! 
0: successfully saved checkpoint at iteration 88000 to checkpoints_221m 7: time (ms) | save-checkpoint: 986.78 7: iteration 88010/ 115203 | consumed samples: 22530560 | consumed tokens: 46142586880 | elapsed time per iteration (s): 0.55 | learning rate: 4.409E-05 | global batch size: 256 | lm loss: 2.198362E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 461.697 | TFLOPs: 24.22 | 7: iteration 88020/ 115203 | consumed samples: 22533120 | consumed tokens: 46147829760 | elapsed time per iteration (s): 0.43 | learning rate: 4.407E-05 | global batch size: 256 | lm loss: 2.248567E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.599 | TFLOPs: 31.46 | 7: iteration 88030/ 115203 | consumed samples: 22535680 | consumed tokens: 46153072640 | elapsed time per iteration (s): 0.42 | learning rate: 4.406E-05 | global batch size: 256 | lm loss: 2.227248E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.633 | TFLOPs: 31.67 | 7: iteration 88040/ 115203 | consumed samples: 22538240 | consumed tokens: 46158315520 | elapsed time per iteration (s): 0.43 | learning rate: 4.404E-05 | global batch size: 256 | lm loss: 2.215602E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.840 | TFLOPs: 31.47 | 7: iteration 88050/ 115203 | consumed samples: 22540800 | consumed tokens: 46163558400 | elapsed time per iteration (s): 0.43 | learning rate: 4.402E-05 | global batch size: 256 | lm loss: 2.249393E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.633 | TFLOPs: 31.09 | 7: iteration 88060/ 115203 | consumed samples: 22543360 | consumed tokens: 46168801280 | elapsed time per iteration (s): 0.43 | learning 
rate: 4.401E-05 | global batch size: 256 | lm loss: 2.208721E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.248 | TFLOPs: 31.18 | 7: iteration 88070/ 115203 | consumed samples: 22545920 | consumed tokens: 46174044160 | elapsed time per iteration (s): 0.43 | learning rate: 4.399E-05 | global batch size: 256 | lm loss: 2.245138E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.130 | TFLOPs: 31.23 | 7: iteration 88080/ 115203 | consumed samples: 22548480 | consumed tokens: 46179287040 | elapsed time per iteration (s): 0.43 | learning rate: 4.397E-05 | global batch size: 256 | lm loss: 2.222263E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.540 | TFLOPs: 31.35 | 7: iteration 88090/ 115203 | consumed samples: 22551040 | consumed tokens: 46184529920 | elapsed time per iteration (s): 0.43 | learning rate: 4.396E-05 | global batch size: 256 | lm loss: 2.248792E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.122 | TFLOPs: 31.12 | 7: iteration 88100/ 115203 | consumed samples: 22553600 | consumed tokens: 46189772800 | elapsed time per iteration (s): 0.43 | learning rate: 4.394E-05 | global batch size: 256 | lm loss: 2.208975E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.692 | TFLOPs: 31.10 | 7: iteration 88110/ 115203 | consumed samples: 22556160 | consumed tokens: 46195015680 | elapsed time per iteration (s): 0.43 | learning rate: 4.392E-05 | global batch size: 256 | lm loss: 2.221914E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.755 | TFLOPs: 31.21 | 7: iteration 88120/ 115203 | 
consumed samples: 22558720 | consumed tokens: 46200258560 | elapsed time per iteration (s): 0.42 | learning rate: 4.391E-05 | global batch size: 256 | lm loss: 2.220877E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.757 | TFLOPs: 31.89 | 7: iteration 88130/ 115203 | consumed samples: 22561280 | consumed tokens: 46205501440 | elapsed time per iteration (s): 0.44 | learning rate: 4.389E-05 | global batch size: 256 | lm loss: 2.231730E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.260 | TFLOPs: 30.71 | 7: iteration 88140/ 115203 | consumed samples: 22563840 | consumed tokens: 46210744320 | elapsed time per iteration (s): 0.43 | learning rate: 4.387E-05 | global batch size: 256 | lm loss: 2.215482E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.321 | TFLOPs: 31.03 | 7: iteration 88150/ 115203 | consumed samples: 22566400 | consumed tokens: 46215987200 | elapsed time per iteration (s): 0.43 | learning rate: 4.385E-05 | global batch size: 256 | lm loss: 2.224122E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.553 | TFLOPs: 31.09 | 7: iteration 88160/ 115203 | consumed samples: 22568960 | consumed tokens: 46221230080 | elapsed time per iteration (s): 0.44 | learning rate: 4.384E-05 | global batch size: 256 | lm loss: 2.210514E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.691 | TFLOPs: 30.84 | 7: iteration 88170/ 115203 | consumed samples: 22571520 | consumed tokens: 46226472960 | elapsed time per iteration (s): 0.43 | learning rate: 4.382E-05 | global batch size: 256 | lm loss: 2.225031E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 593.417 | TFLOPs: 31.14 | 7: iteration 88180/ 115203 | consumed samples: 22574080 | consumed tokens: 46231715840 | elapsed time per iteration (s): 0.43 | learning rate: 4.380E-05 | global batch size: 256 | lm loss: 2.238704E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.249 | TFLOPs: 31.49 | 7: iteration 88190/ 115203 | consumed samples: 22576640 | consumed tokens: 46236958720 | elapsed time per iteration (s): 0.44 | learning rate: 4.379E-05 | global batch size: 256 | lm loss: 2.242419E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.890 | TFLOPs: 30.58 | 7: iteration 88200/ 115203 | consumed samples: 22579200 | consumed tokens: 46242201600 | elapsed time per iteration (s): 0.42 | learning rate: 4.377E-05 | global batch size: 256 | lm loss: 2.250126E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.281 | TFLOPs: 31.86 | 7: iteration 88210/ 115203 | consumed samples: 22581760 | consumed tokens: 46247444480 | elapsed time per iteration (s): 0.43 | learning rate: 4.375E-05 | global batch size: 256 | lm loss: 2.231114E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.591 | TFLOPs: 31.30 | 7: iteration 88220/ 115203 | consumed samples: 22584320 | consumed tokens: 46252687360 | elapsed time per iteration (s): 0.45 | learning rate: 4.374E-05 | global batch size: 256 | lm loss: 2.239990E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.270 | TFLOPs: 29.76 | 7: iteration 88230/ 115203 | consumed samples: 22586880 | consumed tokens: 46257930240 | elapsed time per iteration (s): 0.43 | learning rate: 4.372E-05 | global batch size: 
256 | lm loss: 2.272544E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.247 | TFLOPs: 31.13 | 7: iteration 88240/ 115203 | consumed samples: 22589440 | consumed tokens: 46263173120 | elapsed time per iteration (s): 0.42 | learning rate: 4.370E-05 | global batch size: 256 | lm loss: 2.212322E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.725 | TFLOPs: 31.73 | 7: iteration 88250/ 115203 | consumed samples: 22592000 | consumed tokens: 46268416000 | elapsed time per iteration (s): 0.44 | learning rate: 4.369E-05 | global batch size: 256 | lm loss: 2.277273E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.733 | TFLOPs: 30.78 | 7: iteration 88260/ 115203 | consumed samples: 22594560 | consumed tokens: 46273658880 | elapsed time per iteration (s): 0.42 | learning rate: 4.367E-05 | global batch size: 256 | lm loss: 2.233520E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.398 | TFLOPs: 32.08 | 7: iteration 88270/ 115203 | consumed samples: 22597120 | consumed tokens: 46278901760 | elapsed time per iteration (s): 0.43 | learning rate: 4.365E-05 | global batch size: 256 | lm loss: 2.223725E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.132 | TFLOPs: 30.96 | 7: iteration 88280/ 115203 | consumed samples: 22599680 | consumed tokens: 46284144640 | elapsed time per iteration (s): 0.43 | learning rate: 4.364E-05 | global batch size: 256 | lm loss: 2.247897E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.126 | TFLOPs: 31.44 | 7: iteration 88290/ 115203 | consumed samples: 22602240 | consumed 
tokens: 46289387520 | elapsed time per iteration (s): 0.43 | learning rate: 4.362E-05 | global batch size: 256 | lm loss: 2.255029E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.267 | TFLOPs: 31.18 | 7: iteration 88300/ 115203 | consumed samples: 22604800 | consumed tokens: 46294630400 | elapsed time per iteration (s): 0.43 | learning rate: 4.360E-05 | global batch size: 256 | lm loss: 2.255303E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.073 | TFLOPs: 31.43 | 7: iteration 88310/ 115203 | consumed samples: 22607360 | consumed tokens: 46299873280 | elapsed time per iteration (s): 0.43 | learning rate: 4.359E-05 | global batch size: 256 | lm loss: 2.224531E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.424 | TFLOPs: 31.45 | 7: iteration 88320/ 115203 | consumed samples: 22609920 | consumed tokens: 46305116160 | elapsed time per iteration (s): 0.43 | learning rate: 4.357E-05 | global batch size: 256 | lm loss: 2.195763E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.593 | TFLOPs: 31.35 | 7: iteration 88330/ 115203 | consumed samples: 22612480 | consumed tokens: 46310359040 | elapsed time per iteration (s): 0.44 | learning rate: 4.355E-05 | global batch size: 256 | lm loss: 2.238734E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.099 | TFLOPs: 30.75 | 7: iteration 88340/ 115203 | consumed samples: 22615040 | consumed tokens: 46315601920 | elapsed time per iteration (s): 0.43 | learning rate: 4.354E-05 | global batch size: 256 | lm loss: 2.203465E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 599.842 | TFLOPs: 31.47 | 7: iteration 88350/ 115203 | consumed samples: 22617600 | consumed tokens: 46320844800 | elapsed time per iteration (s): 0.43 | learning rate: 4.352E-05 | global batch size: 256 | lm loss: 2.202303E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.321 | TFLOPs: 31.45 | 7: iteration 88360/ 115203 | consumed samples: 22620160 | consumed tokens: 46326087680 | elapsed time per iteration (s): 0.43 | learning rate: 4.350E-05 | global batch size: 256 | lm loss: 2.236990E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.352 | TFLOPs: 31.60 | 7: iteration 88370/ 115203 | consumed samples: 22622720 | consumed tokens: 46331330560 | elapsed time per iteration (s): 0.44 | learning rate: 4.349E-05 | global batch size: 256 | lm loss: 2.251745E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.437 | TFLOPs: 30.35 | 7: iteration 88380/ 115203 | consumed samples: 22625280 | consumed tokens: 46336573440 | elapsed time per iteration (s): 0.43 | learning rate: 4.347E-05 | global batch size: 256 | lm loss: 2.227204E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.942 | TFLOPs: 31.01 | 7: iteration 88390/ 115203 | consumed samples: 22627840 | consumed tokens: 46341816320 | elapsed time per iteration (s): 0.43 | learning rate: 4.345E-05 | global batch size: 256 | lm loss: 2.256054E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.762 | TFLOPs: 31.36 | 7: iteration 88400/ 115203 | consumed samples: 22630400 | consumed tokens: 46347059200 | elapsed time per iteration (s): 0.43 | learning rate: 4.344E-05 | global batch size: 256 | lm loss: 2.202447E+00 | grad norm: 
0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.085 | TFLOPs: 31.43 | 7: iteration 88410/ 115203 | consumed samples: 22632960 | consumed tokens: 46352302080 | elapsed time per iteration (s): 0.43 | learning rate: 4.342E-05 | global batch size: 256 | lm loss: 2.207280E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.566 | TFLOPs: 31.04 | 7: iteration 88420/ 115203 | consumed samples: 22635520 | consumed tokens: 46357544960 | elapsed time per iteration (s): 0.43 | learning rate: 4.340E-05 | global batch size: 256 | lm loss: 2.237566E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.136 | TFLOPs: 31.28 | 7: iteration 88430/ 115203 | consumed samples: 22638080 | consumed tokens: 46362787840 | elapsed time per iteration (s): 0.43 | learning rate: 4.339E-05 | global batch size: 256 | lm loss: 2.223742E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.735 | TFLOPs: 31.47 | 7: iteration 88440/ 115203 | consumed samples: 22640640 | consumed tokens: 46368030720 | elapsed time per iteration (s): 0.43 | learning rate: 4.337E-05 | global batch size: 256 | lm loss: 2.205287E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.518 | TFLOPs: 31.46 | 7: iteration 88450/ 115203 | consumed samples: 22643200 | consumed tokens: 46373273600 | elapsed time per iteration (s): 0.43 | learning rate: 4.335E-05 | global batch size: 256 | lm loss: 2.243358E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.876 | TFLOPs: 31.16 | 7: iteration 88460/ 115203 | consumed samples: 22645760 | consumed tokens: 46378516480 | elapsed time per 
iteration (s): 0.42 | learning rate: 4.334E-05 | global batch size: 256 | lm loss: 2.253375E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.983 | TFLOPs: 31.64 | 7: iteration 88470/ 115203 | consumed samples: 22648320 | consumed tokens: 46383759360 | elapsed time per iteration (s): 0.43 | learning rate: 4.332E-05 | global batch size: 256 | lm loss: 2.276847E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.883 | TFLOPs: 31.11 | 7: iteration 88480/ 115203 | consumed samples: 22650880 | consumed tokens: 46389002240 | elapsed time per iteration (s): 0.44 | learning rate: 4.330E-05 | global batch size: 256 | lm loss: 2.245495E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.485 | TFLOPs: 30.35 | 7: iteration 88490/ 115203 | consumed samples: 22653440 | consumed tokens: 46394245120 | elapsed time per iteration (s): 0.43 | learning rate: 4.329E-05 | global batch size: 256 | lm loss: 2.232229E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.057 | TFLOPs: 31.17 | 7: iteration 88500/ 115203 | consumed samples: 22656000 | consumed tokens: 46399488000 | elapsed time per iteration (s): 0.44 | learning rate: 4.327E-05 | global batch size: 256 | lm loss: 2.282458E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.445 | TFLOPs: 30.77 | 7: iteration 88510/ 115203 | consumed samples: 22658560 | consumed tokens: 46404730880 | elapsed time per iteration (s): 0.44 | learning rate: 4.325E-05 | global batch size: 256 | lm loss: 2.235374E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.158 | TFLOPs: 30.86 | 7: 
iteration 88520/ 115203 | consumed samples: 22661120 | consumed tokens: 46409973760 | elapsed time per iteration (s): 0.43 | learning rate: 4.324E-05 | global batch size: 256 | lm loss: 2.224791E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.710 | TFLOPs: 31.05 | 7: iteration 88530/ 115203 | consumed samples: 22663680 | consumed tokens: 46415216640 | elapsed time per iteration (s): 0.42 | learning rate: 4.322E-05 | global batch size: 256 | lm loss: 2.214627E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.387 | TFLOPs: 32.03 | 7: iteration 88540/ 115203 | consumed samples: 22666240 | consumed tokens: 46420459520 | elapsed time per iteration (s): 0.43 | learning rate: 4.320E-05 | global batch size: 256 | lm loss: 2.249276E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.920 | TFLOPs: 31.00 | 7: iteration 88550/ 115203 | consumed samples: 22668800 | consumed tokens: 46425702400 | elapsed time per iteration (s): 0.43 | learning rate: 4.319E-05 | global batch size: 256 | lm loss: 2.269888E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.047 | TFLOPs: 31.33 | 7: iteration 88560/ 115203 | consumed samples: 22671360 | consumed tokens: 46430945280 | elapsed time per iteration (s): 0.43 | learning rate: 4.317E-05 | global batch size: 256 | lm loss: 2.225505E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.913 | TFLOPs: 31.42 | 7: iteration 88570/ 115203 | consumed samples: 22673920 | consumed tokens: 46436188160 | elapsed time per iteration (s): 0.42 | learning rate: 4.315E-05 | global batch size: 256 | lm loss: 2.224533E+00 | grad norm: 0.241 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.091 | TFLOPs: 31.75 | 7: iteration 88580/ 115203 | consumed samples: 22676480 | consumed tokens: 46441431040 | elapsed time per iteration (s): 0.44 | learning rate: 4.314E-05 | global batch size: 256 | lm loss: 2.257998E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.341 | TFLOPs: 30.40 | 7: iteration 88590/ 115203 | consumed samples: 22679040 | consumed tokens: 46446673920 | elapsed time per iteration (s): 0.44 | learning rate: 4.312E-05 | global batch size: 256 | lm loss: 2.214647E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.024 | TFLOPs: 30.80 | 7: iteration 88600/ 115203 | consumed samples: 22681600 | consumed tokens: 46451916800 | elapsed time per iteration (s): 0.44 | learning rate: 4.310E-05 | global batch size: 256 | lm loss: 2.212667E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.301 | TFLOPs: 30.81 | 7: iteration 88610/ 115203 | consumed samples: 22684160 | consumed tokens: 46457159680 | elapsed time per iteration (s): 0.43 | learning rate: 4.309E-05 | global batch size: 256 | lm loss: 2.227781E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.532 | TFLOPs: 30.93 | 7: iteration 88620/ 115203 | consumed samples: 22686720 | consumed tokens: 46462402560 | elapsed time per iteration (s): 0.43 | learning rate: 4.307E-05 | global batch size: 256 | lm loss: 2.241657E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.790 | TFLOPs: 31.31 | 7: iteration 88630/ 115203 | consumed samples: 22689280 | consumed tokens: 46467645440 | elapsed time per iteration (s): 0.43 | learning rate: 
4.305E-05 | global batch size: 256 | lm loss: 2.254503E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.098 | TFLOPs: 31.07 | 7: iteration 88640/ 115203 | consumed samples: 22691840 | consumed tokens: 46472888320 | elapsed time per iteration (s): 0.43 | learning rate: 4.304E-05 | global batch size: 256 | lm loss: 2.237555E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.913 | TFLOPs: 30.90 | 7: iteration 88650/ 115203 | consumed samples: 22694400 | consumed tokens: 46478131200 | elapsed time per iteration (s): 0.42 | learning rate: 4.302E-05 | global batch size: 256 | lm loss: 2.216355E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.247 | TFLOPs: 31.76 | 7: iteration 88660/ 115203 | consumed samples: 22696960 | consumed tokens: 46483374080 | elapsed time per iteration (s): 0.43 | learning rate: 4.300E-05 | global batch size: 256 | lm loss: 2.241960E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.114 | TFLOPs: 30.91 | 7: iteration 88670/ 115203 | consumed samples: 22699520 | consumed tokens: 46488616960 | elapsed time per iteration (s): 0.42 | learning rate: 4.299E-05 | global batch size: 256 | lm loss: 2.242529E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.042 | TFLOPs: 31.90 | 7: iteration 88680/ 115203 | consumed samples: 22702080 | consumed tokens: 46493859840 | elapsed time per iteration (s): 0.44 | learning rate: 4.297E-05 | global batch size: 256 | lm loss: 2.244844E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.324 | TFLOPs: 30.76 | 7: iteration 88690/ 115203 | consumed 
samples: 22704640 | consumed tokens: 46499102720 | elapsed time per iteration (s): 0.43 | learning rate: 4.295E-05 | global batch size: 256 | lm loss: 2.211631E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.873 | TFLOPs: 31.00 | 7: iteration 88700/ 115203 | consumed samples: 22707200 | consumed tokens: 46504345600 | elapsed time per iteration (s): 0.42 | learning rate: 4.294E-05 | global batch size: 256 | lm loss: 2.240170E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.208 | TFLOPs: 32.02 | 7: iteration 88710/ 115203 | consumed samples: 22709760 | consumed tokens: 46509588480 | elapsed time per iteration (s): 0.44 | learning rate: 4.292E-05 | global batch size: 256 | lm loss: 2.228325E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.533 | TFLOPs: 30.20 | 7: iteration 88720/ 115203 | consumed samples: 22712320 | consumed tokens: 46514831360 | elapsed time per iteration (s): 0.43 | learning rate: 4.290E-05 | global batch size: 256 | lm loss: 2.205688E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.134 | TFLOPs: 31.17 | 7: iteration 88730/ 115203 | consumed samples: 22714880 | consumed tokens: 46520074240 | elapsed time per iteration (s): 0.42 | learning rate: 4.289E-05 | global batch size: 256 | lm loss: 2.259450E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.760 | TFLOPs: 31.68 | 7: iteration 88740/ 115203 | consumed samples: 22717440 | consumed tokens: 46525317120 | elapsed time per iteration (s): 0.43 | learning rate: 4.287E-05 | global batch size: 256 | lm loss: 2.237666E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 598.790 | TFLOPs: 31.42 | 7: iteration 88750/ 115203 | consumed samples: 22720000 | consumed tokens: 46530560000 | elapsed time per iteration (s): 0.43 | learning rate: 4.286E-05 | global batch size: 256 | lm loss: 2.231624E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.002 | TFLOPs: 31.11 | 7: iteration 88760/ 115203 | consumed samples: 22722560 | consumed tokens: 46535802880 | elapsed time per iteration (s): 0.44 | learning rate: 4.284E-05 | global batch size: 256 | lm loss: 2.238671E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.832 | TFLOPs: 30.74 | 7: iteration 88770/ 115203 | consumed samples: 22725120 | consumed tokens: 46541045760 | elapsed time per iteration (s): 0.43 | learning rate: 4.282E-05 | global batch size: 256 | lm loss: 2.235962E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.215 | TFLOPs: 31.18 | 7: iteration 88780/ 115203 | consumed samples: 22727680 | consumed tokens: 46546288640 | elapsed time per iteration (s): 0.42 | learning rate: 4.281E-05 | global batch size: 256 | lm loss: 2.232312E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.405 | TFLOPs: 32.03 | 7: iteration 88790/ 115203 | consumed samples: 22730240 | consumed tokens: 46551531520 | elapsed time per iteration (s): 0.43 | learning rate: 4.279E-05 | global batch size: 256 | lm loss: 2.210160E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.884 | TFLOPs: 31.11 | 7: iteration 88800/ 115203 | consumed samples: 22732800 | consumed tokens: 46556774400 | elapsed time per iteration (s): 0.42 | learning rate: 4.277E-05 | global batch size: 256 | lm 
loss: 2.268154E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.198 | TFLOPs: 32.07 | 7: iteration 88810/ 115203 | consumed samples: 22735360 | consumed tokens: 46562017280 | elapsed time per iteration (s): 0.43 | learning rate: 4.276E-05 | global batch size: 256 | lm loss: 2.231825E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.359 | TFLOPs: 31.29 | 7: iteration 88820/ 115203 | consumed samples: 22737920 | consumed tokens: 46567260160 | elapsed time per iteration (s): 0.44 | learning rate: 4.274E-05 | global batch size: 256 | lm loss: 2.256760E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.423 | TFLOPs: 30.35 | 7: iteration 88830/ 115203 | consumed samples: 22740480 | consumed tokens: 46572503040 | elapsed time per iteration (s): 0.42 | learning rate: 4.272E-05 | global batch size: 256 | lm loss: 2.221882E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.957 | TFLOPs: 32.06 | 7: iteration 88840/ 115203 | consumed samples: 22743040 | consumed tokens: 46577745920 | elapsed time per iteration (s): 0.43 | learning rate: 4.271E-05 | global batch size: 256 | lm loss: 2.261844E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.546 | TFLOPs: 31.56 | 7: iteration 88850/ 115203 | consumed samples: 22745600 | consumed tokens: 46582988800 | elapsed time per iteration (s): 0.43 | learning rate: 4.269E-05 | global batch size: 256 | lm loss: 2.234034E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.576 | TFLOPs: 31.30 | 7: iteration 88860/ 115203 | consumed samples: 22748160 | consumed tokens: 
46588231680 | elapsed time per iteration (s): 0.42 | learning rate: 4.267E-05 | global batch size: 256 | lm loss: 2.213601E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.420 | TFLOPs: 31.61 | 7: iteration 88870/ 115203 | consumed samples: 22750720 | consumed tokens: 46593474560 | elapsed time per iteration (s): 0.66 | learning rate: 4.266E-05 | global batch size: 256 | lm loss: 2.232065E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 388.036 | TFLOPs: 20.36 | 7: iteration 88880/ 115203 | consumed samples: 22753280 | consumed tokens: 46598717440 | elapsed time per iteration (s): 0.42 | learning rate: 4.264E-05 | global batch size: 256 | lm loss: 2.214778E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.127 | TFLOPs: 31.70 | 7: iteration 88890/ 115203 | consumed samples: 22755840 | consumed tokens: 46603960320 | elapsed time per iteration (s): 0.42 | learning rate: 4.262E-05 | global batch size: 256 | lm loss: 2.257931E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.369 | TFLOPs: 31.87 | 7: iteration 88900/ 115203 | consumed samples: 22758400 | consumed tokens: 46609203200 | elapsed time per iteration (s): 0.43 | learning rate: 4.261E-05 | global batch size: 256 | lm loss: 2.257772E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.535 | TFLOPs: 31.51 | 7: iteration 88910/ 115203 | consumed samples: 22760960 | consumed tokens: 46614446080 | elapsed time per iteration (s): 0.44 | learning rate: 4.259E-05 | global batch size: 256 | lm loss: 2.221851E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
587.714 | TFLOPs: 30.84 | 7: iteration 88920/ 115203 | consumed samples: 22763520 | consumed tokens: 46619688960 | elapsed time per iteration (s): 0.43 | learning rate: 4.258E-05 | global batch size: 256 | lm loss: 2.237750E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.843 | TFLOPs: 31.32 | 7: iteration 88930/ 115203 | consumed samples: 22766080 | consumed tokens: 46624931840 | elapsed time per iteration (s): 0.43 | learning rate: 4.256E-05 | global batch size: 256 | lm loss: 2.225274E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.580 | TFLOPs: 31.14 | 7: iteration 88940/ 115203 | consumed samples: 22768640 | consumed tokens: 46630174720 | elapsed time per iteration (s): 0.42 | learning rate: 4.254E-05 | global batch size: 256 | lm loss: 2.227394E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.723 | TFLOPs: 31.83 | 7: iteration 88950/ 115203 | consumed samples: 22771200 | consumed tokens: 46635417600 | elapsed time per iteration (s): 0.43 | learning rate: 4.253E-05 | global batch size: 256 | lm loss: 2.255000E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.507 | TFLOPs: 31.04 | 7: iteration 88960/ 115203 | consumed samples: 22773760 | consumed tokens: 46640660480 | elapsed time per iteration (s): 0.43 | learning rate: 4.251E-05 | global batch size: 256 | lm loss: 2.221860E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.492 | TFLOPs: 31.14 | 7: iteration 88970/ 115203 | consumed samples: 22776320 | consumed tokens: 46645903360 | elapsed time per iteration (s): 0.42 | learning rate: 4.249E-05 | global batch size: 256 | lm loss: 2.227688E+00 | grad norm: 0.240 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.597 | TFLOPs: 31.93 | 7: iteration 88980/ 115203 | consumed samples: 22778880 | consumed tokens: 46651146240 | elapsed time per iteration (s): 0.42 | learning rate: 4.248E-05 | global batch size: 256 | lm loss: 2.207385E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.794 | TFLOPs: 31.79 | 7: iteration 88990/ 115203 | consumed samples: 22781440 | consumed tokens: 46656389120 | elapsed time per iteration (s): 0.43 | learning rate: 4.246E-05 | global batch size: 256 | lm loss: 2.233493E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.286 | TFLOPs: 31.13 | 7: iteration 89000/ 115203 | consumed samples: 22784000 | consumed tokens: 46661632000 | elapsed time per iteration (s): 0.43 | learning rate: 4.244E-05 | global batch size: 256 | lm loss: 2.195165E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.966 | TFLOPs: 31.01 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 89000 | lm loss value: 2.071434E+00 | lm loss PPL: 7.936198E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 89000 to checkpoints_221m 0: [2022-11-28 23:40:41,377] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step89000 is begin to save! 0: [2022-11-28 23:40:41,387] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_01-model_00-model_states.pt... 0: [2022-11-28 23:40:41,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_01-model_00-model_states.pt. 
0: [2022-11-28 23:40:41,524] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_03-model_00-model_states.pt... 0: [2022-11-28 23:40:41,546] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_03-model_00-model_states.pt. 0: [2022-11-28 23:40:41,547] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_04-model_00-model_states.pt... 0: [2022-11-28 23:40:41,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_04-model_00-model_states.pt. 0: [2022-11-28 23:40:41,574] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_05-model_00-model_states.pt... 0: [2022-11-28 23:40:41,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_05-model_00-model_states.pt. 0: [2022-11-28 23:40:41,600] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_06-model_00-model_states.pt... 0: [2022-11-28 23:40:41,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_06-model_00-model_states.pt. 0: [2022-11-28 23:40:41,624] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_07-model_00-model_states.pt... 0: [2022-11-28 23:40:41,657] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_07-model_00-model_states.pt. 0: [2022-11-28 23:40:41,658] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_08-model_00-model_states.pt... 0: [2022-11-28 23:40:41,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_08-model_00-model_states.pt. 
0: [2022-11-28 23:40:41,691] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_09-model_00-model_states.pt... 0: [2022-11-28 23:40:41,723] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_09-model_00-model_states.pt. 0: [2022-11-28 23:40:41,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_10-model_00-model_states.pt... 0: [2022-11-28 23:40:41,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_10-model_00-model_states.pt. 0: [2022-11-28 23:40:41,758] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_11-model_00-model_states.pt... 0: [2022-11-28 23:40:41,790] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_11-model_00-model_states.pt. 0: [2022-11-28 23:40:41,790] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_12-model_00-model_states.pt... 0: [2022-11-28 23:40:41,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_12-model_00-model_states.pt. 0: [2022-11-28 23:40:41,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_13-model_00-model_states.pt... 0: [2022-11-28 23:40:41,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_13-model_00-model_states.pt. 0: [2022-11-28 23:40:41,855] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_14-model_00-model_states.pt... 0: [2022-11-28 23:40:41,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_14-model_00-model_states.pt. 
0: [2022-11-28 23:40:41,879] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_15-model_00-model_states.pt... 0: [2022-11-28 23:40:41,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_15-model_00-model_states.pt. 0: [2022-11-28 23:40:41,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_16-model_00-model_states.pt... 0: [2022-11-28 23:40:41,927] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_16-model_00-model_states.pt. 0: [2022-11-28 23:40:41,928] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_17-model_00-model_states.pt... 0: [2022-11-28 23:40:41,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_17-model_00-model_states.pt. 0: [2022-11-28 23:40:41,953] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_18-model_00-model_states.pt... 0: [2022-11-28 23:40:41,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_18-model_00-model_states.pt. 0: [2022-11-28 23:40:41,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_19-model_00-model_states.pt... 0: [2022-11-28 23:40:42,003] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_19-model_00-model_states.pt. 0: [2022-11-28 23:40:42,003] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_20-model_00-model_states.pt... 0: [2022-11-28 23:40:42,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_20-model_00-model_states.pt. 
0: [2022-11-28 23:40:42,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/layer_22-model_00-model_states.pt... 0: [2022-11-28 23:40:42,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/layer_22-model_00-model_states.pt. 0: [2022-11-28 23:40:42,030] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step89000/mp_rank_00_model_states.pt 0: [2022-11-28 23:40:42,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/mp_rank_00_model_states.pt... 0: [2022-11-28 23:40:42,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/mp_rank_00_model_states.pt. 0: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
2: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:40:42,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step89000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:40:42,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:40:42,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 23:40:42,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2022-11-28 23:40:42,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:40:42,103] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-28 23:40:42,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2022-11-28 23:40:42,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:40:42,104] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 23:40:42,104] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2022-11-28 23:40:42,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:40:42,104] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 23:40:42,104] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2022-11-28 23:40:42,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:40:42,105] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 23:40:42,105] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2022-11-28 23:40:42,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:40:42,107] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 23:40:42,107] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2022-11-28 23:40:42,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:40:42,107] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 4: [2022-11-28 23:40:42,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 23:40:42,107] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 3: [2022-11-28 23:40:42,107] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2022-11-28 23:40:42,107] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2022-11-28 23:40:42,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:40:42,107] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 23:40:42,107] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2022-11-28 23:40:42,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:40:42,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 23:40:42,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2022-11-28 23:40:42,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:40:42,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 23:40:42,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
2: [2022-11-28 23:40:42,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:40:42,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:40:42,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 23:40:42,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 23:40:42,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2022-11-28 23:40:42,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2022-11-28 23:40:42,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:40:42,110] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 23:40:42,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2022-11-28 23:40:42,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:40:42,110] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 23:40:42,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
4: [2022-11-28 23:40:42,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:40:42,110] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 23:40:42,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2022-11-28 23:40:42,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:40:42,110] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 23:40:42,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2022-11-28 23:40:42,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:40:42,110] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 23:40:42,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2022-11-28 23:40:42,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:40:42,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:40:42,111] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 23:40:42,111] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 3: [2022-11-28 23:40:42,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:40:42,111] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2022-11-28 23:40:42,111] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2022-11-28 23:40:42,111] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 23:40:42,111] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2022-11-28 23:40:42,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:40:42,111] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 23:40:42,112] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2022-11-28 23:40:42,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-28 23:40:42,112] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 23:40:42,112] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:40:42,113] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 6: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2022-11-28 23:40:42,113] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 23:40:42,113] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
3: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:40:42,113] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 23:40:42,113] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:40:42,113] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 3: [2022-11-28 23:40:42,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:40:42,114] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 23:40:42,114] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 4: [2022-11-28 23:40:42,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-28 23:40:42,114] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 23:40:42,114] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2022-11-28 23:40:42,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:40:42,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:40:42,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:40:42,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:40:42,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 23:40:42,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 23:40:42,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:40:42,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 23:40:42,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-28 23:40:42,117] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:40:42,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2022-11-28 23:40:42,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2022-11-28 23:40:42,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2022-11-28 23:40:42,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 23:40:42,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 23:40:42,117] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 23:40:42,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2022-11-28 23:40:42,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2022-11-28 23:40:42,117] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2022-11-28 23:40:42,118] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:40:42,118] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 23:40:42,118] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
1: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:40:42,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 23:40:42,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 2: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:40:42,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
5: [2022-11-28 23:40:42,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 1: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:40:42,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 23:40:42,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 23:40:42,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 5: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2022-11-28 23:40:42,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 23:40:42,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-28 23:40:42,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 5: [2022-11-28 23:40:42,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 6: [2022-11-28 23:40:42,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:40:42,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 23:40:42,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2022-11-28 23:40:42,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:40:42,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 23:40:42,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
2: [2022-11-28 23:40:42,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:40:42,124] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 23:40:42,124] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2022-11-28 23:40:42,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:40:42,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 23:40:42,126] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:40:42,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2022-11-28 23:40:42,126] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 23:40:42,126] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 1: [2022-11-28 23:40:42,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:40:42,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 23:40:42,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
5: [2022-11-28 23:40:42,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:40:42,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 23:40:42,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 7: [2022-11-28 23:40:42,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:40:42,129] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 23:40:42,129] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 7: [2022-11-28 23:40:42,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:40:42,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 23:40:42,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 7: [2022-11-28 23:40:42,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:40:42,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 23:40:42,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
7: [2022-11-28 23:40:42,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:40:42,130] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 23:40:42,130] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 7: [2022-11-28 23:40:42,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:40:42,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 23:40:42,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 7: [2022-11-28 23:40:42,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:40:42,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 23:40:42,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 7: [2022-11-28 23:40:42,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:40:42,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 23:40:42,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 
7: [2022-11-28 23:40:42,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:40:42,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 23:40:42,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: [2022-11-28 23:40:42,191] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step89000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 23:40:42,191] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step89000 is ready now! 0: successfully saved checkpoint at iteration 89000 to checkpoints_221m 7: time (ms) | save-checkpoint: 835.55 7: iteration 89010/ 115203 | consumed samples: 22786560 | consumed tokens: 46666874880 | elapsed time per iteration (s): 0.56 | learning rate: 4.243E-05 | global batch size: 256 | lm loss: 2.221478E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 459.431 | TFLOPs: 24.11 | 7: iteration 89020/ 115203 | consumed samples: 22789120 | consumed tokens: 46672117760 | elapsed time per iteration (s): 0.43 | learning rate: 4.241E-05 | global batch size: 256 | lm loss: 2.230102E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.617 | TFLOPs: 30.99 | 7: iteration 89030/ 115203 | consumed samples: 22791680 | consumed tokens: 46677360640 | elapsed time per iteration (s): 0.43 | learning rate: 4.239E-05 | global batch size: 256 | lm loss: 2.242148E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.230 | TFLOPs: 31.07 | 7: iteration 89040/ 115203 | consumed samples: 
22794240 | consumed tokens: 46682603520 | elapsed time per iteration (s): 0.43 | learning rate: 4.238E-05 | global batch size: 256 | lm loss: 2.217102E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.823 | TFLOPs: 31.42 | 7: iteration 89050/ 115203 | consumed samples: 22796800 | consumed tokens: 46687846400 | elapsed time per iteration (s): 0.43 | learning rate: 4.236E-05 | global batch size: 256 | lm loss: 2.243396E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.440 | TFLOPs: 31.50 | 7: iteration 89060/ 115203 | consumed samples: 22799360 | consumed tokens: 46693089280 | elapsed time per iteration (s): 0.44 | learning rate: 4.235E-05 | global batch size: 256 | lm loss: 2.214729E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.755 | TFLOPs: 30.73 | 7: iteration 89070/ 115203 | consumed samples: 22801920 | consumed tokens: 46698332160 | elapsed time per iteration (s): 0.42 | learning rate: 4.233E-05 | global batch size: 256 | lm loss: 2.240983E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.499 | TFLOPs: 31.87 | 7: iteration 89080/ 115203 | consumed samples: 22804480 | consumed tokens: 46703575040 | elapsed time per iteration (s): 0.43 | learning rate: 4.231E-05 | global batch size: 256 | lm loss: 2.244604E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.860 | TFLOPs: 31.37 | 7: iteration 89090/ 115203 | consumed samples: 22807040 | consumed tokens: 46708817920 | elapsed time per iteration (s): 0.43 | learning rate: 4.230E-05 | global batch size: 256 | lm loss: 2.219605E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 602.281 | TFLOPs: 31.60 | 7: iteration 89100/ 115203 | consumed samples: 22809600 | consumed tokens: 46714060800 | elapsed time per iteration (s): 0.45 | learning rate: 4.228E-05 | global batch size: 256 | lm loss: 2.236420E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.973 | TFLOPs: 30.12 | 7: iteration 89110/ 115203 | consumed samples: 22812160 | consumed tokens: 46719303680 | elapsed time per iteration (s): 0.42 | learning rate: 4.226E-05 | global batch size: 256 | lm loss: 2.264100E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.096 | TFLOPs: 31.85 | 7: iteration 89120/ 115203 | consumed samples: 22814720 | consumed tokens: 46724546560 | elapsed time per iteration (s): 0.44 | learning rate: 4.225E-05 | global batch size: 256 | lm loss: 2.224407E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.676 | TFLOPs: 30.26 | 7: iteration 89130/ 115203 | consumed samples: 22817280 | consumed tokens: 46729789440 | elapsed time per iteration (s): 0.43 | learning rate: 4.223E-05 | global batch size: 256 | lm loss: 2.213061E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.831 | TFLOPs: 31.42 | 7: iteration 89140/ 115203 | consumed samples: 22819840 | consumed tokens: 46735032320 | elapsed time per iteration (s): 0.43 | learning rate: 4.222E-05 | global batch size: 256 | lm loss: 2.230061E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.611 | TFLOPs: 31.04 | 7: iteration 89150/ 115203 | consumed samples: 22822400 | consumed tokens: 46740275200 | elapsed time per iteration (s): 0.42 | learning rate: 4.220E-05 | global batch size: 256 | lm 
loss: 2.247982E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.453 | TFLOPs: 31.61 | 7: iteration 89160/ 115203 | consumed samples: 22824960 | consumed tokens: 46745518080 | elapsed time per iteration (s): 0.43 | learning rate: 4.218E-05 | global batch size: 256 | lm loss: 2.267847E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.679 | TFLOPs: 31.41 | 7: iteration 89170/ 115203 | consumed samples: 22827520 | consumed tokens: 46750760960 | elapsed time per iteration (s): 0.42 | learning rate: 4.217E-05 | global batch size: 256 | lm loss: 2.230967E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.694 | TFLOPs: 32.09 | 7: iteration 89180/ 115203 | consumed samples: 22830080 | consumed tokens: 46756003840 | elapsed time per iteration (s): 0.44 | learning rate: 4.215E-05 | global batch size: 256 | lm loss: 2.268264E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.170 | TFLOPs: 30.49 | 7: iteration 89190/ 115203 | consumed samples: 22832640 | consumed tokens: 46761246720 | elapsed time per iteration (s): 0.43 | learning rate: 4.213E-05 | global batch size: 256 | lm loss: 2.213969E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.213 | TFLOPs: 31.02 | 7: iteration 89200/ 115203 | consumed samples: 22835200 | consumed tokens: 46766489600 | elapsed time per iteration (s): 0.42 | learning rate: 4.212E-05 | global batch size: 256 | lm loss: 2.240643E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.361 | TFLOPs: 31.60 | 7: iteration 89210/ 115203 | consumed samples: 22837760 | consumed tokens: 
46771732480 | elapsed time per iteration (s): 0.43 | learning rate: 4.210E-05 | global batch size: 256 | lm loss: 2.187992E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.454 | TFLOPs: 31.50 | 7: iteration 89220/ 115203 | consumed samples: 22840320 | consumed tokens: 46776975360 | elapsed time per iteration (s): 0.43 | learning rate: 4.208E-05 | global batch size: 256 | lm loss: 2.227766E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.736 | TFLOPs: 31.26 | 7: iteration 89230/ 115203 | consumed samples: 22842880 | consumed tokens: 46782218240 | elapsed time per iteration (s): 0.43 | learning rate: 4.207E-05 | global batch size: 256 | lm loss: 2.244671E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.928 | TFLOPs: 31.06 | 7: iteration 89240/ 115203 | consumed samples: 22845440 | consumed tokens: 46787461120 | elapsed time per iteration (s): 0.44 | learning rate: 4.205E-05 | global batch size: 256 | lm loss: 2.234743E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.158 | TFLOPs: 30.70 | 7: iteration 89250/ 115203 | consumed samples: 22848000 | consumed tokens: 46792704000 | elapsed time per iteration (s): 0.43 | learning rate: 4.204E-05 | global batch size: 256 | lm loss: 2.235358E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.643 | TFLOPs: 31.36 | 7: iteration 89260/ 115203 | consumed samples: 22850560 | consumed tokens: 46797946880 | elapsed time per iteration (s): 0.44 | learning rate: 4.202E-05 | global batch size: 256 | lm loss: 2.221250E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
582.085 | TFLOPs: 30.54 | 7: iteration 89270/ 115203 | consumed samples: 22853120 | consumed tokens: 46803189760 | elapsed time per iteration (s): 0.45 | learning rate: 4.200E-05 | global batch size: 256 | lm loss: 2.239142E+00 | grad norm: 0.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.945 | TFLOPs: 30.06 | 7: iteration 89280/ 115203 | consumed samples: 22855680 | consumed tokens: 46808432640 | elapsed time per iteration (s): 0.43 | learning rate: 4.199E-05 | global batch size: 256 | lm loss: 2.208615E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.435 | TFLOPs: 31.08 | 7: iteration 89290/ 115203 | consumed samples: 22858240 | consumed tokens: 46813675520 | elapsed time per iteration (s): 0.42 | learning rate: 4.197E-05 | global batch size: 256 | lm loss: 2.263008E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.655 | TFLOPs: 31.78 | 7: iteration 89300/ 115203 | consumed samples: 22860800 | consumed tokens: 46818918400 | elapsed time per iteration (s): 0.42 | learning rate: 4.195E-05 | global batch size: 256 | lm loss: 2.259203E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.478 | TFLOPs: 31.82 | 7: iteration 89310/ 115203 | consumed samples: 22863360 | consumed tokens: 46824161280 | elapsed time per iteration (s): 0.43 | learning rate: 4.194E-05 | global batch size: 256 | lm loss: 2.226212E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.407 | TFLOPs: 31.50 | 7: iteration 89320/ 115203 | consumed samples: 22865920 | consumed tokens: 46829404160 | elapsed time per iteration (s): 0.43 | learning rate: 4.192E-05 | global batch size: 256 | lm loss: 2.280636E+00 | grad norm: 0.235 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.751 | TFLOPs: 31.36 | 7: iteration 89330/ 115203 | consumed samples: 22868480 | consumed tokens: 46834647040 | elapsed time per iteration (s): 0.43 | learning rate: 4.191E-05 | global batch size: 256 | lm loss: 2.246375E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.550 | TFLOPs: 31.46 | 7: iteration 89340/ 115203 | consumed samples: 22871040 | consumed tokens: 46839889920 | elapsed time per iteration (s): 0.43 | learning rate: 4.189E-05 | global batch size: 256 | lm loss: 2.226149E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.684 | TFLOPs: 31.20 | 7: iteration 89350/ 115203 | consumed samples: 22873600 | consumed tokens: 46845132800 | elapsed time per iteration (s): 0.42 | learning rate: 4.187E-05 | global batch size: 256 | lm loss: 2.249395E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.793 | TFLOPs: 31.89 | 7: iteration 89360/ 115203 | consumed samples: 22876160 | consumed tokens: 46850375680 | elapsed time per iteration (s): 0.42 | learning rate: 4.186E-05 | global batch size: 256 | lm loss: 2.209448E+00 | grad norm: 0.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.642 | TFLOPs: 31.62 | 7: iteration 89370/ 115203 | consumed samples: 22878720 | consumed tokens: 46855618560 | elapsed time per iteration (s): 0.43 | learning rate: 4.184E-05 | global batch size: 256 | lm loss: 2.233413E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.147 | TFLOPs: 31.59 | 7: iteration 89380/ 115203 | consumed samples: 22881280 | consumed tokens: 46860861440 | elapsed time per iteration (s): 
0.43 | learning rate: 4.183E-05 | global batch size: 256 | lm loss: 2.207898E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.528 | TFLOPs: 31.19 | 7: iteration 89390/ 115203 | consumed samples: 22883840 | consumed tokens: 46866104320 | elapsed time per iteration (s): 0.43 | learning rate: 4.181E-05 | global batch size: 256 | lm loss: 2.243319E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.364 | TFLOPs: 31.40 | 7: iteration 89400/ 115203 | consumed samples: 22886400 | consumed tokens: 46871347200 | elapsed time per iteration (s): 0.48 | learning rate: 4.179E-05 | global batch size: 256 | lm loss: 2.251934E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 535.948 | TFLOPs: 28.12 | 7: iteration 89410/ 115203 | consumed samples: 22888960 | consumed tokens: 46876590080 | elapsed time per iteration (s): 0.42 | learning rate: 4.178E-05 | global batch size: 256 | lm loss: 2.223213E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.949 | TFLOPs: 31.74 | 7: iteration 89420/ 115203 | consumed samples: 22891520 | consumed tokens: 46881832960 | elapsed time per iteration (s): 0.43 | learning rate: 4.176E-05 | global batch size: 256 | lm loss: 2.259532E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.050 | TFLOPs: 31.54 | 7: iteration 89430/ 115203 | consumed samples: 22894080 | consumed tokens: 46887075840 | elapsed time per iteration (s): 0.43 | learning rate: 4.174E-05 | global batch size: 256 | lm loss: 2.262140E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.204 | TFLOPs: 30.91 | 7: iteration 89440/ 
115203 | consumed samples: 22896640 | consumed tokens: 46892318720 | elapsed time per iteration (s): 0.42 | learning rate: 4.173E-05 | global batch size: 256 | lm loss: 2.256837E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.753 | TFLOPs: 31.89 | 7: iteration 89450/ 115203 | consumed samples: 22899200 | consumed tokens: 46897561600 | elapsed time per iteration (s): 0.44 | learning rate: 4.171E-05 | global batch size: 256 | lm loss: 2.226642E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.235 | TFLOPs: 30.71 | 7: iteration 89460/ 115203 | consumed samples: 22901760 | consumed tokens: 46902804480 | elapsed time per iteration (s): 0.42 | learning rate: 4.170E-05 | global batch size: 256 | lm loss: 2.211891E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.656 | TFLOPs: 31.94 | 7: iteration 89470/ 115203 | consumed samples: 22904320 | consumed tokens: 46908047360 | elapsed time per iteration (s): 1.00 | learning rate: 4.168E-05 | global batch size: 256 | lm loss: 2.241567E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 255.644 | TFLOPs: 13.41 | 7: iteration 89480/ 115203 | consumed samples: 22906880 | consumed tokens: 46913290240 | elapsed time per iteration (s): 0.45 | learning rate: 4.166E-05 | global batch size: 256 | lm loss: 2.225716E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.217 | TFLOPs: 29.81 | 7: iteration 89490/ 115203 | consumed samples: 22909440 | consumed tokens: 46918533120 | elapsed time per iteration (s): 0.91 | learning rate: 4.165E-05 | global batch size: 256 | lm loss: 2.264777E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 282.773 | TFLOPs: 14.84 | 7: iteration 89500/ 115203 | consumed samples: 22912000 | consumed tokens: 46923776000 | elapsed time per iteration (s): 0.44 | learning rate: 4.163E-05 | global batch size: 256 | lm loss: 2.215816E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.429 | TFLOPs: 30.61 | 7: iteration 89510/ 115203 | consumed samples: 22914560 | consumed tokens: 46929018880 | elapsed time per iteration (s): 0.44 | learning rate: 4.162E-05 | global batch size: 256 | lm loss: 2.253940E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.722 | TFLOPs: 30.73 | 7: iteration 89520/ 115203 | consumed samples: 22917120 | consumed tokens: 46934261760 | elapsed time per iteration (s): 0.44 | learning rate: 4.160E-05 | global batch size: 256 | lm loss: 2.261515E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.286 | TFLOPs: 30.66 | 7: iteration 89530/ 115203 | consumed samples: 22919680 | consumed tokens: 46939504640 | elapsed time per iteration (s): 0.44 | learning rate: 4.158E-05 | global batch size: 256 | lm loss: 2.272455E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.804 | TFLOPs: 30.79 | 7: iteration 89540/ 115203 | consumed samples: 22922240 | consumed tokens: 46944747520 | elapsed time per iteration (s): 0.43 | learning rate: 4.157E-05 | global batch size: 256 | lm loss: 2.266471E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.894 | TFLOPs: 31.27 | 7: iteration 89550/ 115203 | consumed samples: 22924800 | consumed tokens: 46949990400 | elapsed time per iteration (s): 0.44 | learning rate: 4.155E-05 | global batch 
size: 256 | lm loss: 2.233558E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.733 | TFLOPs: 30.78 | 7: iteration 89560/ 115203 | consumed samples: 22927360 | consumed tokens: 46955233280 | elapsed time per iteration (s): 0.43 | learning rate: 4.153E-05 | global batch size: 256 | lm loss: 2.229581E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.961 | TFLOPs: 31.27 | 7: iteration 89570/ 115203 | consumed samples: 22929920 | consumed tokens: 46960476160 | elapsed time per iteration (s): 0.43 | learning rate: 4.152E-05 | global batch size: 256 | lm loss: 2.248190E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.352 | TFLOPs: 31.24 | 7: iteration 89580/ 115203 | consumed samples: 22932480 | consumed tokens: 46965719040 | elapsed time per iteration (s): 0.42 | learning rate: 4.150E-05 | global batch size: 256 | lm loss: 2.262656E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.744 | TFLOPs: 31.94 | 7: iteration 89590/ 115203 | consumed samples: 22935040 | consumed tokens: 46970961920 | elapsed time per iteration (s): 0.43 | learning rate: 4.149E-05 | global batch size: 256 | lm loss: 2.242504E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.093 | TFLOPs: 31.59 | 7: iteration 89600/ 115203 | consumed samples: 22937600 | consumed tokens: 46976204800 | elapsed time per iteration (s): 0.44 | learning rate: 4.147E-05 | global batch size: 256 | lm loss: 2.260133E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.102 | TFLOPs: 30.59 | 7: iteration 89610/ 115203 | consumed samples: 22940160 | consumed 
tokens: 46981447680 | elapsed time per iteration (s): 0.44 | learning rate: 4.145E-05 | global batch size: 256 | lm loss: 2.247558E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.994 | TFLOPs: 30.85 | 7: iteration 89620/ 115203 | consumed samples: 22942720 | consumed tokens: 46986690560 | elapsed time per iteration (s): 0.43 | learning rate: 4.144E-05 | global batch size: 256 | lm loss: 2.257043E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.501 | TFLOPs: 31.35 | 7: iteration 89630/ 115203 | consumed samples: 22945280 | consumed tokens: 46991933440 | elapsed time per iteration (s): 0.43 | learning rate: 4.142E-05 | global batch size: 256 | lm loss: 2.234918E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.979 | TFLOPs: 31.32 | 7: iteration 89640/ 115203 | consumed samples: 22947840 | consumed tokens: 46997176320 | elapsed time per iteration (s): 0.43 | learning rate: 4.141E-05 | global batch size: 256 | lm loss: 2.212656E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.926 | TFLOPs: 30.95 | 7: iteration 89650/ 115203 | consumed samples: 22950400 | consumed tokens: 47002419200 | elapsed time per iteration (s): 0.43 | learning rate: 4.139E-05 | global batch size: 256 | lm loss: 2.226119E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.591 | TFLOPs: 31.09 | 7: iteration 89660/ 115203 | consumed samples: 22952960 | consumed tokens: 47007662080 | elapsed time per iteration (s): 0.44 | learning rate: 4.137E-05 | global batch size: 256 | lm loss: 2.230923E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 586.308 | TFLOPs: 30.76 | 7: iteration 89670/ 115203 | consumed samples: 22955520 | consumed tokens: 47012904960 | elapsed time per iteration (s): 0.43 | learning rate: 4.136E-05 | global batch size: 256 | lm loss: 2.198985E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.775 | TFLOPs: 30.94 | 7: iteration 89680/ 115203 | consumed samples: 22958080 | consumed tokens: 47018147840 | elapsed time per iteration (s): 0.43 | learning rate: 4.134E-05 | global batch size: 256 | lm loss: 2.249588E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.090 | TFLOPs: 31.07 | 7: iteration 89690/ 115203 | consumed samples: 22960640 | consumed tokens: 47023390720 | elapsed time per iteration (s): 0.42 | learning rate: 4.133E-05 | global batch size: 256 | lm loss: 2.247475E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.775 | TFLOPs: 31.63 | 7: iteration 89700/ 115203 | consumed samples: 22963200 | consumed tokens: 47028633600 | elapsed time per iteration (s): 0.43 | learning rate: 4.131E-05 | global batch size: 256 | lm loss: 2.228675E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.133 | TFLOPs: 31.07 | 7: iteration 89710/ 115203 | consumed samples: 22965760 | consumed tokens: 47033876480 | elapsed time per iteration (s): 0.43 | learning rate: 4.129E-05 | global batch size: 256 | lm loss: 2.245022E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.795 | TFLOPs: 30.89 | 7: iteration 89720/ 115203 | consumed samples: 22968320 | consumed tokens: 47039119360 | elapsed time per iteration (s): 0.43 | learning rate: 4.128E-05 | global batch size: 256 | lm loss: 2.216665E+00 | grad norm: 
0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.759 | TFLOPs: 31.21 | 7: iteration 89730/ 115203 | consumed samples: 22970880 | consumed tokens: 47044362240 | elapsed time per iteration (s): 0.43 | learning rate: 4.126E-05 | global batch size: 256 | lm loss: 2.228412E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.862 | TFLOPs: 31.00 | 7: iteration 89740/ 115203 | consumed samples: 22973440 | consumed tokens: 47049605120 | elapsed time per iteration (s): 0.43 | learning rate: 4.125E-05 | global batch size: 256 | lm loss: 2.267901E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.895 | TFLOPs: 31.21 | 7: iteration 89750/ 115203 | consumed samples: 22976000 | consumed tokens: 47054848000 | elapsed time per iteration (s): 0.43 | learning rate: 4.123E-05 | global batch size: 256 | lm loss: 2.219741E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.191 | TFLOPs: 31.28 | 7: iteration 89760/ 115203 | consumed samples: 22978560 | consumed tokens: 47060090880 | elapsed time per iteration (s): 0.43 | learning rate: 4.121E-05 | global batch size: 256 | lm loss: 2.252197E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.012 | TFLOPs: 30.90 | 7: iteration 89770/ 115203 | consumed samples: 22981120 | consumed tokens: 47065333760 | elapsed time per iteration (s): 0.43 | learning rate: 4.120E-05 | global batch size: 256 | lm loss: 2.238723E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.097 | TFLOPs: 30.91 | 7: iteration 89780/ 115203 | consumed samples: 22983680 | consumed tokens: 47070576640 | elapsed time per 
iteration (s): 0.43 | learning rate: 4.118E-05 | global batch size: 256 | lm loss: 2.203283E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.751 | TFLOPs: 31.31 | 7: iteration 89790/ 115203 | consumed samples: 22986240 | consumed tokens: 47075819520 | elapsed time per iteration (s): 0.45 | learning rate: 4.117E-05 | global batch size: 256 | lm loss: 2.215161E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.689 | TFLOPs: 29.94 | 7: iteration 89800/ 115203 | consumed samples: 22988800 | consumed tokens: 47081062400 | elapsed time per iteration (s): 0.43 | learning rate: 4.115E-05 | global batch size: 256 | lm loss: 2.239336E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.387 | TFLOPs: 31.19 | 7: iteration 89810/ 115203 | consumed samples: 22991360 | consumed tokens: 47086305280 | elapsed time per iteration (s): 0.44 | learning rate: 4.113E-05 | global batch size: 256 | lm loss: 2.247161E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.717 | TFLOPs: 30.78 | 7: iteration 89820/ 115203 | consumed samples: 22993920 | consumed tokens: 47091548160 | elapsed time per iteration (s): 0.43 | learning rate: 4.112E-05 | global batch size: 256 | lm loss: 2.187915E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.823 | TFLOPs: 30.95 | 7: iteration 89830/ 115203 | consumed samples: 22996480 | consumed tokens: 47096791040 | elapsed time per iteration (s): 0.43 | learning rate: 4.110E-05 | global batch size: 256 | lm loss: 2.255669E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.621 | TFLOPs: 31.46 | 7: 
iteration 89840/ 115203 | consumed samples: 22999040 | consumed tokens: 47102033920 | elapsed time per iteration (s): 0.43 | learning rate: 4.109E-05 | global batch size: 256 | lm loss: 2.244923E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.397 | TFLOPs: 31.19 | 7: iteration 89850/ 115203 | consumed samples: 23001600 | consumed tokens: 47107276800 | elapsed time per iteration (s): 0.42 | learning rate: 4.107E-05 | global batch size: 256 | lm loss: 2.220634E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.089 | TFLOPs: 31.64 | 7: iteration 89860/ 115203 | consumed samples: 23004160 | consumed tokens: 47112519680 | elapsed time per iteration (s): 0.68 | learning rate: 4.105E-05 | global batch size: 256 | lm loss: 2.209904E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 373.998 | TFLOPs: 19.62 | 7: iteration 89870/ 115203 | consumed samples: 23006720 | consumed tokens: 47117762560 | elapsed time per iteration (s): 0.43 | learning rate: 4.104E-05 | global batch size: 256 | lm loss: 2.223724E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.586 | TFLOPs: 31.14 | 7: iteration 89880/ 115203 | consumed samples: 23009280 | consumed tokens: 47123005440 | elapsed time per iteration (s): 0.43 | learning rate: 4.102E-05 | global batch size: 256 | lm loss: 2.215120E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.354 | TFLOPs: 31.45 | 7: iteration 89890/ 115203 | consumed samples: 23011840 | consumed tokens: 47128248320 | elapsed time per iteration (s): 0.43 | learning rate: 4.101E-05 | global batch size: 256 | lm loss: 2.179197E+00 | grad norm: 0.249 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.242 | TFLOPs: 31.13 | 7: iteration 89900/ 115203 | consumed samples: 23014400 | consumed tokens: 47133491200 | elapsed time per iteration (s): 0.44 | learning rate: 4.099E-05 | global batch size: 256 | lm loss: 2.198542E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.864 | TFLOPs: 30.84 | 7: iteration 89910/ 115203 | consumed samples: 23016960 | consumed tokens: 47138734080 | elapsed time per iteration (s): 0.43 | learning rate: 4.097E-05 | global batch size: 256 | lm loss: 2.237460E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.008 | TFLOPs: 31.22 | 7: iteration 89920/ 115203 | consumed samples: 23019520 | consumed tokens: 47143976960 | elapsed time per iteration (s): 0.44 | learning rate: 4.096E-05 | global batch size: 256 | lm loss: 2.236080E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.925 | TFLOPs: 30.59 | 7: iteration 89930/ 115203 | consumed samples: 23022080 | consumed tokens: 47149219840 | elapsed time per iteration (s): 0.45 | learning rate: 4.094E-05 | global batch size: 256 | lm loss: 2.259135E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.612 | TFLOPs: 30.10 | 7: iteration 89940/ 115203 | consumed samples: 23024640 | consumed tokens: 47154462720 | elapsed time per iteration (s): 0.43 | learning rate: 4.093E-05 | global batch size: 256 | lm loss: 2.232662E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.578 | TFLOPs: 30.93 | 7: iteration 89950/ 115203 | consumed samples: 23027200 | consumed tokens: 47159705600 | elapsed time per iteration (s): 0.44 | learning rate: 
4.091E-05 | global batch size: 256 | lm loss: 2.245263E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.062 | TFLOPs: 30.28 | 7: iteration 89960/ 115203 | consumed samples: 23029760 | consumed tokens: 47164948480 | elapsed time per iteration (s): 0.43 | learning rate: 4.090E-05 | global batch size: 256 | lm loss: 2.240268E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.775 | TFLOPs: 31.00 | 7: iteration 89970/ 115203 | consumed samples: 23032320 | consumed tokens: 47170191360 | elapsed time per iteration (s): 0.44 | learning rate: 4.088E-05 | global batch size: 256 | lm loss: 2.235559E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.211 | TFLOPs: 30.86 | 7: iteration 89980/ 115203 | consumed samples: 23034880 | consumed tokens: 47175434240 | elapsed time per iteration (s): 0.44 | learning rate: 4.086E-05 | global batch size: 256 | lm loss: 2.233259E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.312 | TFLOPs: 30.71 | 7: iteration 89990/ 115203 | consumed samples: 23037440 | consumed tokens: 47180677120 | elapsed time per iteration (s): 0.43 | learning rate: 4.085E-05 | global batch size: 256 | lm loss: 2.223053E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.584 | TFLOPs: 30.93 | 0: [2022-11-28 23:48:07,453] [INFO] [logging.py:68:log_dist] [Rank 0] step=90000, skipped=0, lr=[4.083185080977982e-05, 4.083185080977982e-05, 4.083185080977982e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 90000/ 115203 | consumed samples: 23040000 | consumed tokens: 47185920000 | elapsed time per iteration (s): 0.43 | learning rate: 4.083E-05 | global batch size: 256 
| lm loss: 2.238873E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.547 | TFLOPs: 31.30 |
0: steps: 90000 loss: 2.1885 iter time (s): 0.437 samples/sec: 585.802
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 90000 | lm loss value: 2.120207E+00 | lm loss PPL: 8.332863E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 90000 to checkpoints_221m
0: [2022-11-28 23:48:07,693] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step90000 is begin to save!
0: [2022-11-28 23:48:07,703] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_01-model_00-model_states.pt...
0: [2022-11-28 23:48:07,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_01-model_00-model_states.pt.
0: [2022-11-28 23:48:07,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_03-model_00-model_states.pt...
0: [2022-11-28 23:48:07,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_03-model_00-model_states.pt.
0: [2022-11-28 23:48:07,840] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_04-model_00-model_states.pt...
0: [2022-11-28 23:48:07,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_04-model_00-model_states.pt.
0: [2022-11-28 23:48:07,865] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_05-model_00-model_states.pt...
0: [2022-11-28 23:48:07,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_05-model_00-model_states.pt.
0: [2022-11-28 23:48:07,889] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_06-model_00-model_states.pt... 0: [2022-11-28 23:48:07,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_06-model_00-model_states.pt. 0: [2022-11-28 23:48:07,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_07-model_00-model_states.pt... 0: [2022-11-28 23:48:07,936] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_07-model_00-model_states.pt. 0: [2022-11-28 23:48:07,936] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_08-model_00-model_states.pt... 0: [2022-11-28 23:48:07,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_08-model_00-model_states.pt. 0: [2022-11-28 23:48:07,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_09-model_00-model_states.pt... 0: [2022-11-28 23:48:07,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_09-model_00-model_states.pt. 0: [2022-11-28 23:48:07,984] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_10-model_00-model_states.pt... 0: [2022-11-28 23:48:08,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_10-model_00-model_states.pt. 0: [2022-11-28 23:48:08,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_11-model_00-model_states.pt... 0: [2022-11-28 23:48:08,038] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_11-model_00-model_states.pt. 
0: [2022-11-28 23:48:08,038] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_12-model_00-model_states.pt... 0: [2022-11-28 23:48:08,063] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_12-model_00-model_states.pt. 0: [2022-11-28 23:48:08,063] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_13-model_00-model_states.pt... 0: [2022-11-28 23:48:08,087] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_13-model_00-model_states.pt. 0: [2022-11-28 23:48:08,087] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_14-model_00-model_states.pt... 0: [2022-11-28 23:48:08,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_14-model_00-model_states.pt. 0: [2022-11-28 23:48:08,111] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_15-model_00-model_states.pt... 0: [2022-11-28 23:48:08,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_15-model_00-model_states.pt. 0: [2022-11-28 23:48:08,136] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_16-model_00-model_states.pt... 0: [2022-11-28 23:48:08,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_16-model_00-model_states.pt. 0: [2022-11-28 23:48:08,160] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_17-model_00-model_states.pt... 0: [2022-11-28 23:48:08,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_17-model_00-model_states.pt. 
0: [2022-11-28 23:48:08,183] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_18-model_00-model_states.pt... 0: [2022-11-28 23:48:08,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_18-model_00-model_states.pt. 0: [2022-11-28 23:48:08,207] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_19-model_00-model_states.pt... 0: [2022-11-28 23:48:08,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_19-model_00-model_states.pt. 0: [2022-11-28 23:48:08,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_20-model_00-model_states.pt... 0: [2022-11-28 23:48:08,256] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_20-model_00-model_states.pt. 0: [2022-11-28 23:48:08,256] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/layer_22-model_00-model_states.pt... 0: [2022-11-28 23:48:08,261] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/layer_22-model_00-model_states.pt. 0: [2022-11-28 23:48:08,262] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step90000/mp_rank_00_model_states.pt 0: [2022-11-28 23:48:08,262] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/mp_rank_00_model_states.pt... 0: [2022-11-28 23:48:08,264] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/mp_rank_00_model_states.pt. 0: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
0: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
4: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 
6: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
2: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:48:08,282] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:48:08,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:48:08,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 23:48:08,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-28 23:48:08,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:48:08,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2022-11-28 23:48:08,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 23:48:08,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 23:48:08,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2022-11-28 23:48:08,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2022-11-28 23:48:08,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:48:08,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 23:48:08,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2022-11-28 23:48:08,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:48:08,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 23:48:08,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
6: [2022-11-28 23:48:08,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:48:08,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 23:48:08,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2022-11-28 23:48:08,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:48:08,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:48:08,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:48:08,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 23:48:08,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:48:08,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-28 23:48:08,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
3: [2022-11-28 23:48:08,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-28 23:48:08,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:48:08,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2022-11-28 23:48:08,340] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-28 23:48:08,339] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-28 23:48:08,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2022-11-28 23:48:08,340] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2022-11-28 23:48:08,339] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2022-11-28 23:48:08,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:48:08,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 23:48:08,341] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2022-11-28 23:48:08,341] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-28 23:48:08,341] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-28 23:48:08,341] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2022-11-28 23:48:08,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:48:08,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:48:08,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:48:08,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:48:08,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:48:08,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-28 23:48:08,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 23:48:08,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 23:48:08,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 23:48:08,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 1: [2022-11-28 23:48:08,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:48:08,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 0: [2022-11-28 23:48:08,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2022-11-28 23:48:08,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2022-11-28 23:48:08,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2022-11-28 23:48:08,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2022-11-28 23:48:08,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 23:48:08,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
1: [2022-11-28 23:48:08,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2022-11-28 23:48:08,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:48:08,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:48:08,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:48:08,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 23:48:08,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 23:48:08,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 23:48:08,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2022-11-28 23:48:08,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 1: [2022-11-28 23:48:08,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2022-11-28 23:48:08,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:48:08,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 23:48:08,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2022-11-28 23:48:08,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:48:08,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:48:08,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:48:08,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 23:48:08,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 23:48:08,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 23:48:08,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2022-11-28 23:48:08,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2022-11-28 23:48:08,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 6: [2022-11-28 23:48:08,352] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:48:08,352] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 23:48:08,352] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2022-11-28 23:48:08,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 23:48:08,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:48:08,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:48:08,345] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-28 23:48:08,342] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2022-11-28 23:48:08,345] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2022-11-28 23:48:08,342] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 23:48:08,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:48:08,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
3: [2022-11-28 23:48:08,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-28 23:48:08,342] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:48:08,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2022-11-28 23:48:08,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-28 23:48:08,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:48:08,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:48:08,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-28 23:48:08,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2022-11-28 23:48:08,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 5: [2022-11-28 23:48:08,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-28 23:48:08,349] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:48:08,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
3: [2022-11-28 23:48:08,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 23:48:08,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 3: [2022-11-28 23:48:08,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:48:08,353] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-28 23:48:08,353] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2022-11-28 23:48:08,354] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:48:08,354] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 23:48:08,354] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2022-11-28 23:48:08,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:48:08,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:48:08,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-28 23:48:08,355] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
0: [2022-11-28 23:48:08,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:48:08,355] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 23:48:08,356] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: [2022-11-28 23:48:08,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 23:48:08,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2022-11-28 23:48:08,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:48:08,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:48:08,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:48:08,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:48:08,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:48:08,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:48:08,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:48:08,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:48:08,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 23:48:08,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 23:48:08,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 23:48:08,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 23:48:08,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 23:48:08,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 23:48:08,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-28 23:48:08,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2022-11-28 23:48:08,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2022-11-28 23:48:08,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2022-11-28 23:48:08,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
2: [2022-11-28 23:48:08,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2022-11-28 23:48:08,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 23:48:08,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2022-11-28 23:48:08,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 2: [2022-11-28 23:48:08,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:48:08,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 23:48:08,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 23:48:08,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 23:48:08,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 23:48:08,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 23:48:08,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 23:48:08,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 23:48:08,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 4: [2022-11-28 23:48:08,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-28 23:48:08,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 23:48:08,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 23:48:08,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 23:48:08,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2022-11-28 23:48:08,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 23:48:08,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-28 23:48:08,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 
7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 7: [2022-11-28 23:48:08,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step90000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 23:48:08,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step90000 is ready now! 0: successfully saved checkpoint at iteration 90000 to checkpoints_221m 7: time (ms) | save-checkpoint: 854.75 7: iteration 90010/ 115203 | consumed samples: 23042560 | consumed tokens: 47191162880 | elapsed time per iteration (s): 0.53 | learning rate: 4.082E-05 | global batch size: 256 | lm loss: 2.184754E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 479.108 | TFLOPs: 25.14 | 7: iteration 90020/ 115203 | consumed samples: 23045120 | consumed tokens: 47196405760 | elapsed time per iteration (s): 0.44 | learning rate: 4.080E-05 | global batch size: 256 | lm loss: 2.221985E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.314 | TFLOPs: 30.87 | 7: iteration 90030/ 115203 | consumed samples: 23047680 | consumed tokens: 47201648640 | elapsed time per iteration (s): 0.44 | learning rate: 4.078E-05 | global batch size: 256 | lm loss: 2.248189E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.082 | TFLOPs: 30.86 | 7: iteration 90040/ 115203 | consumed samples: 23050240 | consumed tokens: 47206891520 | elapsed time per iteration (s): 0.43 | learning rate: 4.077E-05 | 
global batch size: 256 | lm loss: 2.221788E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.963 | TFLOPs: 31.11 | 7: iteration 90050/ 115203 | consumed samples: 23052800 | consumed tokens: 47212134400 | elapsed time per iteration (s): 0.43 | learning rate: 4.075E-05 | global batch size: 256 | lm loss: 2.217791E+00 | grad norm: 0.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.572 | TFLOPs: 31.20 | 7: iteration 90060/ 115203 | consumed samples: 23055360 | consumed tokens: 47217377280 | elapsed time per iteration (s): 0.43 | learning rate: 4.074E-05 | global batch size: 256 | lm loss: 2.224205E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.982 | TFLOPs: 30.96 | 7: iteration 90070/ 115203 | consumed samples: 23057920 | consumed tokens: 47222620160 | elapsed time per iteration (s): 0.44 | learning rate: 4.072E-05 | global batch size: 256 | lm loss: 2.234940E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.523 | TFLOPs: 30.62 | 7: iteration 90080/ 115203 | consumed samples: 23060480 | consumed tokens: 47227863040 | elapsed time per iteration (s): 0.43 | learning rate: 4.071E-05 | global batch size: 256 | lm loss: 2.218212E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.713 | TFLOPs: 31.15 | 7: iteration 90090/ 115203 | consumed samples: 23063040 | consumed tokens: 47233105920 | elapsed time per iteration (s): 0.44 | learning rate: 4.069E-05 | global batch size: 256 | lm loss: 2.237802E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.525 | TFLOPs: 30.46 | 7: iteration 90100/ 115203 | consumed samples: 
23065600 | consumed tokens: 47238348800 | elapsed time per iteration (s): 0.43 | learning rate: 4.067E-05 | global batch size: 256 | lm loss: 2.204970E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.361 | TFLOPs: 30.92 | 7: iteration 90110/ 115203 | consumed samples: 23068160 | consumed tokens: 47243591680 | elapsed time per iteration (s): 0.44 | learning rate: 4.066E-05 | global batch size: 256 | lm loss: 2.235961E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.884 | TFLOPs: 30.43 | 7: iteration 90120/ 115203 | consumed samples: 23070720 | consumed tokens: 47248834560 | elapsed time per iteration (s): 0.42 | learning rate: 4.064E-05 | global batch size: 256 | lm loss: 2.264345E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.701 | TFLOPs: 31.73 | 7: iteration 90130/ 115203 | consumed samples: 23073280 | consumed tokens: 47254077440 | elapsed time per iteration (s): 0.44 | learning rate: 4.063E-05 | global batch size: 256 | lm loss: 2.218811E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.797 | TFLOPs: 30.53 | 7: iteration 90140/ 115203 | consumed samples: 23075840 | consumed tokens: 47259320320 | elapsed time per iteration (s): 0.45 | learning rate: 4.061E-05 | global batch size: 256 | lm loss: 2.244006E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.266 | TFLOPs: 30.18 | 7: iteration 90150/ 115203 | consumed samples: 23078400 | consumed tokens: 47264563200 | elapsed time per iteration (s): 0.45 | learning rate: 4.059E-05 | global batch size: 256 | lm loss: 2.215847E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 573.638 | TFLOPs: 30.10 | 7: iteration 90160/ 115203 | consumed samples: 23080960 | consumed tokens: 47269806080 | elapsed time per iteration (s): 0.45 | learning rate: 4.058E-05 | global batch size: 256 | lm loss: 2.229075E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.095 | TFLOPs: 29.60 | 7: iteration 90170/ 115203 | consumed samples: 23083520 | consumed tokens: 47275048960 | elapsed time per iteration (s): 0.43 | learning rate: 4.056E-05 | global batch size: 256 | lm loss: 2.239245E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.135 | TFLOPs: 31.33 | 7: iteration 90180/ 115203 | consumed samples: 23086080 | consumed tokens: 47280291840 | elapsed time per iteration (s): 0.43 | learning rate: 4.055E-05 | global batch size: 256 | lm loss: 2.232487E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.060 | TFLOPs: 31.01 | 7: iteration 90190/ 115203 | consumed samples: 23088640 | consumed tokens: 47285534720 | elapsed time per iteration (s): 0.44 | learning rate: 4.053E-05 | global batch size: 256 | lm loss: 2.220437E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.199 | TFLOPs: 30.70 | 7: iteration 90200/ 115203 | consumed samples: 23091200 | consumed tokens: 47290777600 | elapsed time per iteration (s): 0.43 | learning rate: 4.052E-05 | global batch size: 256 | lm loss: 2.194860E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.059 | TFLOPs: 30.91 | 7: iteration 90210/ 115203 | consumed samples: 23093760 | consumed tokens: 47296020480 | elapsed time per iteration (s): 0.44 | learning rate: 4.050E-05 | global batch size: 256 | lm 
loss: 2.223314E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.772 | TFLOPs: 30.42 | 7: iteration 90220/ 115203 | consumed samples: 23096320 | consumed tokens: 47301263360 | elapsed time per iteration (s): 0.43 | learning rate: 4.048E-05 | global batch size: 256 | lm loss: 2.236163E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.409 | TFLOPs: 31.03 | 7: iteration 90230/ 115203 | consumed samples: 23098880 | consumed tokens: 47306506240 | elapsed time per iteration (s): 0.44 | learning rate: 4.047E-05 | global batch size: 256 | lm loss: 2.245246E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.658 | TFLOPs: 30.83 | 7: iteration 90240/ 115203 | consumed samples: 23101440 | consumed tokens: 47311749120 | elapsed time per iteration (s): 0.43 | learning rate: 4.045E-05 | global batch size: 256 | lm loss: 2.224305E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.075 | TFLOPs: 31.12 | 7: iteration 90250/ 115203 | consumed samples: 23104000 | consumed tokens: 47316992000 | elapsed time per iteration (s): 0.42 | learning rate: 4.044E-05 | global batch size: 256 | lm loss: 2.219084E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.236 | TFLOPs: 31.86 | 7: iteration 90260/ 115203 | consumed samples: 23106560 | consumed tokens: 47322234880 | elapsed time per iteration (s): 0.44 | learning rate: 4.042E-05 | global batch size: 256 | lm loss: 2.231612E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.198 | TFLOPs: 30.55 | 7: iteration 90270/ 115203 | consumed samples: 23109120 | consumed tokens: 
47327477760 | elapsed time per iteration (s): 0.43 | learning rate: 4.041E-05 | global batch size: 256 | lm loss: 2.231616E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.372 | TFLOPs: 30.92 | 7: iteration 90280/ 115203 | consumed samples: 23111680 | consumed tokens: 47332720640 | elapsed time per iteration (s): 0.44 | learning rate: 4.039E-05 | global batch size: 256 | lm loss: 2.218764E+00 | grad norm: 0.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.365 | TFLOPs: 30.77 | 7: iteration 90290/ 115203 | consumed samples: 23114240 | consumed tokens: 47337963520 | elapsed time per iteration (s): 0.44 | learning rate: 4.037E-05 | global batch size: 256 | lm loss: 2.216187E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.370 | TFLOPs: 30.87 | 7: iteration 90300/ 115203 | consumed samples: 23116800 | consumed tokens: 47343206400 | elapsed time per iteration (s): 0.45 | learning rate: 4.036E-05 | global batch size: 256 | lm loss: 2.239838E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.752 | TFLOPs: 29.95 | 7: iteration 90310/ 115203 | consumed samples: 23119360 | consumed tokens: 47348449280 | elapsed time per iteration (s): 0.43 | learning rate: 4.034E-05 | global batch size: 256 | lm loss: 2.220941E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.912 | TFLOPs: 30.95 | 7: iteration 90320/ 115203 | consumed samples: 23121920 | consumed tokens: 47353692160 | elapsed time per iteration (s): 0.44 | learning rate: 4.033E-05 | global batch size: 256 | lm loss: 2.201669E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
577.970 | TFLOPs: 30.33 | 7: iteration 90330/ 115203 | consumed samples: 23124480 | consumed tokens: 47358935040 | elapsed time per iteration (s): 0.45 | learning rate: 4.031E-05 | global batch size: 256 | lm loss: 2.202610E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.113 | TFLOPs: 29.60 | 7: iteration 90340/ 115203 | consumed samples: 23127040 | consumed tokens: 47364177920 | elapsed time per iteration (s): 0.43 | learning rate: 4.030E-05 | global batch size: 256 | lm loss: 2.251425E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.028 | TFLOPs: 31.12 | 7: iteration 90350/ 115203 | consumed samples: 23129600 | consumed tokens: 47369420800 | elapsed time per iteration (s): 0.43 | learning rate: 4.028E-05 | global batch size: 256 | lm loss: 2.197767E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.854 | TFLOPs: 31.26 | 7: iteration 90360/ 115203 | consumed samples: 23132160 | consumed tokens: 47374663680 | elapsed time per iteration (s): 0.45 | learning rate: 4.026E-05 | global batch size: 256 | lm loss: 2.249469E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.522 | TFLOPs: 30.04 | 7: iteration 90370/ 115203 | consumed samples: 23134720 | consumed tokens: 47379906560 | elapsed time per iteration (s): 0.43 | learning rate: 4.025E-05 | global batch size: 256 | lm loss: 2.222186E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.152 | TFLOPs: 30.96 | 7: iteration 90380/ 115203 | consumed samples: 23137280 | consumed tokens: 47385149440 | elapsed time per iteration (s): 0.44 | learning rate: 4.023E-05 | global batch size: 256 | lm loss: 2.216252E+00 | grad norm: 0.237 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.104 | TFLOPs: 30.86 | 7: iteration 90390/ 115203 | consumed samples: 23139840 | consumed tokens: 47390392320 | elapsed time per iteration (s): 0.45 | learning rate: 4.022E-05 | global batch size: 256 | lm loss: 2.254629E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.153 | TFLOPs: 30.07 | 7: iteration 90400/ 115203 | consumed samples: 23142400 | consumed tokens: 47395635200 | elapsed time per iteration (s): 0.44 | learning rate: 4.020E-05 | global batch size: 256 | lm loss: 2.234856E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.161 | TFLOPs: 30.70 | 7: iteration 90410/ 115203 | consumed samples: 23144960 | consumed tokens: 47400878080 | elapsed time per iteration (s): 0.45 | learning rate: 4.019E-05 | global batch size: 256 | lm loss: 2.201353E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.942 | TFLOPs: 29.96 | 7: iteration 90420/ 115203 | consumed samples: 23147520 | consumed tokens: 47406120960 | elapsed time per iteration (s): 0.44 | learning rate: 4.017E-05 | global batch size: 256 | lm loss: 2.242558E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.804 | TFLOPs: 30.68 | 7: iteration 90430/ 115203 | consumed samples: 23150080 | consumed tokens: 47411363840 | elapsed time per iteration (s): 0.46 | learning rate: 4.015E-05 | global batch size: 256 | lm loss: 2.213622E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.343 | TFLOPs: 29.45 | 7: iteration 90440/ 115203 | consumed samples: 23152640 | consumed tokens: 47416606720 | elapsed time per iteration (s): 
0.44 | learning rate: 4.014E-05 | global batch size: 256 | lm loss: 2.231518E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.101 | TFLOPs: 30.86 | 7: iteration 90450/ 115203 | consumed samples: 23155200 | consumed tokens: 47421849600 | elapsed time per iteration (s): 0.42 | learning rate: 4.012E-05 | global batch size: 256 | lm loss: 2.246066E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.708 | TFLOPs: 31.68 | 7: iteration 90460/ 115203 | consumed samples: 23157760 | consumed tokens: 47427092480 | elapsed time per iteration (s): 0.45 | learning rate: 4.011E-05 | global batch size: 256 | lm loss: 2.235069E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.280 | TFLOPs: 29.55 | 7: iteration 90470/ 115203 | consumed samples: 23160320 | consumed tokens: 47432335360 | elapsed time per iteration (s): 0.43 | learning rate: 4.009E-05 | global batch size: 256 | lm loss: 2.231313E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.073 | TFLOPs: 31.12 | 7: iteration 90480/ 115203 | consumed samples: 23162880 | consumed tokens: 47437578240 | elapsed time per iteration (s): 0.43 | learning rate: 4.008E-05 | global batch size: 256 | lm loss: 2.251865E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.613 | TFLOPs: 30.88 | 7: iteration 90490/ 115203 | consumed samples: 23165440 | consumed tokens: 47442821120 | elapsed time per iteration (s): 0.44 | learning rate: 4.006E-05 | global batch size: 256 | lm loss: 2.224538E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.251 | TFLOPs: 30.60 | 7: iteration 90500/ 
115203 | consumed samples: 23168000 | consumed tokens: 47448064000 | elapsed time per iteration (s): 0.45 | learning rate: 4.005E-05 | global batch size: 256 | lm loss: 2.247580E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.281 | TFLOPs: 29.76 | 7: iteration 90510/ 115203 | consumed samples: 23170560 | consumed tokens: 47453306880 | elapsed time per iteration (s): 0.44 | learning rate: 4.003E-05 | global batch size: 256 | lm loss: 2.211816E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.174 | TFLOPs: 30.55 | 7: iteration 90520/ 115203 | consumed samples: 23173120 | consumed tokens: 47458549760 | elapsed time per iteration (s): 0.44 | learning rate: 4.001E-05 | global batch size: 256 | lm loss: 2.232420E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.430 | TFLOPs: 30.35 | 7: iteration 90530/ 115203 | consumed samples: 23175680 | consumed tokens: 47463792640 | elapsed time per iteration (s): 0.44 | learning rate: 4.000E-05 | global batch size: 256 | lm loss: 2.202202E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.638 | TFLOPs: 30.41 | 7: iteration 90540/ 115203 | consumed samples: 23178240 | consumed tokens: 47469035520 | elapsed time per iteration (s): 0.43 | learning rate: 3.998E-05 | global batch size: 256 | lm loss: 2.240829E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.430 | TFLOPs: 31.03 | 7: iteration 90550/ 115203 | consumed samples: 23180800 | consumed tokens: 47474278400 | elapsed time per iteration (s): 0.43 | learning rate: 3.997E-05 | global batch size: 256 | lm loss: 2.229449E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 592.176 | TFLOPs: 31.07 | 7: iteration 90560/ 115203 | consumed samples: 23183360 | consumed tokens: 47479521280 | elapsed time per iteration (s): 0.45 | learning rate: 3.995E-05 | global batch size: 256 | lm loss: 2.259744E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.763 | TFLOPs: 29.95 | 7: iteration 90570/ 115203 | consumed samples: 23185920 | consumed tokens: 47484764160 | elapsed time per iteration (s): 0.43 | learning rate: 3.994E-05 | global batch size: 256 | lm loss: 2.262229E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.486 | TFLOPs: 31.09 | 7: iteration 90580/ 115203 | consumed samples: 23188480 | consumed tokens: 47490007040 | elapsed time per iteration (s): 0.42 | learning rate: 3.992E-05 | global batch size: 256 | lm loss: 2.202262E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.326 | TFLOPs: 31.76 | 7: iteration 90590/ 115203 | consumed samples: 23191040 | consumed tokens: 47495249920 | elapsed time per iteration (s): 0.44 | learning rate: 3.991E-05 | global batch size: 256 | lm loss: 2.250649E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.096 | TFLOPs: 30.80 | 7: iteration 90600/ 115203 | consumed samples: 23193600 | consumed tokens: 47500492800 | elapsed time per iteration (s): 0.44 | learning rate: 3.989E-05 | global batch size: 256 | lm loss: 2.236284E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.966 | TFLOPs: 30.64 | 7: iteration 90610/ 115203 | consumed samples: 23196160 | consumed tokens: 47505735680 | elapsed time per iteration (s): 0.44 | learning rate: 3.987E-05 | global batch 
size: 256 | lm loss: 2.224538E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.760 | TFLOPs: 30.79 | 7: iteration 90620/ 115203 | consumed samples: 23198720 | consumed tokens: 47510978560 | elapsed time per iteration (s): 0.44 | learning rate: 3.986E-05 | global batch size: 256 | lm loss: 2.203039E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.045 | TFLOPs: 30.80 | 7: iteration 90630/ 115203 | consumed samples: 23201280 | consumed tokens: 47516221440 | elapsed time per iteration (s): 0.43 | learning rate: 3.984E-05 | global batch size: 256 | lm loss: 2.248894E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.072 | TFLOPs: 30.91 | 7: iteration 90640/ 115203 | consumed samples: 23203840 | consumed tokens: 47521464320 | elapsed time per iteration (s): 0.44 | learning rate: 3.983E-05 | global batch size: 256 | lm loss: 2.224514E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.697 | TFLOPs: 30.78 | 7: iteration 90650/ 115203 | consumed samples: 23206400 | consumed tokens: 47526707200 | elapsed time per iteration (s): 0.44 | learning rate: 3.981E-05 | global batch size: 256 | lm loss: 2.207684E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.511 | TFLOPs: 30.77 | 7: iteration 90660/ 115203 | consumed samples: 23208960 | consumed tokens: 47531950080 | elapsed time per iteration (s): 0.43 | learning rate: 3.980E-05 | global batch size: 256 | lm loss: 2.227638E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.853 | TFLOPs: 30.90 | 7: iteration 90670/ 115203 | consumed samples: 23211520 | consumed 
tokens: 47537192960 | elapsed time per iteration (s): 0.43 | learning rate: 3.978E-05 | global batch size: 256 | lm loss: 2.242921E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.758 | TFLOPs: 30.89 | 7: iteration 90680/ 115203 | consumed samples: 23214080 | consumed tokens: 47542435840 | elapsed time per iteration (s): 0.44 | learning rate: 3.977E-05 | global batch size: 256 | lm loss: 2.226256E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.709 | TFLOPs: 30.52 | 7: iteration 90690/ 115203 | consumed samples: 23216640 | consumed tokens: 47547678720 | elapsed time per iteration (s): 0.44 | learning rate: 3.975E-05 | global batch size: 256 | lm loss: 2.230257E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.255 | TFLOPs: 30.71 | 7: iteration 90700/ 115203 | consumed samples: 23219200 | consumed tokens: 47552921600 | elapsed time per iteration (s): 0.44 | learning rate: 3.973E-05 | global batch size: 256 | lm loss: 2.220853E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.959 | TFLOPs: 30.27 | 7: iteration 90710/ 115203 | consumed samples: 23221760 | consumed tokens: 47558164480 | elapsed time per iteration (s): 0.45 | learning rate: 3.972E-05 | global batch size: 256 | lm loss: 2.229030E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.409 | TFLOPs: 29.82 | 7: iteration 90720/ 115203 | consumed samples: 23224320 | consumed tokens: 47563407360 | elapsed time per iteration (s): 0.43 | learning rate: 3.970E-05 | global batch size: 256 | lm loss: 2.267462E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 590.215 | TFLOPs: 30.97 | 7: iteration 90730/ 115203 | consumed samples: 23226880 | consumed tokens: 47568650240 | elapsed time per iteration (s): 0.43 | learning rate: 3.969E-05 | global batch size: 256 | lm loss: 2.228684E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.901 | TFLOPs: 31.06 | 7: iteration 90740/ 115203 | consumed samples: 23229440 | consumed tokens: 47573893120 | elapsed time per iteration (s): 0.44 | learning rate: 3.967E-05 | global batch size: 256 | lm loss: 2.214527E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.483 | TFLOPs: 30.77 | 7: iteration 90750/ 115203 | consumed samples: 23232000 | consumed tokens: 47579136000 | elapsed time per iteration (s): 0.45 | learning rate: 3.966E-05 | global batch size: 256 | lm loss: 2.219269E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.576 | TFLOPs: 29.67 | 7: iteration 90760/ 115203 | consumed samples: 23234560 | consumed tokens: 47584378880 | elapsed time per iteration (s): 0.45 | learning rate: 3.964E-05 | global batch size: 256 | lm loss: 2.229025E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.455 | TFLOPs: 29.72 | 7: iteration 90770/ 115203 | consumed samples: 23237120 | consumed tokens: 47589621760 | elapsed time per iteration (s): 0.44 | learning rate: 3.963E-05 | global batch size: 256 | lm loss: 2.241413E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.054 | TFLOPs: 30.70 | 7: iteration 90780/ 115203 | consumed samples: 23239680 | consumed tokens: 47594864640 | elapsed time per iteration (s): 0.43 | learning rate: 3.961E-05 | global batch size: 256 | lm loss: 2.238567E+00 | grad norm: 
0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.607 | TFLOPs: 30.99 | 7: iteration 90790/ 115203 | consumed samples: 23242240 | consumed tokens: 47600107520 | elapsed time per iteration (s): 0.42 | learning rate: 3.960E-05 | global batch size: 256 | lm loss: 2.222905E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.309 | TFLOPs: 31.81 | 7: iteration 90800/ 115203 | consumed samples: 23244800 | consumed tokens: 47605350400 | elapsed time per iteration (s): 0.43 | learning rate: 3.958E-05 | global batch size: 256 | lm loss: 2.253749E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.402 | TFLOPs: 31.29 | 7: iteration 90810/ 115203 | consumed samples: 23247360 | consumed tokens: 47610593280 | elapsed time per iteration (s): 0.44 | learning rate: 3.956E-05 | global batch size: 256 | lm loss: 2.230347E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.925 | TFLOPs: 30.69 | 7: iteration 90820/ 115203 | consumed samples: 23249920 | consumed tokens: 47615836160 | elapsed time per iteration (s): 0.45 | learning rate: 3.955E-05 | global batch size: 256 | lm loss: 2.234375E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.343 | TFLOPs: 29.72 | 7: iteration 90830/ 115203 | consumed samples: 23252480 | consumed tokens: 47621079040 | elapsed time per iteration (s): 0.43 | learning rate: 3.953E-05 | global batch size: 256 | lm loss: 2.192912E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.215 | TFLOPs: 31.28 | 7: iteration 90840/ 115203 | consumed samples: 23255040 | consumed tokens: 47626321920 | elapsed time per 
iteration (s): 0.44 | learning rate: 3.952E-05 | global batch size: 256 | lm loss: 2.202789E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.003 | TFLOPs: 30.27 | 7: iteration 90850/ 115203 | consumed samples: 23257600 | consumed tokens: 47631564800 | elapsed time per iteration (s): 0.44 | learning rate: 3.950E-05 | global batch size: 256 | lm loss: 2.249074E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.004 | TFLOPs: 30.59 | 7: iteration 90860/ 115203 | consumed samples: 23260160 | consumed tokens: 47636807680 | elapsed time per iteration (s): 0.45 | learning rate: 3.949E-05 | global batch size: 256 | lm loss: 2.206015E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.588 | TFLOPs: 29.99 | 7: iteration 90870/ 115203 | consumed samples: 23262720 | consumed tokens: 47642050560 | elapsed time per iteration (s): 0.44 | learning rate: 3.947E-05 | global batch size: 256 | lm loss: 2.225743E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.208 | TFLOPs: 30.81 | 7: iteration 90880/ 115203 | consumed samples: 23265280 | consumed tokens: 47647293440 | elapsed time per iteration (s): 0.43 | learning rate: 3.946E-05 | global batch size: 256 | lm loss: 2.245671E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.440 | TFLOPs: 31.19 | 7: iteration 90890/ 115203 | consumed samples: 23267840 | consumed tokens: 47652536320 | elapsed time per iteration (s): 0.43 | learning rate: 3.944E-05 | global batch size: 256 | lm loss: 2.269596E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.382 | TFLOPs: 31.24 | 7: 
iteration 90900/ 115203 | consumed samples: 23270400 | consumed tokens: 47657779200 | elapsed time per iteration (s): 0.44 | learning rate: 3.943E-05 | global batch size: 256 | lm loss: 2.217319E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.872 | TFLOPs: 30.58 | 7: iteration 90910/ 115203 | consumed samples: 23272960 | consumed tokens: 47663022080 | elapsed time per iteration (s): 0.43 | learning rate: 3.941E-05 | global batch size: 256 | lm loss: 2.244654E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.965 | TFLOPs: 31.16 | 7: iteration 90920/ 115203 | consumed samples: 23275520 | consumed tokens: 47668264960 | elapsed time per iteration (s): 0.44 | learning rate: 3.939E-05 | global batch size: 256 | lm loss: 2.235838E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.430 | TFLOPs: 30.35 | 7: iteration 90930/ 115203 | consumed samples: 23278080 | consumed tokens: 47673507840 | elapsed time per iteration (s): 0.43 | learning rate: 3.938E-05 | global batch size: 256 | lm loss: 2.235165E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.021 | TFLOPs: 31.53 | 7: iteration 90940/ 115203 | consumed samples: 23280640 | consumed tokens: 47678750720 | elapsed time per iteration (s): 0.43 | learning rate: 3.936E-05 | global batch size: 256 | lm loss: 2.216153E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.327 | TFLOPs: 31.34 | 7: iteration 90950/ 115203 | consumed samples: 23283200 | consumed tokens: 47683993600 | elapsed time per iteration (s): 0.43 | learning rate: 3.935E-05 | global batch size: 256 | lm loss: 2.222357E+00 | grad norm: 0.251 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.051 | TFLOPs: 31.48 | 7: iteration 90960/ 115203 | consumed samples: 23285760 | consumed tokens: 47689236480 | elapsed time per iteration (s): 0.44 | learning rate: 3.933E-05 | global batch size: 256 | lm loss: 2.248435E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.773 | TFLOPs: 30.68 | 7: iteration 90970/ 115203 | consumed samples: 23288320 | consumed tokens: 47694479360 | elapsed time per iteration (s): 0.60 | learning rate: 3.932E-05 | global batch size: 256 | lm loss: 2.244417E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 428.042 | TFLOPs: 22.46 | 7: iteration 90980/ 115203 | consumed samples: 23290880 | consumed tokens: 47699722240 | elapsed time per iteration (s): 0.45 | learning rate: 3.930E-05 | global batch size: 256 | lm loss: 2.226895E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.747 | TFLOPs: 29.63 | 7: iteration 90990/ 115203 | consumed samples: 23293440 | consumed tokens: 47704965120 | elapsed time per iteration (s): 0.44 | learning rate: 3.929E-05 | global batch size: 256 | lm loss: 2.215303E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.099 | TFLOPs: 30.70 | 7: iteration 91000/ 115203 | consumed samples: 23296000 | consumed tokens: 47710208000 | elapsed time per iteration (s): 0.43 | learning rate: 3.927E-05 | global batch size: 256 | lm loss: 2.243160E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.847 | TFLOPs: 31.00 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 91000 | lm loss value: 
2.168272E+00 | lm loss PPL: 8.743163E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 91000 to checkpoints_221m 0: [2022-11-28 23:55:27,718] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step91000 is begin to save! 0: [2022-11-28 23:55:27,730] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_01-model_00-model_states.pt... 0: [2022-11-28 23:55:27,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_01-model_00-model_states.pt. 0: [2022-11-28 23:55:27,874] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_03-model_00-model_states.pt... 0: [2022-11-28 23:55:27,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_03-model_00-model_states.pt. 0: [2022-11-28 23:55:27,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_04-model_00-model_states.pt... 0: [2022-11-28 23:55:27,923] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_04-model_00-model_states.pt. 0: [2022-11-28 23:55:27,923] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_05-model_00-model_states.pt... 0: [2022-11-28 23:55:27,947] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_05-model_00-model_states.pt. 0: [2022-11-28 23:55:27,948] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_06-model_00-model_states.pt... 0: [2022-11-28 23:55:27,975] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_06-model_00-model_states.pt. 
0: [2022-11-28 23:55:27,976] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_07-model_00-model_states.pt... 0: [2022-11-28 23:55:28,000] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_07-model_00-model_states.pt. 0: [2022-11-28 23:55:28,000] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_08-model_00-model_states.pt... 0: [2022-11-28 23:55:28,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_08-model_00-model_states.pt. 0: [2022-11-28 23:55:28,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_09-model_00-model_states.pt... 0: [2022-11-28 23:55:28,049] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_09-model_00-model_states.pt. 0: [2022-11-28 23:55:28,050] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_10-model_00-model_states.pt... 0: [2022-11-28 23:55:28,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_10-model_00-model_states.pt. 0: [2022-11-28 23:55:28,076] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_11-model_00-model_states.pt... 0: [2022-11-28 23:55:28,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_11-model_00-model_states.pt. 0: [2022-11-28 23:55:28,100] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_12-model_00-model_states.pt... 0: [2022-11-28 23:55:28,124] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_12-model_00-model_states.pt. 
0: [2022-11-28 23:55:28,124] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_13-model_00-model_states.pt... 0: [2022-11-28 23:55:28,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_13-model_00-model_states.pt. 0: [2022-11-28 23:55:28,149] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_14-model_00-model_states.pt... 0: [2022-11-28 23:55:28,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_14-model_00-model_states.pt. 0: [2022-11-28 23:55:28,173] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_15-model_00-model_states.pt... 0: [2022-11-28 23:55:28,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_15-model_00-model_states.pt. 0: [2022-11-28 23:55:28,197] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_16-model_00-model_states.pt... 0: [2022-11-28 23:55:28,221] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_16-model_00-model_states.pt. 0: [2022-11-28 23:55:28,221] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_17-model_00-model_states.pt... 0: [2022-11-28 23:55:28,246] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_17-model_00-model_states.pt. 0: [2022-11-28 23:55:28,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_18-model_00-model_states.pt... 0: [2022-11-28 23:55:28,270] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_18-model_00-model_states.pt. 
0: [2022-11-28 23:55:28,271] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_19-model_00-model_states.pt... 0: [2022-11-28 23:55:28,295] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_19-model_00-model_states.pt. 0: [2022-11-28 23:55:28,296] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_20-model_00-model_states.pt... 0: [2022-11-28 23:55:28,320] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_20-model_00-model_states.pt. 0: [2022-11-28 23:55:28,320] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/layer_22-model_00-model_states.pt... 0: [2022-11-28 23:55:28,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/layer_22-model_00-model_states.pt. 0: [2022-11-28 23:55:28,326] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step91000/mp_rank_00_model_states.pt 0: [2022-11-28 23:55:28,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/mp_rank_00_model_states.pt... 0: [2022-11-28 23:55:28,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/mp_rank_00_model_states.pt. 0: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
7: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 7: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-28 23:55:28,345] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step91000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 0: [2022-11-28 23:55:28,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:55:28,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:55:28,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-28 23:55:28,397] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2022-11-28 23:55:28,397] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-28 23:55:28,397] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-28 23:55:28,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2022-11-28 23:55:28,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:55:28,398] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-28 23:55:28,398] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2022-11-28 23:55:28,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:55:28,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-28 23:55:28,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2022-11-28 23:55:28,399] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:55:28,399] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-28 23:55:28,399] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2022-11-28 23:55:28,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-28 23:55:28,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-28 23:55:28,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2022-11-28 23:55:28,400] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:55:28,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-28 23:55:28,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2022-11-28 23:55:28,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:55:28,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-28 23:55:28,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2022-11-28 23:55:28,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:55:28,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-28 23:55:28,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2022-11-28 23:55:28,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-28 23:55:28,402] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-28 23:55:28,402] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2022-11-28 23:55:28,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:55:28,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-28 23:55:28,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2022-11-28 23:55:28,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:55:28,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-28 23:55:28,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2022-11-28 23:55:28,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:55:28,403] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-28 23:55:28,403] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2022-11-28 23:55:28,403] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:55:28,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-28 23:55:28,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2022-11-28 23:55:28,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:55:28,404] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-28 23:55:28,404] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2022-11-28 23:55:28,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:55:28,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:55:28,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:55:28,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 0: [2022-11-28 23:55:28,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 4: [2022-11-28 23:55:28,405] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-28 23:55:28,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
0: [2022-11-28 23:55:28,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2022-11-28 23:55:28,405] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2022-11-28 23:55:28,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:55:28,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-28 23:55:28,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2022-11-28 23:55:28,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:55:28,406] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-28 23:55:28,406] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2022-11-28 23:55:28,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:55:28,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:55:28,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-28 23:55:28,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-28 23:55:28,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-28 23:55:28,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2022-11-28 23:55:28,407] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-28 23:55:28,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2022-11-28 23:55:28,407] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 4: [2022-11-28 23:55:28,408] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-28 23:55:28,408] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-28 23:55:28,408] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2022-11-28 23:55:28,411] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:55:28,411] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-28 23:55:28,411] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
2: [2022-11-28 23:55:28,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:55:28,414] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:55:28,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 6: [2022-11-28 23:55:28,414] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-28 23:55:28,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2022-11-28 23:55:28,414] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2022-11-28 23:55:28,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:55:28,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-28 23:55:28,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2022-11-28 23:55:28,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:55:28,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-28 23:55:28,415] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
6: [2022-11-28 23:55:28,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:55:28,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-28 23:55:28,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2022-11-28 23:55:28,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:55:28,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-28 23:55:28,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2022-11-28 23:55:28,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:55:28,419] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-28 23:55:28,419] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2022-11-28 23:55:28,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:55:28,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-28 23:55:28,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
1: [2022-11-28 23:55:28,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:55:28,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-28 23:55:28,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 1: [2022-11-28 23:55:28,422] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-28 23:55:28,422] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-28 23:55:28,422] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2022-11-28 23:55:28,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:55:28,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:55:28,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-28 23:55:28,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-28 23:55:28,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-28 23:55:28,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-28 23:55:28,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-28 23:55:28,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-28 23:55:28,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2022-11-28 23:55:28,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2022-11-28 23:55:28,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 6: [2022-11-28 23:55:28,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2022-11-28 23:55:28,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:55:28,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-28 23:55:28,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-28 23:55:28,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-28 23:55:28,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2022-11-28 23:55:28,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2022-11-28 23:55:28,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:55:28,427] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 2: [2022-11-28 23:55:28,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 7: [2022-11-28 23:55:28,427] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2022-11-28 23:55:28,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-28 23:55:28,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-28 23:55:28,427] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-28 23:55:28,427] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-28 23:55:28,427] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-28 23:55:28,427] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2022-11-28 23:55:28,427] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 2: [2022-11-28 23:55:28,427] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2022-11-28 23:55:28,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:55:28,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:55:28,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:55:28,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:55:28,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:55:28,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2022-11-28 23:55:28,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:55:28,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:55:28,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-28 23:55:28,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:55:28,415] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:55:28,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-28 23:55:28,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-28 23:55:28,415] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-28 23:55:28,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-28 23:55:28,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-28 23:55:28,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-28 23:55:28,416] 
[INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-28 23:55:28,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-28 23:55:28,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 3: [2022-11-28 23:55:28,416] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-28 23:55:28,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2022-11-28 23:55:28,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 3: [2022-11-28 23:55:28,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 3: [2022-11-28 23:55:28,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2022-11-28 23:55:28,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:55:28,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2022-11-28 23:55:28,417] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-28 23:55:28,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
5: [2022-11-28 23:55:28,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-28 23:55:28,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 3: [2022-11-28 23:55:28,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2022-11-28 23:55:28,417] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-28 23:55:28,416] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2022-11-28 23:55:28,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2022-11-28 23:55:28,417] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2022-11-28 23:55:28,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:55:28,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-28 23:55:28,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2022-11-28 23:55:28,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:55:28,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-28 23:55:28,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
7: [2022-11-28 23:55:28,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:55:28,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:55:28,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 7: [2022-11-28 23:55:28,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 5: [2022-11-28 23:55:28,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 7: [2022-11-28 23:55:28,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 5: [2022-11-28 23:55:28,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-28 23:55:28,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-28 23:55:28,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: [2022-11-28 23:55:28,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-28 23:55:28,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-28 23:55:28,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 
0: [2022-11-28 23:55:28,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step91000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-28 23:55:28,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step91000 is ready now! 0: successfully saved checkpoint at iteration 91000 to checkpoints_221m 7: time (ms) | save-checkpoint: 756.78 7: iteration 91010/ 115203 | consumed samples: 23298560 | consumed tokens: 47715450880 | elapsed time per iteration (s): 0.52 | learning rate: 3.926E-05 | global batch size: 256 | lm loss: 2.221764E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 491.015 | TFLOPs: 25.76 | 7: iteration 91020/ 115203 | consumed samples: 23301120 | consumed tokens: 47720693760 | elapsed time per iteration (s): 0.44 | learning rate: 3.924E-05 | global batch size: 256 | lm loss: 2.255381E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.174 | TFLOPs: 30.81 | 7: iteration 91030/ 115203 | consumed samples: 23303680 | consumed tokens: 47725936640 | elapsed time per iteration (s): 0.43 | learning rate: 3.923E-05 | global batch size: 256 | lm loss: 2.179618E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.333 | TFLOPs: 31.13 | 7: iteration 91040/ 115203 | consumed samples: 23306240 | consumed tokens: 47731179520 | elapsed time per iteration (s): 0.43 | learning rate: 3.921E-05 | global batch size: 256 | lm loss: 2.232343E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.977 | TFLOPs: 30.96 | 7: iteration 91050/ 115203 | consumed samples: 23308800 | consumed tokens: 47736422400 | elapsed time per iteration (s): 0.43 | learning rate: 3.920E-05 | global batch size: 256 | 
lm loss: 2.272174E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.494 | TFLOPs: 31.09 | 7: iteration 91060/ 115203 | consumed samples: 23311360 | consumed tokens: 47741665280 | elapsed time per iteration (s): 0.43 | learning rate: 3.918E-05 | global batch size: 256 | lm loss: 2.221319E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.867 | TFLOPs: 30.90 | 7: iteration 91070/ 115203 | consumed samples: 23313920 | consumed tokens: 47746908160 | elapsed time per iteration (s): 0.43 | learning rate: 3.916E-05 | global batch size: 256 | lm loss: 2.263273E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.564 | TFLOPs: 30.88 | 7: iteration 91080/ 115203 | consumed samples: 23316480 | consumed tokens: 47752151040 | elapsed time per iteration (s): 0.44 | learning rate: 3.915E-05 | global batch size: 256 | lm loss: 2.237965E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.820 | TFLOPs: 30.79 | 7: iteration 91090/ 115203 | consumed samples: 23319040 | consumed tokens: 47757393920 | elapsed time per iteration (s): 0.43 | learning rate: 3.913E-05 | global batch size: 256 | lm loss: 2.207112E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.868 | TFLOPs: 31.05 | 7: iteration 91100/ 115203 | consumed samples: 23321600 | consumed tokens: 47762636800 | elapsed time per iteration (s): 0.43 | learning rate: 3.912E-05 | global batch size: 256 | lm loss: 2.218570E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.611 | TFLOPs: 31.15 | 7: iteration 91110/ 115203 | consumed samples: 23324160 | consumed tokens: 
47767879680 | elapsed time per iteration (s): 0.44 | learning rate: 3.910E-05 | global batch size: 256 | lm loss: 2.233610E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.013 | TFLOPs: 30.85 | 7: iteration 91120/ 115203 | consumed samples: 23326720 | consumed tokens: 47773122560 | elapsed time per iteration (s): 0.45 | learning rate: 3.909E-05 | global batch size: 256 | lm loss: 2.277638E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.122 | TFLOPs: 30.07 | 7: iteration 91130/ 115203 | consumed samples: 23329280 | consumed tokens: 47778365440 | elapsed time per iteration (s): 0.43 | learning rate: 3.907E-05 | global batch size: 256 | lm loss: 2.229976E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.721 | TFLOPs: 31.05 | 7: iteration 91140/ 115203 | consumed samples: 23331840 | consumed tokens: 47783608320 | elapsed time per iteration (s): 0.43 | learning rate: 3.906E-05 | global batch size: 256 | lm loss: 2.239940E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.749 | TFLOPs: 31.00 | 7: iteration 91150/ 115203 | consumed samples: 23334400 | consumed tokens: 47788851200 | elapsed time per iteration (s): 0.43 | learning rate: 3.904E-05 | global batch size: 256 | lm loss: 2.241997E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.832 | TFLOPs: 31.00 | 7: iteration 91160/ 115203 | consumed samples: 23336960 | consumed tokens: 47794094080 | elapsed time per iteration (s): 0.43 | learning rate: 3.903E-05 | global batch size: 256 | lm loss: 2.248492E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
602.217 | TFLOPs: 31.60 | 7: iteration 91170/ 115203 | consumed samples: 23339520 | consumed tokens: 47799336960 | elapsed time per iteration (s): 0.43 | learning rate: 3.901E-05 | global batch size: 256 | lm loss: 2.234215E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.849 | TFLOPs: 30.95 | 7: iteration 91180/ 115203 | consumed samples: 23342080 | consumed tokens: 47804579840 | elapsed time per iteration (s): 0.43 | learning rate: 3.900E-05 | global batch size: 256 | lm loss: 2.226108E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.582 | TFLOPs: 31.20 | 7: iteration 91190/ 115203 | consumed samples: 23344640 | consumed tokens: 47809822720 | elapsed time per iteration (s): 0.43 | learning rate: 3.898E-05 | global batch size: 256 | lm loss: 2.255538E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.034 | TFLOPs: 31.17 | 7: iteration 91200/ 115203 | consumed samples: 23347200 | consumed tokens: 47815065600 | elapsed time per iteration (s): 0.42 | learning rate: 3.897E-05 | global batch size: 256 | lm loss: 2.250220E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.255 | TFLOPs: 31.81 | 7: iteration 91210/ 115203 | consumed samples: 23349760 | consumed tokens: 47820308480 | elapsed time per iteration (s): 0.42 | learning rate: 3.895E-05 | global batch size: 256 | lm loss: 2.267894E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.687 | TFLOPs: 32.09 | 7: iteration 91220/ 115203 | consumed samples: 23352320 | consumed tokens: 47825551360 | elapsed time per iteration (s): 0.43 | learning rate: 3.894E-05 | global batch size: 256 | lm loss: 2.237324E+00 | grad norm: 0.254 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.877 | TFLOPs: 31.53 | 7: iteration 91230/ 115203 | consumed samples: 23354880 | consumed tokens: 47830794240 | elapsed time per iteration (s): 0.43 | learning rate: 3.892E-05 | global batch size: 256 | lm loss: 2.230207E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.250 | TFLOPs: 31.44 | 7: iteration 91240/ 115203 | consumed samples: 23357440 | consumed tokens: 47836037120 | elapsed time per iteration (s): 0.43 | learning rate: 3.891E-05 | global batch size: 256 | lm loss: 2.221096E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.803 | TFLOPs: 31.37 | 7: iteration 91250/ 115203 | consumed samples: 23360000 | consumed tokens: 47841280000 | elapsed time per iteration (s): 0.43 | learning rate: 3.889E-05 | global batch size: 256 | lm loss: 2.224626E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.477 | TFLOPs: 31.14 | 7: iteration 91260/ 115203 | consumed samples: 23362560 | consumed tokens: 47846522880 | elapsed time per iteration (s): 0.44 | learning rate: 3.888E-05 | global batch size: 256 | lm loss: 2.235781E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.311 | TFLOPs: 30.19 | 7: iteration 91270/ 115203 | consumed samples: 23365120 | consumed tokens: 47851765760 | elapsed time per iteration (s): 0.44 | learning rate: 3.886E-05 | global batch size: 256 | lm loss: 2.267260E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.995 | TFLOPs: 30.85 | 7: iteration 91280/ 115203 | consumed samples: 23367680 | consumed tokens: 47857008640 | elapsed time per iteration (s): 
0.43 | learning rate: 3.885E-05 | global batch size: 256 | lm loss: 2.196893E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.955 | TFLOPs: 31.11 | 7: iteration 91290/ 115203 | consumed samples: 23370240 | consumed tokens: 47862251520 | elapsed time per iteration (s): 0.43 | learning rate: 3.883E-05 | global batch size: 256 | lm loss: 2.232583E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.032 | TFLOPs: 31.27 | 7: iteration 91300/ 115203 | consumed samples: 23372800 | consumed tokens: 47867494400 | elapsed time per iteration (s): 0.45 | learning rate: 3.881E-05 | global batch size: 256 | lm loss: 2.222223E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.991 | TFLOPs: 29.85 | 7: iteration 91310/ 115203 | consumed samples: 23375360 | consumed tokens: 47872737280 | elapsed time per iteration (s): 0.43 | learning rate: 3.880E-05 | global batch size: 256 | lm loss: 2.214686E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.785 | TFLOPs: 31.31 | 7: iteration 91320/ 115203 | consumed samples: 23377920 | consumed tokens: 47877980160 | elapsed time per iteration (s): 0.43 | learning rate: 3.878E-05 | global batch size: 256 | lm loss: 2.229378E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.098 | TFLOPs: 30.91 | 7: iteration 91330/ 115203 | consumed samples: 23380480 | consumed tokens: 47883223040 | elapsed time per iteration (s): 0.43 | learning rate: 3.877E-05 | global batch size: 256 | lm loss: 2.208027E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.456 | TFLOPs: 31.14 | 7: iteration 91340/ 
115203 | consumed samples: 23383040 | consumed tokens: 47888465920 | elapsed time per iteration (s): 0.44 | learning rate: 3.875E-05 | global batch size: 256 | lm loss: 2.237160E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.084 | TFLOPs: 30.86 | 7: iteration 91350/ 115203 | consumed samples: 23385600 | consumed tokens: 47893708800 | elapsed time per iteration (s): 0.43 | learning rate: 3.874E-05 | global batch size: 256 | lm loss: 2.231240E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.766 | TFLOPs: 31.05 | 7: iteration 91360/ 115203 | consumed samples: 23388160 | consumed tokens: 47898951680 | elapsed time per iteration (s): 0.43 | learning rate: 3.872E-05 | global batch size: 256 | lm loss: 2.203495E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.720 | TFLOPs: 31.36 | 7: iteration 91370/ 115203 | consumed samples: 23390720 | consumed tokens: 47904194560 | elapsed time per iteration (s): 0.43 | learning rate: 3.871E-05 | global batch size: 256 | lm loss: 2.187747E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.514 | TFLOPs: 31.04 | 7: iteration 91380/ 115203 | consumed samples: 23393280 | consumed tokens: 47909437440 | elapsed time per iteration (s): 0.43 | learning rate: 3.869E-05 | global batch size: 256 | lm loss: 2.217163E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.000 | TFLOPs: 30.96 | 7: iteration 91390/ 115203 | consumed samples: 23395840 | consumed tokens: 47914680320 | elapsed time per iteration (s): 0.44 | learning rate: 3.868E-05 | global batch size: 256 | lm loss: 2.227805E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 577.091 | TFLOPs: 30.28 | 7: iteration 91400/ 115203 | consumed samples: 23398400 | consumed tokens: 47919923200 | elapsed time per iteration (s): 0.44 | learning rate: 3.866E-05 | global batch size: 256 | lm loss: 2.216669E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.141 | TFLOPs: 30.60 | 7: iteration 91410/ 115203 | consumed samples: 23400960 | consumed tokens: 47925166080 | elapsed time per iteration (s): 0.43 | learning rate: 3.865E-05 | global batch size: 256 | lm loss: 2.219235E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.173 | TFLOPs: 31.07 | 7: iteration 91420/ 115203 | consumed samples: 23403520 | consumed tokens: 47930408960 | elapsed time per iteration (s): 0.43 | learning rate: 3.863E-05 | global batch size: 256 | lm loss: 2.245433E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.634 | TFLOPs: 30.99 | 7: iteration 91430/ 115203 | consumed samples: 23406080 | consumed tokens: 47935651840 | elapsed time per iteration (s): 0.44 | learning rate: 3.862E-05 | global batch size: 256 | lm loss: 2.236409E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.888 | TFLOPs: 30.85 | 7: iteration 91440/ 115203 | consumed samples: 23408640 | consumed tokens: 47940894720 | elapsed time per iteration (s): 0.43 | learning rate: 3.860E-05 | global batch size: 256 | lm loss: 2.263643E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.692 | TFLOPs: 31.52 | 7: iteration 91450/ 115203 | consumed samples: 23411200 | consumed tokens: 47946137600 | elapsed time per iteration (s): 0.43 | learning rate: 3.859E-05 | global batch 
size: 256 | lm loss: 2.239865E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.235 | TFLOPs: 31.07 | 7: iteration 91460/ 115203 | consumed samples: 23413760 | consumed tokens: 47951380480 | elapsed time per iteration (s): 0.43 | learning rate: 3.857E-05 | global batch size: 256 | lm loss: 2.235537E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.386 | TFLOPs: 31.40 | 7: iteration 91470/ 115203 | consumed samples: 23416320 | consumed tokens: 47956623360 | elapsed time per iteration (s): 0.45 | learning rate: 3.856E-05 | global batch size: 256 | lm loss: 2.247067E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.708 | TFLOPs: 30.00 | 7: iteration 91480/ 115203 | consumed samples: 23418880 | consumed tokens: 47961866240 | elapsed time per iteration (s): 0.43 | learning rate: 3.854E-05 | global batch size: 256 | lm loss: 2.227180E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.744 | TFLOPs: 31.15 | 7: iteration 91490/ 115203 | consumed samples: 23421440 | consumed tokens: 47967109120 | elapsed time per iteration (s): 0.45 | learning rate: 3.853E-05 | global batch size: 256 | lm loss: 2.244955E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.298 | TFLOPs: 29.82 | 7: iteration 91500/ 115203 | consumed samples: 23424000 | consumed tokens: 47972352000 | elapsed time per iteration (s): 0.43 | learning rate: 3.851E-05 | global batch size: 256 | lm loss: 2.246134E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.421 | TFLOPs: 31.24 | 7: iteration 91510/ 115203 | consumed samples: 23426560 | consumed 
tokens: 47977594880 | elapsed time per iteration (s): 0.43 | learning rate: 3.850E-05 | global batch size: 256 | lm loss: 2.194553E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.761 | TFLOPs: 31.26 | 7: iteration 91520/ 115203 | consumed samples: 23429120 | consumed tokens: 47982837760 | elapsed time per iteration (s): 0.43 | learning rate: 3.848E-05 | global batch size: 256 | lm loss: 2.249078E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.869 | TFLOPs: 30.90 | 7: iteration 91530/ 115203 | consumed samples: 23431680 | consumed tokens: 47988080640 | elapsed time per iteration (s): 0.45 | learning rate: 3.847E-05 | global batch size: 256 | lm loss: 2.275609E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.844 | TFLOPs: 29.85 | 7: iteration 91540/ 115203 | consumed samples: 23434240 | consumed tokens: 47993323520 | elapsed time per iteration (s): 0.43 | learning rate: 3.845E-05 | global batch size: 256 | lm loss: 2.247196E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.538 | TFLOPs: 31.19 | 7: iteration 91550/ 115203 | consumed samples: 23436800 | consumed tokens: 47998566400 | elapsed time per iteration (s): 0.43 | learning rate: 3.844E-05 | global batch size: 256 | lm loss: 2.217945E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.836 | TFLOPs: 31.37 | 7: iteration 91560/ 115203 | consumed samples: 23439360 | consumed tokens: 48003809280 | elapsed time per iteration (s): 0.43 | learning rate: 3.842E-05 | global batch size: 256 | lm loss: 2.262602E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 602.075 | TFLOPs: 31.59 | 7: iteration 91570/ 115203 | consumed samples: 23441920 | consumed tokens: 48009052160 | elapsed time per iteration (s): 0.44 | learning rate: 3.841E-05 | global batch size: 256 | lm loss: 2.248145E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.496 | TFLOPs: 30.62 | 7: iteration 91580/ 115203 | consumed samples: 23444480 | consumed tokens: 48014295040 | elapsed time per iteration (s): 0.42 | learning rate: 3.839E-05 | global batch size: 256 | lm loss: 2.234390E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.262 | TFLOPs: 31.81 | 7: iteration 91590/ 115203 | consumed samples: 23447040 | consumed tokens: 48019537920 | elapsed time per iteration (s): 0.43 | learning rate: 3.838E-05 | global batch size: 256 | lm loss: 2.230391E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.822 | TFLOPs: 31.00 | 7: iteration 91600/ 115203 | consumed samples: 23449600 | consumed tokens: 48024780800 | elapsed time per iteration (s): 0.44 | learning rate: 3.836E-05 | global batch size: 256 | lm loss: 2.244263E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.345 | TFLOPs: 30.50 | 7: iteration 91610/ 115203 | consumed samples: 23452160 | consumed tokens: 48030023680 | elapsed time per iteration (s): 0.43 | learning rate: 3.835E-05 | global batch size: 256 | lm loss: 2.220381E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.473 | TFLOPs: 31.14 | 7: iteration 91620/ 115203 | consumed samples: 23454720 | consumed tokens: 48035266560 | elapsed time per iteration (s): 0.43 | learning rate: 3.833E-05 | global batch size: 256 | lm loss: 2.257901E+00 | grad norm: 
0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.782 | TFLOPs: 31.05 | 7: iteration 91630/ 115203 | consumed samples: 23457280 | consumed tokens: 48040509440 | elapsed time per iteration (s): 0.44 | learning rate: 3.832E-05 | global batch size: 256 | lm loss: 2.224973E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.485 | TFLOPs: 30.77 | 7: iteration 91640/ 115203 | consumed samples: 23459840 | consumed tokens: 48045752320 | elapsed time per iteration (s): 0.43 | learning rate: 3.830E-05 | global batch size: 256 | lm loss: 2.251290E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.173 | TFLOPs: 31.28 | 7: iteration 91650/ 115203 | consumed samples: 23462400 | consumed tokens: 48050995200 | elapsed time per iteration (s): 0.45 | learning rate: 3.829E-05 | global batch size: 256 | lm loss: 2.208110E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.922 | TFLOPs: 29.54 | 7: iteration 91660/ 115203 | consumed samples: 23464960 | consumed tokens: 48056238080 | elapsed time per iteration (s): 0.44 | learning rate: 3.827E-05 | global batch size: 256 | lm loss: 2.188505E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.471 | TFLOPs: 30.77 | 7: iteration 91670/ 115203 | consumed samples: 23467520 | consumed tokens: 48061480960 | elapsed time per iteration (s): 0.46 | learning rate: 3.826E-05 | global batch size: 256 | lm loss: 2.222610E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.565 | TFLOPs: 29.36 | 7: iteration 91680/ 115203 | consumed samples: 23470080 | consumed tokens: 48066723840 | elapsed time per 
iteration (s): 0.43 | learning rate: 3.824E-05 | global batch size: 256 | lm loss: 2.248422E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.546 | TFLOPs: 31.51 | 7: iteration 91690/ 115203 | consumed samples: 23472640 | consumed tokens: 48071966720 | elapsed time per iteration (s): 0.43 | learning rate: 3.823E-05 | global batch size: 256 | lm loss: 2.203575E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.894 | TFLOPs: 31.48 | 7: iteration 91700/ 115203 | consumed samples: 23475200 | consumed tokens: 48077209600 | elapsed time per iteration (s): 0.43 | learning rate: 3.821E-05 | global batch size: 256 | lm loss: 2.232191E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.503 | TFLOPs: 31.14 | 7: iteration 91710/ 115203 | consumed samples: 23477760 | consumed tokens: 48082452480 | elapsed time per iteration (s): 0.43 | learning rate: 3.820E-05 | global batch size: 256 | lm loss: 2.228579E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.699 | TFLOPs: 31.10 | 7: iteration 91720/ 115203 | consumed samples: 23480320 | consumed tokens: 48087695360 | elapsed time per iteration (s): 0.43 | learning rate: 3.818E-05 | global batch size: 256 | lm loss: 2.235823E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.602 | TFLOPs: 31.25 | 7: iteration 91730/ 115203 | consumed samples: 23482880 | consumed tokens: 48092938240 | elapsed time per iteration (s): 0.43 | learning rate: 3.817E-05 | global batch size: 256 | lm loss: 2.220372E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.266 | TFLOPs: 31.13 | 7: 
iteration 91740/ 115203 | consumed samples: 23485440 | consumed tokens: 48098181120 | elapsed time per iteration (s): 0.43 | learning rate: 3.815E-05 | global batch size: 256 | lm loss: 2.269787E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.846 | TFLOPs: 31.58 | 7: iteration 91750/ 115203 | consumed samples: 23488000 | consumed tokens: 48103424000 | elapsed time per iteration (s): 0.43 | learning rate: 3.814E-05 | global batch size: 256 | lm loss: 2.206132E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.358 | TFLOPs: 31.55 | 7: iteration 91760/ 115203 | consumed samples: 23490560 | consumed tokens: 48108666880 | elapsed time per iteration (s): 0.43 | learning rate: 3.812E-05 | global batch size: 256 | lm loss: 2.228664E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.320 | TFLOPs: 30.97 | 7: iteration 91770/ 115203 | consumed samples: 23493120 | consumed tokens: 48113909760 | elapsed time per iteration (s): 0.42 | learning rate: 3.811E-05 | global batch size: 256 | lm loss: 2.230962E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.872 | TFLOPs: 31.68 | 7: iteration 91780/ 115203 | consumed samples: 23495680 | consumed tokens: 48119152640 | elapsed time per iteration (s): 0.43 | learning rate: 3.809E-05 | global batch size: 256 | lm loss: 2.210026E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.631 | TFLOPs: 31.36 | 7: iteration 91790/ 115203 | consumed samples: 23498240 | consumed tokens: 48124395520 | elapsed time per iteration (s): 0.44 | learning rate: 3.808E-05 | global batch size: 256 | lm loss: 2.219607E+00 | grad norm: 0.241 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.877 | TFLOPs: 30.43 | 7: iteration 91800/ 115203 | consumed samples: 23500800 | consumed tokens: 48129638400 | elapsed time per iteration (s): 0.43 | learning rate: 3.806E-05 | global batch size: 256 | lm loss: 2.225974E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.300 | TFLOPs: 31.44 | 7: iteration 91810/ 115203 | consumed samples: 23503360 | consumed tokens: 48134881280 | elapsed time per iteration (s): 0.43 | learning rate: 3.805E-05 | global batch size: 256 | lm loss: 2.213265E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.305 | TFLOPs: 31.55 | 7: iteration 91820/ 115203 | consumed samples: 23505920 | consumed tokens: 48140124160 | elapsed time per iteration (s): 0.44 | learning rate: 3.803E-05 | global batch size: 256 | lm loss: 2.222359E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.959 | TFLOPs: 30.22 | 7: iteration 91830/ 115203 | consumed samples: 23508480 | consumed tokens: 48145367040 | elapsed time per iteration (s): 0.43 | learning rate: 3.802E-05 | global batch size: 256 | lm loss: 2.197445E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.818 | TFLOPs: 31.26 | 7: iteration 91840/ 115203 | consumed samples: 23511040 | consumed tokens: 48150609920 | elapsed time per iteration (s): 0.44 | learning rate: 3.800E-05 | global batch size: 256 | lm loss: 2.220263E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.837 | TFLOPs: 30.74 | 7: iteration 91850/ 115203 | consumed samples: 23513600 | consumed tokens: 48155852800 | elapsed time per iteration (s): 0.43 | learning rate: 
3.799E-05 | global batch size: 256 | lm loss: 2.218589E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.970 | TFLOPs: 30.95 | 7: iteration 91860/ 115203 | consumed samples: 23516160 | consumed tokens: 48161095680 | elapsed time per iteration (s): 0.44 | learning rate: 3.797E-05 | global batch size: 256 | lm loss: 2.228881E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.358 | TFLOPs: 30.45 | 7: iteration 91870/ 115203 | consumed samples: 23518720 | consumed tokens: 48166338560 | elapsed time per iteration (s): 0.45 | learning rate: 3.796E-05 | global batch size: 256 | lm loss: 2.218940E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.133 | TFLOPs: 30.07 | 7: iteration 91880/ 115203 | consumed samples: 23521280 | consumed tokens: 48171581440 | elapsed time per iteration (s): 0.43 | learning rate: 3.794E-05 | global batch size: 256 | lm loss: 2.246697E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.406 | TFLOPs: 31.03 | 7: iteration 91890/ 115203 | consumed samples: 23523840 | consumed tokens: 48176824320 | elapsed time per iteration (s): 0.44 | learning rate: 3.793E-05 | global batch size: 256 | lm loss: 2.247622E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.602 | TFLOPs: 30.62 | 7: iteration 91900/ 115203 | consumed samples: 23526400 | consumed tokens: 48182067200 | elapsed time per iteration (s): 0.43 | learning rate: 3.791E-05 | global batch size: 256 | lm loss: 2.244738E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.837 | TFLOPs: 31.05 | 7: iteration 91910/ 115203 | consumed 
samples: 23528960 | consumed tokens: 48187310080 | elapsed time per iteration (s): 0.44 | learning rate: 3.790E-05 | global batch size: 256 | lm loss: 2.234784E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.914 | TFLOPs: 30.69 | 7: iteration 91920/ 115203 | consumed samples: 23531520 | consumed tokens: 48192552960 | elapsed time per iteration (s): 0.42 | learning rate: 3.788E-05 | global batch size: 256 | lm loss: 2.245436E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.226 | TFLOPs: 31.76 | 7: iteration 91930/ 115203 | consumed samples: 23534080 | consumed tokens: 48197795840 | elapsed time per iteration (s): 0.43 | learning rate: 3.787E-05 | global batch size: 256 | lm loss: 2.243484E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.663 | TFLOPs: 31.36 | 7: iteration 91940/ 115203 | consumed samples: 23536640 | consumed tokens: 48203038720 | elapsed time per iteration (s): 0.43 | learning rate: 3.785E-05 | global batch size: 256 | lm loss: 2.200109E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.855 | TFLOPs: 31.16 | 7: iteration 91950/ 115203 | consumed samples: 23539200 | consumed tokens: 48208281600 | elapsed time per iteration (s): 0.43 | learning rate: 3.784E-05 | global batch size: 256 | lm loss: 2.249726E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.296 | TFLOPs: 31.29 | 7: iteration 91960/ 115203 | consumed samples: 23541760 | consumed tokens: 48213524480 | elapsed time per iteration (s): 0.43 | learning rate: 3.783E-05 | global batch size: 256 | lm loss: 2.194728E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 595.887 | TFLOPs: 31.27 | 7: iteration 91970/ 115203 | consumed samples: 23544320 | consumed tokens: 48218767360 | elapsed time per iteration (s): 0.44 | learning rate: 3.781E-05 | global batch size: 256 | lm loss: 2.212700E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.804 | TFLOPs: 30.58 | 7: iteration 91980/ 115203 | consumed samples: 23546880 | consumed tokens: 48224010240 | elapsed time per iteration (s): 0.44 | learning rate: 3.780E-05 | global batch size: 256 | lm loss: 2.237329E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.187 | TFLOPs: 30.34 | 7: iteration 91990/ 115203 | consumed samples: 23549440 | consumed tokens: 48229253120 | elapsed time per iteration (s): 0.48 | learning rate: 3.778E-05 | global batch size: 256 | lm loss: 2.215145E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 533.842 | TFLOPs: 28.01 | 0: [2022-11-29 00:02:44,167] [INFO] [logging.py:68:log_dist] [Rank 0] step=92000, skipped=0, lr=[3.776612403864962e-05, 3.776612403864962e-05, 3.776612403864962e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 92000/ 115203 | consumed samples: 23552000 | consumed tokens: 48234496000 | elapsed time per iteration (s): 0.61 | learning rate: 3.777E-05 | global batch size: 256 | lm loss: 2.246514E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 422.724 | TFLOPs: 22.18 | 0: steps: 92000 loss: 2.2587 iter time (s): 0.436 samples/sec: 587.645 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 92000 | lm loss value: 2.066104E+00 | lm loss PPL: 7.894009E+00 | 7: 
------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 92000 to checkpoints_221m 0: [2022-11-29 00:02:44,360] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step92000 is begin to save! 0: [2022-11-29 00:02:44,365] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_01-model_00-model_states.pt... 0: [2022-11-29 00:02:44,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_01-model_00-model_states.pt. 0: [2022-11-29 00:02:44,477] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_03-model_00-model_states.pt... 0: [2022-11-29 00:02:44,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_03-model_00-model_states.pt. 0: [2022-11-29 00:02:44,501] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_04-model_00-model_states.pt... 0: [2022-11-29 00:02:44,526] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_04-model_00-model_states.pt. 0: [2022-11-29 00:02:44,526] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_05-model_00-model_states.pt... 0: [2022-11-29 00:02:44,549] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_05-model_00-model_states.pt. 0: [2022-11-29 00:02:44,550] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_06-model_00-model_states.pt... 0: [2022-11-29 00:02:44,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_06-model_00-model_states.pt. 
0: [2022-11-29 00:02:44,574] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_07-model_00-model_states.pt... 0: [2022-11-29 00:02:44,597] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_07-model_00-model_states.pt. 0: [2022-11-29 00:02:44,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_08-model_00-model_states.pt... 0: [2022-11-29 00:02:44,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_08-model_00-model_states.pt. 0: [2022-11-29 00:02:44,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_09-model_00-model_states.pt... 0: [2022-11-29 00:02:44,644] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_09-model_00-model_states.pt. 0: [2022-11-29 00:02:44,645] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_10-model_00-model_states.pt... 0: [2022-11-29 00:02:44,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_10-model_00-model_states.pt. 0: [2022-11-29 00:02:44,669] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_11-model_00-model_states.pt... 0: [2022-11-29 00:02:44,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_11-model_00-model_states.pt. 0: [2022-11-29 00:02:44,691] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_12-model_00-model_states.pt... 0: [2022-11-29 00:02:44,714] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_12-model_00-model_states.pt. 
0: [2022-11-29 00:02:44,715] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_13-model_00-model_states.pt... 0: [2022-11-29 00:02:44,737] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_13-model_00-model_states.pt. 0: [2022-11-29 00:02:44,737] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_14-model_00-model_states.pt... 0: [2022-11-29 00:02:44,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_14-model_00-model_states.pt. 0: [2022-11-29 00:02:44,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_15-model_00-model_states.pt... 0: [2022-11-29 00:02:44,783] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_15-model_00-model_states.pt. 0: [2022-11-29 00:02:44,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_16-model_00-model_states.pt... 0: [2022-11-29 00:02:44,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_16-model_00-model_states.pt. 0: [2022-11-29 00:02:44,808] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_17-model_00-model_states.pt... 0: [2022-11-29 00:02:44,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_17-model_00-model_states.pt. 0: [2022-11-29 00:02:44,831] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_18-model_00-model_states.pt... 0: [2022-11-29 00:02:44,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_18-model_00-model_states.pt. 
0: [2022-11-29 00:02:44,854] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_19-model_00-model_states.pt... 0: [2022-11-29 00:02:44,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_19-model_00-model_states.pt. 0: [2022-11-29 00:02:44,878] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_20-model_00-model_states.pt... 0: [2022-11-29 00:02:44,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_20-model_00-model_states.pt. 0: [2022-11-29 00:02:44,902] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/layer_22-model_00-model_states.pt... 0: [2022-11-29 00:02:44,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/layer_22-model_00-model_states.pt. 0: [2022-11-29 00:02:44,907] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step92000/mp_rank_00_model_states.pt 0: [2022-11-29 00:02:44,907] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/mp_rank_00_model_states.pt... 0: [2022-11-29 00:02:44,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/mp_rank_00_model_states.pt. 0: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
4: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
6: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
3: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:02:45,030] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step92000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:02:45,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:02:45,082] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 00:02:45,082] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 6: [2022-11-29 00:02:45,083] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:02:45,083] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 00:02:45,083] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 
6: [2022-11-29 00:02:45,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:02:45,087] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 00:02:45,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:02:45,087] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 6: [2022-11-29 00:02:45,087] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 00:02:45,087] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 4: [2022-11-29 00:02:45,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:02:45,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 00:02:45,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:02:45,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 4: [2022-11-29 00:02:45,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 00:02:45,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 
7: [2022-11-29 00:02:45,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:02:45,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 00:02:45,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 4: [2022-11-29 00:02:45,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:02:45,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 00:02:45,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 4: [2022-11-29 00:02:45,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:02:45,092] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:02:45,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 00:02:45,092] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 00:02:45,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 4: [2022-11-29 00:02:45,092] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 
3: [2022-11-29 00:02:45,085] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:02:45,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 00:02:45,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 3: [2022-11-29 00:02:45,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:02:45,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 00:02:45,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 3: [2022-11-29 00:02:45,086] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:02:45,086] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 00:02:45,086] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 3: [2022-11-29 00:02:45,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:02:45,088] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 00:02:45,088] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 
3: [2022-11-29 00:02:45,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:02:45,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 00:02:45,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 3: [2022-11-29 00:02:45,089] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:02:45,089] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 00:02:45,089] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 3: [2022-11-29 00:02:45,090] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:02:45,090] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 00:02:45,090] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 6: [2022-11-29 00:02:45,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:02:45,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:02:45,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-29 00:02:45,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 00:02:45,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 00:02:45,093] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 00:02:45,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 6: [2022-11-29 00:02:45,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 6: [2022-11-29 00:02:45,093] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 2: [2022-11-29 00:02:45,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:02:45,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:02:45,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 00:02:45,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 00:02:45,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 2: [2022-11-29 00:02:45,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 
3: [2022-11-29 00:02:45,094] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:02:45,094] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 00:02:45,094] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 2: [2022-11-29 00:02:45,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:02:45,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:02:45,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 00:02:45,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 00:02:45,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:02:45,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 2: [2022-11-29 00:02:45,096] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 2: [2022-11-29 00:02:45,096] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 00:02:45,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 
7: [2022-11-29 00:02:45,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:02:45,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:02:45,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:02:45,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 00:02:45,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 00:02:45,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 00:02:45,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 7: [2022-11-29 00:02:45,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 7: [2022-11-29 00:02:45,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 2: [2022-11-29 00:02:45,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:02:45,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 00:02:45,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 
2: [2022-11-29 00:02:45,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:02:45,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 00:02:45,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 2: [2022-11-29 00:02:45,098] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:02:45,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 00:02:45,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 4: [2022-11-29 00:02:45,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:02:45,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 00:02:45,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 0: [2022-11-29 00:02:45,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:02:45,100] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 00:02:45,100] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 
0: [2022-11-29 00:02:45,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:02:45,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:02:45,100] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 00:02:45,100] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 0: [2022-11-29 00:02:45,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:02:45,100] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 00:02:45,100] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 7: [2022-11-29 00:02:45,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:02:45,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:02:45,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-29 00:02:45,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 00:02:45,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 00:02:45,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 00:02:45,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 7: [2022-11-29 00:02:45,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 7: [2022-11-29 00:02:45,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 7: [2022-11-29 00:02:45,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:02:45,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 00:02:45,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 4: [2022-11-29 00:02:45,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:02:45,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 00:02:45,102] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 
4: [2022-11-29 00:02:45,102] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:02:45,102] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 00:02:45,103] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 6: [2022-11-29 00:02:45,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:02:45,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 00:02:45,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 0: [2022-11-29 00:02:45,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:02:45,128] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 00:02:45,128] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 0: [2022-11-29 00:02:45,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:02:45,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:02:45,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
0: [2022-11-29 00:02:45,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 00:02:45,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 00:02:45,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 00:02:45,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 0: [2022-11-29 00:02:45,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 0: [2022-11-29 00:02:45,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 0: [2022-11-29 00:02:45,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 00:02:45,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:02:45,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 00:02:45,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 00:02:45,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 00:02:45,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:02:45,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 
1: [2022-11-29 00:02:45,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 00:02:45,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 1: [2022-11-29 00:02:45,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 1: [2022-11-29 00:02:45,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 5: [2022-11-29 00:02:45,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:02:45,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:02:45,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 00:02:45,193] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2022-11-29 00:02:45,193] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 5: [2022-11-29 00:02:45,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 00:02:45,193] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 00:02:45,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 5: [2022-11-29 00:02:45,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 5: [2022-11-29 00:02:45,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:02:45,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 00:02:45,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 5: [2022-11-29 00:02:45,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:02:45,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 00:02:45,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 5: [2022-11-29 00:02:45,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-29 00:02:45,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 00:02:45,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 5: [2022-11-29 00:02:45,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:02:45,198] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:02:45,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 00:02:45,198] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step92000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 00:02:45,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 5: [2022-11-29 00:02:45,198] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step92000 is ready now! 
0: successfully saved checkpoint at iteration 92000 to checkpoints_221m
7: time (ms) | save-checkpoint: 852.22
7: iteration 92010/ 115203 | consumed samples: 23554560 | consumed tokens: 48239738880 | elapsed time per iteration (s): 0.55 | learning rate: 3.775E-05 | global batch size: 256 | lm loss: 2.217345E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 462.959 | TFLOPs: 24.29 |
7: iteration 92020/ 115203 | consumed samples: 23557120 | consumed tokens: 48244981760 | elapsed time per iteration (s): 0.44 | learning rate: 3.774E-05 | global batch size: 256 | lm loss: 2.206949E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.016 | TFLOPs: 30.59 |
7: iteration 92030/ 115203 | consumed samples: 23559680 | consumed tokens: 48250224640 | elapsed time per iteration (s): 0.43 | learning rate: 3.772E-05 | global batch size: 256 | lm loss: 2.253425E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.030 | TFLOPs: 31.22 |
7: iteration 92040/ 115203 | consumed samples: 23562240 | consumed tokens: 48255467520 | elapsed time per iteration (s): 0.43 | learning rate: 3.771E-05 | global batch size: 256 | lm loss: 2.198160E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.827 | TFLOPs: 31.37 |
7: iteration 92050/ 115203 | consumed samples: 23564800 | consumed tokens: 48260710400 | elapsed time per iteration (s): 0.44 | learning rate: 3.769E-05 | global batch size: 256 | lm loss: 2.214822E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.645 | TFLOPs: 30.47 |
7: iteration 92060/ 115203 | consumed samples: 23567360 | consumed tokens: 48265953280 | elapsed time per iteration (s): 0.43 | learning rate: 3.768E-05 | global batch size: 256 | lm loss: 2.222009E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.949 | TFLOPs: 31.43 |
7: iteration 92070/ 115203 | consumed samples: 23569920 | consumed tokens: 48271196160 | elapsed time per iteration (s): 0.43 | learning rate: 3.766E-05 | global batch size: 256 | lm loss: 2.242818E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.558 | TFLOPs: 31.51 |
7: iteration 92080/ 115203 | consumed samples: 23572480 | consumed tokens: 48276439040 | elapsed time per iteration (s): 0.45 | learning rate: 3.765E-05 | global batch size: 256 | lm loss: 2.202898E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.244 | TFLOPs: 30.08 |
7: iteration 92090/ 115203 | consumed samples: 23575040 | consumed tokens: 48281681920 | elapsed time per iteration (s): 0.44 | learning rate: 3.763E-05 | global batch size: 256 | lm loss: 2.233882E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.193 | TFLOPs: 30.28 |
7: iteration 92100/ 115203 | consumed samples: 23577600 | consumed tokens: 48286924800 | elapsed time per iteration (s): 0.43 | learning rate: 3.762E-05 | global batch size: 256 | lm loss: 2.230866E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.476 | TFLOPs: 30.93 |
7: iteration 92110/ 115203 | consumed samples: 23580160 | consumed tokens: 48292167680 | elapsed time per iteration (s): 0.44 | learning rate: 3.760E-05 | global batch size: 256 | lm loss: 2.256500E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.184 | TFLOPs: 30.86 |
7: iteration 92120/ 115203 | consumed samples: 23582720 | consumed tokens: 48297410560 | elapsed time per iteration (s): 0.43 | learning rate: 3.759E-05 | global batch size: 256 | lm loss: 2.263300E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.999 | TFLOPs: 31.43 |
7: iteration 92130/ 115203 | consumed samples: 23585280 | consumed tokens: 48302653440 | elapsed time per iteration (s): 0.43 | learning rate: 3.757E-05 | global batch size: 256 | lm loss: 2.189815E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.702 | TFLOPs: 31.10 |
7: iteration 92140/ 115203 | consumed samples: 23587840 | consumed tokens: 48307896320 | elapsed time per iteration (s): 0.43 | learning rate: 3.756E-05 | global batch size: 256 | lm loss: 2.236824E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.481 | TFLOPs: 31.03 |
7: iteration 92150/ 115203 | consumed samples: 23590400 | consumed tokens: 48313139200 | elapsed time per iteration (s): 0.43 | learning rate: 3.754E-05 | global batch size: 256 | lm loss: 2.217912E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.767 | TFLOPs: 31.57 |
7: iteration 92160/ 115203 | consumed samples: 23592960 | consumed tokens: 48318382080 | elapsed time per iteration (s): 0.43 | learning rate: 3.753E-05 | global batch size: 256 | lm loss: 2.252181E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.766 | TFLOPs: 30.94 |
7: iteration 92170/ 115203 | consumed samples: 23595520 | consumed tokens: 48323624960 | elapsed time per iteration (s): 0.46 | learning rate: 3.752E-05 | global batch size: 256 | lm loss: 2.248027E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.022 | TFLOPs: 29.07 |
7: iteration 92180/ 115203 | consumed samples: 23598080 | consumed tokens: 48328867840 | elapsed time per iteration (s): 0.44 | learning rate: 3.750E-05 | global batch size: 256 | lm loss: 2.209318E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.427 | TFLOPs: 30.45 |
7: iteration 92190/ 115203 | consumed samples: 23600640 | consumed tokens: 48334110720 | elapsed time per iteration (s): 0.43 | learning rate: 3.749E-05 | global batch size: 256 | lm loss: 2.202683E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.776 | TFLOPs: 31.21 |
7: iteration 92200/ 115203 | consumed samples: 23603200 | consumed tokens: 48339353600 | elapsed time per iteration (s): 0.44 | learning rate: 3.747E-05 | global batch size: 256 | lm loss: 2.247538E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.439 | TFLOPs: 30.77 |
7: iteration 92210/ 115203 | consumed samples: 23605760 | consumed tokens: 48344596480 | elapsed time per iteration (s): 0.43 | learning rate: 3.746E-05 | global batch size: 256 | lm loss: 2.231956E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.300 | TFLOPs: 31.02 |
7: iteration 92220/ 115203 | consumed samples: 23608320 | consumed tokens: 48349839360 | elapsed time per iteration (s): 0.44 | learning rate: 3.744E-05 | global batch size: 256 | lm loss: 2.205850E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.945 | TFLOPs: 30.80 |
7: iteration 92230/ 115203 | consumed samples: 23610880 | consumed tokens: 48355082240 | elapsed time per iteration (s): 0.43 | learning rate: 3.743E-05 | global batch size: 256 | lm loss: 2.253355E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.086 | TFLOPs: 31.12 |
7: iteration 92240/ 115203 | consumed samples: 23613440 | consumed tokens: 48360325120 | elapsed time per iteration (s): 0.43 | learning rate: 3.741E-05 | global batch size: 256 | lm loss: 2.230462E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.742 | TFLOPs: 30.94 |
7: iteration 92250/ 115203 | consumed samples: 23616000 | consumed tokens: 48365568000 | elapsed time per iteration (s): 0.43 | learning rate: 3.740E-05 | global batch size: 256 | lm loss: 2.217796E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.890 | TFLOPs: 31.58 |
7: iteration 92260/ 115203 | consumed samples: 23618560 | consumed tokens: 48370810880 | elapsed time per iteration (s): 0.43 | learning rate: 3.738E-05 | global batch size: 256 | lm loss: 2.283284E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.817 | TFLOPs: 31.42 |
7: iteration 92270/ 115203 | consumed samples: 23621120 | consumed tokens: 48376053760 | elapsed time per iteration (s): 0.42 | learning rate: 3.737E-05 | global batch size: 256 | lm loss: 2.256833E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.573 | TFLOPs: 31.77 |
7: iteration 92280/ 115203 | consumed samples: 23623680 | consumed tokens: 48381296640 | elapsed time per iteration (s): 0.43 | learning rate: 3.735E-05 | global batch size: 256 | lm loss: 2.243335E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.534 | TFLOPs: 31.04 |
7: iteration 92290/ 115203 | consumed samples: 23626240 | consumed tokens: 48386539520 | elapsed time per iteration (s): 0.43 | learning rate: 3.734E-05 | global batch size: 256 | lm loss: 2.245447E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.229 | TFLOPs: 30.97 |
7: iteration 92300/ 115203 | consumed samples: 23628800 | consumed tokens: 48391782400 | elapsed time per iteration (s): 0.44 | learning rate: 3.732E-05 | global batch size: 256 | lm loss: 2.236898E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.521 | TFLOPs: 30.56 |
7: iteration 92310/ 115203 | consumed samples: 23631360 | consumed tokens: 48397025280 | elapsed time per iteration (s): 0.43 | learning rate: 3.731E-05 | global batch size: 256 | lm loss: 2.257396E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.113 | TFLOPs: 31.07 |
7: iteration 92320/ 115203 | consumed samples: 23633920 | consumed tokens: 48402268160 | elapsed time per iteration (s): 0.44 | learning rate: 3.730E-05 | global batch size: 256 | lm loss: 2.209002E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.440 | TFLOPs: 30.87 |
7: iteration 92330/ 115203 | consumed samples: 23636480 | consumed tokens: 48407511040 | elapsed time per iteration (s): 0.44 | learning rate: 3.728E-05 | global batch size: 256 | lm loss: 2.202577E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.378 | TFLOPs: 30.56 |
7: iteration 92340/ 115203 | consumed samples: 23639040 | consumed tokens: 48412753920 | elapsed time per iteration (s): 0.44 | learning rate: 3.727E-05 | global batch size: 256 | lm loss: 2.224044E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.098 | TFLOPs: 30.65 |
7: iteration 92350/ 115203 | consumed samples: 23641600 | consumed tokens: 48417996800 | elapsed time per iteration (s): 0.43 | learning rate: 3.725E-05 | global batch size: 256 | lm loss: 2.227943E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.039 | TFLOPs: 31.01 |
7: iteration 92360/ 115203 | consumed samples: 23644160 | consumed tokens: 48423239680 | elapsed time per iteration (s): 0.45 | learning rate: 3.724E-05 | global batch size: 256 | lm loss: 2.213006E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.998 | TFLOPs: 30.17 |
7: iteration 92370/ 115203 | consumed samples: 23646720 | consumed tokens: 48428482560 | elapsed time per iteration (s): 0.43 | learning rate: 3.722E-05 | global batch size: 256 | lm loss: 2.230830E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.712 | TFLOPs: 30.89 |
7: iteration 92380/ 115203 | consumed samples: 23649280 | consumed tokens: 48433725440 | elapsed time per iteration (s): 0.43 | learning rate: 3.721E-05 | global batch size: 256 | lm loss: 2.226517E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.663 | TFLOPs: 31.10 |
7: iteration 92390/ 115203 | consumed samples: 23651840 | consumed tokens: 48438968320 | elapsed time per iteration (s): 0.43 | learning rate: 3.719E-05 | global batch size: 256 | lm loss: 2.223777E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.445 | TFLOPs: 31.24 |
7: iteration 92400/ 115203 | consumed samples: 23654400 | consumed tokens: 48444211200 | elapsed time per iteration (s): 0.43 | learning rate: 3.718E-05 | global batch size: 256 | lm loss: 2.215551E+00 | grad norm:
0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.314 | TFLOPs: 31.18 | 7: iteration 92410/ 115203 | consumed samples: 23656960 | consumed tokens: 48449454080 | elapsed time per iteration (s): 0.44 | learning rate: 3.716E-05 | global batch size: 256 | lm loss: 2.220675E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.938 | TFLOPs: 30.64 | 7: iteration 92420/ 115203 | consumed samples: 23659520 | consumed tokens: 48454696960 | elapsed time per iteration (s): 0.43 | learning rate: 3.715E-05 | global batch size: 256 | lm loss: 2.206839E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.346 | TFLOPs: 31.08 | 7: iteration 92430/ 115203 | consumed samples: 23662080 | consumed tokens: 48459939840 | elapsed time per iteration (s): 0.43 | learning rate: 3.714E-05 | global batch size: 256 | lm loss: 2.262885E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.878 | TFLOPs: 31.53 | 7: iteration 92440/ 115203 | consumed samples: 23664640 | consumed tokens: 48465182720 | elapsed time per iteration (s): 0.44 | learning rate: 3.712E-05 | global batch size: 256 | lm loss: 2.229986E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.465 | TFLOPs: 30.77 | 7: iteration 92450/ 115203 | consumed samples: 23667200 | consumed tokens: 48470425600 | elapsed time per iteration (s): 0.44 | learning rate: 3.711E-05 | global batch size: 256 | lm loss: 2.228360E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.275 | TFLOPs: 30.50 | 7: iteration 92460/ 115203 | consumed samples: 23669760 | consumed tokens: 48475668480 | elapsed time per 
iteration (s): 0.43 | learning rate: 3.709E-05 | global batch size: 256 | lm loss: 2.231361E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.875 | TFLOPs: 31.32 | 7: iteration 92470/ 115203 | consumed samples: 23672320 | consumed tokens: 48480911360 | elapsed time per iteration (s): 0.43 | learning rate: 3.708E-05 | global batch size: 256 | lm loss: 2.258109E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.525 | TFLOPs: 31.04 | 7: iteration 92480/ 115203 | consumed samples: 23674880 | consumed tokens: 48486154240 | elapsed time per iteration (s): 0.43 | learning rate: 3.706E-05 | global batch size: 256 | lm loss: 2.244666E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.931 | TFLOPs: 31.48 | 7: iteration 92490/ 115203 | consumed samples: 23677440 | consumed tokens: 48491397120 | elapsed time per iteration (s): 0.45 | learning rate: 3.705E-05 | global batch size: 256 | lm loss: 2.212604E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.850 | TFLOPs: 30.11 | 7: iteration 92500/ 115203 | consumed samples: 23680000 | consumed tokens: 48496640000 | elapsed time per iteration (s): 0.43 | learning rate: 3.703E-05 | global batch size: 256 | lm loss: 2.243422E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.645 | TFLOPs: 30.99 | 7: iteration 92510/ 115203 | consumed samples: 23682560 | consumed tokens: 48501882880 | elapsed time per iteration (s): 0.44 | learning rate: 3.702E-05 | global batch size: 256 | lm loss: 2.240677E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.917 | TFLOPs: 30.74 | 7: 
iteration 92520/ 115203 | consumed samples: 23685120 | consumed tokens: 48507125760 | elapsed time per iteration (s): 0.43 | learning rate: 3.700E-05 | global batch size: 256 | lm loss: 2.220214E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.823 | TFLOPs: 31.26 | 7: iteration 92530/ 115203 | consumed samples: 23687680 | consumed tokens: 48512368640 | elapsed time per iteration (s): 0.43 | learning rate: 3.699E-05 | global batch size: 256 | lm loss: 2.226917E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.500 | TFLOPs: 30.98 | 7: iteration 92540/ 115203 | consumed samples: 23690240 | consumed tokens: 48517611520 | elapsed time per iteration (s): 0.43 | learning rate: 3.698E-05 | global batch size: 256 | lm loss: 2.233059E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.806 | TFLOPs: 31.42 | 7: iteration 92550/ 115203 | consumed samples: 23692800 | consumed tokens: 48522854400 | elapsed time per iteration (s): 0.43 | learning rate: 3.696E-05 | global batch size: 256 | lm loss: 2.229338E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.911 | TFLOPs: 31.21 | 7: iteration 92560/ 115203 | consumed samples: 23695360 | consumed tokens: 48528097280 | elapsed time per iteration (s): 0.44 | learning rate: 3.695E-05 | global batch size: 256 | lm loss: 2.252481E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.134 | TFLOPs: 30.39 | 7: iteration 92570/ 115203 | consumed samples: 23697920 | consumed tokens: 48533340160 | elapsed time per iteration (s): 0.45 | learning rate: 3.693E-05 | global batch size: 256 | lm loss: 2.212797E+00 | grad norm: 0.268 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.214 | TFLOPs: 30.13 | 7: iteration 92580/ 115203 | consumed samples: 23700480 | consumed tokens: 48538583040 | elapsed time per iteration (s): 0.43 | learning rate: 3.692E-05 | global batch size: 256 | lm loss: 2.185035E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.335 | TFLOPs: 31.08 | 7: iteration 92590/ 115203 | consumed samples: 23703040 | consumed tokens: 48543825920 | elapsed time per iteration (s): 0.43 | learning rate: 3.690E-05 | global batch size: 256 | lm loss: 2.235798E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.537 | TFLOPs: 30.98 | 7: iteration 92600/ 115203 | consumed samples: 23705600 | consumed tokens: 48549068800 | elapsed time per iteration (s): 0.44 | learning rate: 3.689E-05 | global batch size: 256 | lm loss: 2.243163E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.083 | TFLOPs: 30.38 | 7: iteration 92610/ 115203 | consumed samples: 23708160 | consumed tokens: 48554311680 | elapsed time per iteration (s): 0.43 | learning rate: 3.687E-05 | global batch size: 256 | lm loss: 2.241058E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.416 | TFLOPs: 31.03 | 7: iteration 92620/ 115203 | consumed samples: 23710720 | consumed tokens: 48559554560 | elapsed time per iteration (s): 0.44 | learning rate: 3.686E-05 | global batch size: 256 | lm loss: 2.233016E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.698 | TFLOPs: 30.78 | 7: iteration 92630/ 115203 | consumed samples: 23713280 | consumed tokens: 48564797440 | elapsed time per iteration (s): 0.44 | learning rate: 
3.685E-05 | global batch size: 256 | lm loss: 2.240622E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.061 | TFLOPs: 30.75 | 7: iteration 92640/ 115203 | consumed samples: 23715840 | consumed tokens: 48570040320 | elapsed time per iteration (s): 0.43 | learning rate: 3.683E-05 | global batch size: 256 | lm loss: 2.179309E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.763 | TFLOPs: 31.05 | 7: iteration 92650/ 115203 | consumed samples: 23718400 | consumed tokens: 48575283200 | elapsed time per iteration (s): 0.44 | learning rate: 3.682E-05 | global batch size: 256 | lm loss: 2.217014E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.586 | TFLOPs: 30.67 | 7: iteration 92660/ 115203 | consumed samples: 23720960 | consumed tokens: 48580526080 | elapsed time per iteration (s): 0.43 | learning rate: 3.680E-05 | global batch size: 256 | lm loss: 2.247453E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.921 | TFLOPs: 31.16 | 7: iteration 92670/ 115203 | consumed samples: 23723520 | consumed tokens: 48585768960 | elapsed time per iteration (s): 0.43 | learning rate: 3.679E-05 | global batch size: 256 | lm loss: 2.248989E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.318 | TFLOPs: 31.13 | 7: iteration 92680/ 115203 | consumed samples: 23726080 | consumed tokens: 48591011840 | elapsed time per iteration (s): 0.45 | learning rate: 3.677E-05 | global batch size: 256 | lm loss: 2.213172E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.752 | TFLOPs: 29.58 | 7: iteration 92690/ 115203 | consumed 
samples: 23728640 | consumed tokens: 48596254720 | elapsed time per iteration (s): 0.42 | learning rate: 3.676E-05 | global batch size: 256 | lm loss: 2.221491E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.534 | TFLOPs: 31.67 | 7: iteration 92700/ 115203 | consumed samples: 23731200 | consumed tokens: 48601497600 | elapsed time per iteration (s): 0.46 | learning rate: 3.674E-05 | global batch size: 256 | lm loss: 2.241553E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 551.389 | TFLOPs: 28.93 | 7: iteration 92710/ 115203 | consumed samples: 23733760 | consumed tokens: 48606740480 | elapsed time per iteration (s): 0.43 | learning rate: 3.673E-05 | global batch size: 256 | lm loss: 2.244995E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.176 | TFLOPs: 31.60 | 7: iteration 92720/ 115203 | consumed samples: 23736320 | consumed tokens: 48611983360 | elapsed time per iteration (s): 0.44 | learning rate: 3.672E-05 | global batch size: 256 | lm loss: 2.198888E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.943 | TFLOPs: 30.48 | 7: iteration 92730/ 115203 | consumed samples: 23738880 | consumed tokens: 48617226240 | elapsed time per iteration (s): 0.43 | learning rate: 3.670E-05 | global batch size: 256 | lm loss: 2.192233E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.009 | TFLOPs: 31.32 | 7: iteration 92740/ 115203 | consumed samples: 23741440 | consumed tokens: 48622469120 | elapsed time per iteration (s): 0.42 | learning rate: 3.669E-05 | global batch size: 256 | lm loss: 2.216554E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 609.272 | TFLOPs: 31.97 | 7: iteration 92750/ 115203 | consumed samples: 23744000 | consumed tokens: 48627712000 | elapsed time per iteration (s): 0.44 | learning rate: 3.667E-05 | global batch size: 256 | lm loss: 2.249226E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.772 | TFLOPs: 30.79 | 7: iteration 92760/ 115203 | consumed samples: 23746560 | consumed tokens: 48632954880 | elapsed time per iteration (s): 0.43 | learning rate: 3.666E-05 | global batch size: 256 | lm loss: 2.230816E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.402 | TFLOPs: 31.55 | 7: iteration 92770/ 115203 | consumed samples: 23749120 | consumed tokens: 48638197760 | elapsed time per iteration (s): 0.43 | learning rate: 3.664E-05 | global batch size: 256 | lm loss: 2.221674E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.147 | TFLOPs: 31.38 | 7: iteration 92780/ 115203 | consumed samples: 23751680 | consumed tokens: 48643440640 | elapsed time per iteration (s): 0.43 | learning rate: 3.663E-05 | global batch size: 256 | lm loss: 2.232775E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.014 | TFLOPs: 31.38 | 7: iteration 92790/ 115203 | consumed samples: 23754240 | consumed tokens: 48648683520 | elapsed time per iteration (s): 0.43 | learning rate: 3.662E-05 | global batch size: 256 | lm loss: 2.229036E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.224 | TFLOPs: 31.44 | 7: iteration 92800/ 115203 | consumed samples: 23756800 | consumed tokens: 48653926400 | elapsed time per iteration (s): 0.44 | learning rate: 3.660E-05 | global batch size: 256 | lm 
loss: 2.227176E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.006 | TFLOPs: 30.85 | 7: iteration 92810/ 115203 | consumed samples: 23759360 | consumed tokens: 48659169280 | elapsed time per iteration (s): 0.44 | learning rate: 3.659E-05 | global batch size: 256 | lm loss: 2.241598E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.269 | TFLOPs: 30.45 | 7: iteration 92820/ 115203 | consumed samples: 23761920 | consumed tokens: 48664412160 | elapsed time per iteration (s): 0.44 | learning rate: 3.657E-05 | global batch size: 256 | lm loss: 2.236356E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.438 | TFLOPs: 30.40 | 7: iteration 92830/ 115203 | consumed samples: 23764480 | consumed tokens: 48669655040 | elapsed time per iteration (s): 0.43 | learning rate: 3.656E-05 | global batch size: 256 | lm loss: 2.217094E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.927 | TFLOPs: 31.27 | 7: iteration 92840/ 115203 | consumed samples: 23767040 | consumed tokens: 48674897920 | elapsed time per iteration (s): 0.43 | learning rate: 3.654E-05 | global batch size: 256 | lm loss: 2.183586E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.455 | TFLOPs: 31.19 | 7: iteration 92850/ 115203 | consumed samples: 23769600 | consumed tokens: 48680140800 | elapsed time per iteration (s): 0.43 | learning rate: 3.653E-05 | global batch size: 256 | lm loss: 2.254120E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.460 | TFLOPs: 30.98 | 7: iteration 92860/ 115203 | consumed samples: 23772160 | consumed tokens: 
48685383680 | elapsed time per iteration (s): 0.44 | learning rate: 3.651E-05 | global batch size: 256 | lm loss: 2.245031E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.907 | TFLOPs: 30.37 | 7: iteration 92870/ 115203 | consumed samples: 23774720 | consumed tokens: 48690626560 | elapsed time per iteration (s): 0.45 | learning rate: 3.650E-05 | global batch size: 256 | lm loss: 2.215672E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.935 | TFLOPs: 29.96 | 7: iteration 92880/ 115203 | consumed samples: 23777280 | consumed tokens: 48695869440 | elapsed time per iteration (s): 0.44 | learning rate: 3.649E-05 | global batch size: 256 | lm loss: 2.232198E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.194 | TFLOPs: 30.81 | 7: iteration 92890/ 115203 | consumed samples: 23779840 | consumed tokens: 48701112320 | elapsed time per iteration (s): 0.43 | learning rate: 3.647E-05 | global batch size: 256 | lm loss: 2.204056E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.560 | TFLOPs: 31.09 | 7: iteration 92900/ 115203 | consumed samples: 23782400 | consumed tokens: 48706355200 | elapsed time per iteration (s): 0.44 | learning rate: 3.646E-05 | global batch size: 256 | lm loss: 2.224262E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.575 | TFLOPs: 30.41 | 7: iteration 92910/ 115203 | consumed samples: 23784960 | consumed tokens: 48711598080 | elapsed time per iteration (s): 0.43 | learning rate: 3.644E-05 | global batch size: 256 | lm loss: 2.225957E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
588.884 | TFLOPs: 30.90 | 7: iteration 92920/ 115203 | consumed samples: 23787520 | consumed tokens: 48716840960 | elapsed time per iteration (s): 0.43 | learning rate: 3.643E-05 | global batch size: 256 | lm loss: 2.227779E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.936 | TFLOPs: 31.32 | 7: iteration 92930/ 115203 | consumed samples: 23790080 | consumed tokens: 48722083840 | elapsed time per iteration (s): 0.44 | learning rate: 3.641E-05 | global batch size: 256 | lm loss: 2.245804E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.412 | TFLOPs: 30.66 | 7: iteration 92940/ 115203 | consumed samples: 23792640 | consumed tokens: 48727326720 | elapsed time per iteration (s): 0.44 | learning rate: 3.640E-05 | global batch size: 256 | lm loss: 2.247153E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.729 | TFLOPs: 30.47 | 7: iteration 92950/ 115203 | consumed samples: 23795200 | consumed tokens: 48732569600 | elapsed time per iteration (s): 0.43 | learning rate: 3.639E-05 | global batch size: 256 | lm loss: 2.236569E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.386 | TFLOPs: 31.03 | 7: iteration 92960/ 115203 | consumed samples: 23797760 | consumed tokens: 48737812480 | elapsed time per iteration (s): 0.42 | learning rate: 3.637E-05 | global batch size: 256 | lm loss: 2.229738E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.540 | TFLOPs: 31.67 | 7: iteration 92970/ 115203 | consumed samples: 23800320 | consumed tokens: 48743055360 | elapsed time per iteration (s): 0.43 | learning rate: 3.636E-05 | global batch size: 256 | lm loss: 2.210930E+00 | grad norm: 0.239 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.545 | TFLOPs: 31.46 | 7: iteration 92980/ 115203 | consumed samples: 23802880 | consumed tokens: 48748298240 | elapsed time per iteration (s): 0.43 | learning rate: 3.634E-05 | global batch size: 256 | lm loss: 2.258741E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.831 | TFLOPs: 31.21 | 7: iteration 92990/ 115203 | consumed samples: 23805440 | consumed tokens: 48753541120 | elapsed time per iteration (s): 0.44 | learning rate: 3.633E-05 | global batch size: 256 | lm loss: 2.252569E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.223 | TFLOPs: 30.71 | 7: iteration 93000/ 115203 | consumed samples: 23808000 | consumed tokens: 48758784000 | elapsed time per iteration (s): 0.43 | learning rate: 3.631E-05 | global batch size: 256 | lm loss: 2.224424E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.320 | TFLOPs: 31.39 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 93000 | lm loss value: 2.181699E+00 | lm loss PPL: 8.861351E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 93000 to checkpoints_221m 0: [2022-11-29 00:10:00,011] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step93000 is begin to save! 0: [2022-11-29 00:10:00,023] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_01-model_00-model_states.pt... 0: [2022-11-29 00:10:00,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_01-model_00-model_states.pt. 
0: [2022-11-29 00:10:00,140] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_03-model_00-model_states.pt... 0: [2022-11-29 00:10:00,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_03-model_00-model_states.pt. 0: [2022-11-29 00:10:00,164] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_04-model_00-model_states.pt... 0: [2022-11-29 00:10:00,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_04-model_00-model_states.pt. 0: [2022-11-29 00:10:00,190] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_05-model_00-model_states.pt... 0: [2022-11-29 00:10:00,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_05-model_00-model_states.pt. 0: [2022-11-29 00:10:00,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_06-model_00-model_states.pt... 0: [2022-11-29 00:10:00,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_06-model_00-model_states.pt. 0: [2022-11-29 00:10:00,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_07-model_00-model_states.pt... 0: [2022-11-29 00:10:00,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_07-model_00-model_states.pt. 0: [2022-11-29 00:10:00,273] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_08-model_00-model_states.pt... 0: [2022-11-29 00:10:00,297] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_08-model_00-model_states.pt. 
0: [2022-11-29 00:10:00,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_09-model_00-model_states.pt... 0: [2022-11-29 00:10:00,322] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_09-model_00-model_states.pt. 0: [2022-11-29 00:10:00,323] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_10-model_00-model_states.pt... 0: [2022-11-29 00:10:00,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_10-model_00-model_states.pt. 0: [2022-11-29 00:10:00,349] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_11-model_00-model_states.pt... 0: [2022-11-29 00:10:00,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_11-model_00-model_states.pt. 0: [2022-11-29 00:10:00,378] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_12-model_00-model_states.pt... 0: [2022-11-29 00:10:00,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_12-model_00-model_states.pt. 0: [2022-11-29 00:10:00,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_13-model_00-model_states.pt... 0: [2022-11-29 00:10:00,442] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_13-model_00-model_states.pt. 0: [2022-11-29 00:10:00,442] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_14-model_00-model_states.pt... 0: [2022-11-29 00:10:00,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_14-model_00-model_states.pt. 
0: [2022-11-29 00:10:00,468] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_15-model_00-model_states.pt... 0: [2022-11-29 00:10:00,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_15-model_00-model_states.pt. 0: [2022-11-29 00:10:00,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_16-model_00-model_states.pt... 0: [2022-11-29 00:10:00,521] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_16-model_00-model_states.pt. 0: [2022-11-29 00:10:00,521] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_17-model_00-model_states.pt... 0: [2022-11-29 00:10:00,547] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_17-model_00-model_states.pt. 0: [2022-11-29 00:10:00,547] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_18-model_00-model_states.pt... 0: [2022-11-29 00:10:00,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_18-model_00-model_states.pt. 0: [2022-11-29 00:10:00,596] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_19-model_00-model_states.pt... 0: [2022-11-29 00:10:00,628] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_19-model_00-model_states.pt. 0: [2022-11-29 00:10:00,628] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_20-model_00-model_states.pt... 0: [2022-11-29 00:10:00,655] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_20-model_00-model_states.pt. 
0: [2022-11-29 00:10:00,655] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/layer_22-model_00-model_states.pt... 0: [2022-11-29 00:10:00,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/layer_22-model_00-model_states.pt. 0: [2022-11-29 00:10:00,660] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step93000/mp_rank_00_model_states.pt 0: [2022-11-29 00:10:00,660] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/mp_rank_00_model_states.pt... 0: [2022-11-29 00:10:00,662] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/mp_rank_00_model_states.pt. 0: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
7: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:10:00,681] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:10:00,681] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
1: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:10:00,681] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:10:00,681] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:10:00,681] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:10:00,681] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:10:00,681] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
3: [2022-11-29 00:10:00,681] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:10:00,680] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step93000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:10:00,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:10:00,730] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 00:10:00,730] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 3: [2022-11-29 00:10:00,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:10:00,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 1: [2022-11-29 00:10:00,731] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:10:00,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2022-11-29 00:10:00,731] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 00:10:00,731] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2022-11-29 00:10:00,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-29 00:10:00,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 00:10:00,732] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:10:00,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2022-11-29 00:10:00,732] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 00:10:00,732] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:10:00,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 3: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-29 00:10:00,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 3: [2022-11-29 00:10:00,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 7: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 3: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 3: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:10:00,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:10:00,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-29 00:10:00,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:10:00,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 00:10:00,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 00:10:00,734] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 00:10:00,734] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2022-11-29 00:10:00,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2022-11-29 00:10:00,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2022-11-29 00:10:00,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2022-11-29 00:10:00,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 00:10:00,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 3: [2022-11-29 00:10:00,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:10:00,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 7: [2022-11-29 00:10:00,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:10:00,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2022-11-29 00:10:00,735] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 00:10:00,735] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2022-11-29 00:10:00,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:10:00,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 00:10:00,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2022-11-29 00:10:00,736] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-29 00:10:00,736] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 00:10:00,736] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2022-11-29 00:10:00,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:10:00,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 00:10:00,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2022-11-29 00:10:00,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:10:00,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 00:10:00,738] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2022-11-29 00:10:00,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:10:00,738] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 00:10:00,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2022-11-29 00:10:00,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-29 00:10:00,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 00:10:00,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 3: [2022-11-29 00:10:00,739] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:10:00,739] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 00:10:00,739] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2022-11-29 00:10:00,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:10:00,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 00:10:00,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2022-11-29 00:10:00,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:10:00,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 00:10:00,740] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2022-11-29 00:10:00,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-29 00:10:00,740] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 00:10:00,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2022-11-29 00:10:00,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:10:00,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 00:10:00,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2022-11-29 00:10:00,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:10:00,741] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 00:10:00,741] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 7: [2022-11-29 00:10:00,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:10:00,742] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 00:10:00,742] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2022-11-29 00:10:00,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-29 00:10:00,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 00:10:00,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2022-11-29 00:10:00,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:10:00,750] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:10:00,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:10:00,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:10:00,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 0: [2022-11-29 00:10:00,750] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 00:10:00,750] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2022-11-29 00:10:00,748] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
4: [2022-11-29 00:10:00,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 1: [2022-11-29 00:10:00,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 4: [2022-11-29 00:10:00,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2022-11-29 00:10:00,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 4: [2022-11-29 00:10:00,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:10:00,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:10:00,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 2: [2022-11-29 00:10:00,743] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2022-11-29 00:10:00,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2022-11-29 00:10:00,743] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2022-11-29 00:10:00,748] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 00:10:00,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 
4: [2022-11-29 00:10:00,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:10:00,744] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:10:00,748] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 4: [2022-11-29 00:10:00,746] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2022-11-29 00:10:00,744] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 4: [2022-11-29 00:10:00,746] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 2: [2022-11-29 00:10:00,744] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2022-11-29 00:10:00,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:10:00,746] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:10:00,747] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 00:10:00,747] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 4: [2022-11-29 00:10:00,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-29 00:10:00,749] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 1: [2022-11-29 00:10:00,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 4: [2022-11-29 00:10:00,749] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2022-11-29 00:10:00,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 1: [2022-11-29 00:10:00,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:10:00,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 3: [2022-11-29 00:10:00,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:10:00,754] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2022-11-29 00:10:00,754] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:10:00,754] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 00:10:00,755] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2022-11-29 00:10:00,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-29 00:10:00,755] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 00:10:00,755] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2022-11-29 00:10:00,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:10:00,755] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 00:10:00,755] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2022-11-29 00:10:00,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:10:00,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:10:00,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:10:00,745] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 00:10:00,745] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2022-11-29 00:10:00,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 00:10:00,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2022-11-29 00:10:00,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 00:10:00,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 00:10:00,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:10:00,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:10:00,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2022-11-29 00:10:00,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2022-11-29 00:10:00,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 00:10:00,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2022-11-29 00:10:00,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 00:10:00,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 00:10:00,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2022-11-29 00:10:00,755] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-29 00:10:00,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2022-11-29 00:10:00,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2022-11-29 00:10:00,756] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 00:10:00,756] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 5: [2022-11-29 00:10:00,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:10:00,758] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 00:10:00,758] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2022-11-29 00:10:00,760] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:10:00,760] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 00:10:00,760] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2022-11-29 00:10:00,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:10:00,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 00:10:00,761] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 
4: [2022-11-29 00:10:00,761] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:10:00,761] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 00:10:00,761] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 4: [2022-11-29 00:10:00,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:10:00,762] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 00:10:00,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:10:00,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 4: [2022-11-29 00:10:00,762] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 00:10:00,762] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 6: [2022-11-29 00:10:00,769] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:10:00,770] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 00:10:00,770] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 
0: [2022-11-29 00:10:00,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:10:00,775] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 00:10:00,775] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: [2022-11-29 00:10:00,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step93000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 00:10:00,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step93000 is ready now! 0: successfully saved checkpoint at iteration 93000 to checkpoints_221m 7: time (ms) | save-checkpoint: 825.41 7: iteration 93010/ 115203 | consumed samples: 23810560 | consumed tokens: 48764026880 | elapsed time per iteration (s): 0.53 | learning rate: 3.630E-05 | global batch size: 256 | lm loss: 2.219674E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 485.401 | TFLOPs: 25.47 | 7: iteration 93020/ 115203 | consumed samples: 23813120 | consumed tokens: 48769269760 | elapsed time per iteration (s): 0.43 | learning rate: 3.629E-05 | global batch size: 256 | lm loss: 2.238853E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.458 | TFLOPs: 31.35 | 7: iteration 93030/ 115203 | consumed samples: 23815680 | consumed tokens: 48774512640 | elapsed time per iteration (s): 0.43 | learning rate: 3.627E-05 | global batch size: 256 | lm loss: 2.262839E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.273 | TFLOPs: 31.02 | 7: iteration 93040/ 115203 | consumed samples: 
23818240 | consumed tokens: 48779755520 | elapsed time per iteration (s): 0.43 | learning rate: 3.626E-05 | global batch size: 256 | lm loss: 2.245839E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.113 | TFLOPs: 31.43 | 7: iteration 93050/ 115203 | consumed samples: 23820800 | consumed tokens: 48784998400 | elapsed time per iteration (s): 0.43 | learning rate: 3.624E-05 | global batch size: 256 | lm loss: 2.199911E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.350 | TFLOPs: 31.50 | 7: iteration 93060/ 115203 | consumed samples: 23823360 | consumed tokens: 48790241280 | elapsed time per iteration (s): 0.43 | learning rate: 3.623E-05 | global batch size: 256 | lm loss: 2.222726E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.715 | TFLOPs: 31.15 | 7: iteration 93070/ 115203 | consumed samples: 23825920 | consumed tokens: 48795484160 | elapsed time per iteration (s): 0.43 | learning rate: 3.622E-05 | global batch size: 256 | lm loss: 2.261036E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.712 | TFLOPs: 31.20 | 7: iteration 93080/ 115203 | consumed samples: 23828480 | consumed tokens: 48800727040 | elapsed time per iteration (s): 0.43 | learning rate: 3.620E-05 | global batch size: 256 | lm loss: 2.274438E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.831 | TFLOPs: 31.00 | 7: iteration 93090/ 115203 | consumed samples: 23831040 | consumed tokens: 48805969920 | elapsed time per iteration (s): 0.44 | learning rate: 3.619E-05 | global batch size: 256 | lm loss: 2.255675E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 580.361 | TFLOPs: 30.45 | 7: iteration 93100/ 115203 | consumed samples: 23833600 | consumed tokens: 48811212800 | elapsed time per iteration (s): 0.44 | learning rate: 3.617E-05 | global batch size: 256 | lm loss: 2.205793E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.625 | TFLOPs: 30.41 | 7: iteration 93110/ 115203 | consumed samples: 23836160 | consumed tokens: 48816455680 | elapsed time per iteration (s): 0.43 | learning rate: 3.616E-05 | global batch size: 256 | lm loss: 2.239149E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.632 | TFLOPs: 31.20 | 7: iteration 93120/ 115203 | consumed samples: 23838720 | consumed tokens: 48821698560 | elapsed time per iteration (s): 0.44 | learning rate: 3.614E-05 | global batch size: 256 | lm loss: 2.218367E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.255 | TFLOPs: 30.39 | 7: iteration 93130/ 115203 | consumed samples: 23841280 | consumed tokens: 48826941440 | elapsed time per iteration (s): 0.43 | learning rate: 3.613E-05 | global batch size: 256 | lm loss: 2.205975E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.995 | TFLOPs: 31.27 | 7: iteration 93140/ 115203 | consumed samples: 23843840 | consumed tokens: 48832184320 | elapsed time per iteration (s): 0.43 | learning rate: 3.612E-05 | global batch size: 256 | lm loss: 2.216806E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.121 | TFLOPs: 31.02 | 7: iteration 93150/ 115203 | consumed samples: 23846400 | consumed tokens: 48837427200 | elapsed time per iteration (s): 0.43 | learning rate: 3.610E-05 | global batch size: 256 | lm 
loss: 2.232977E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.265 | TFLOPs: 31.44 | 7: iteration 93160/ 115203 | consumed samples: 23848960 | consumed tokens: 48842670080 | elapsed time per iteration (s): 0.44 | learning rate: 3.609E-05 | global batch size: 256 | lm loss: 2.191773E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.126 | TFLOPs: 30.60 | 7: iteration 93170/ 115203 | consumed samples: 23851520 | consumed tokens: 48847912960 | elapsed time per iteration (s): 0.45 | learning rate: 3.607E-05 | global batch size: 256 | lm loss: 2.200014E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.579 | TFLOPs: 29.68 | 7: iteration 93180/ 115203 | consumed samples: 23854080 | consumed tokens: 48853155840 | elapsed time per iteration (s): 0.44 | learning rate: 3.606E-05 | global batch size: 256 | lm loss: 2.239227E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.986 | TFLOPs: 30.59 | 7: iteration 93190/ 115203 | consumed samples: 23856640 | consumed tokens: 48858398720 | elapsed time per iteration (s): 0.42 | learning rate: 3.605E-05 | global batch size: 256 | lm loss: 2.236597E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.820 | TFLOPs: 31.68 | 7: iteration 93200/ 115203 | consumed samples: 23859200 | consumed tokens: 48863641600 | elapsed time per iteration (s): 0.43 | learning rate: 3.603E-05 | global batch size: 256 | lm loss: 2.230443E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.568 | TFLOPs: 31.20 | 7: iteration 93210/ 115203 | consumed samples: 23861760 | consumed tokens: 
48868884480 | elapsed time per iteration (s): 0.45 | learning rate: 3.602E-05 | global batch size: 256 | lm loss: 2.252172E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.065 | TFLOPs: 29.70 | 7: iteration 93220/ 115203 | consumed samples: 23864320 | consumed tokens: 48874127360 | elapsed time per iteration (s): 0.44 | learning rate: 3.600E-05 | global batch size: 256 | lm loss: 2.245459E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.806 | TFLOPs: 30.47 | 7: iteration 93230/ 115203 | consumed samples: 23866880 | consumed tokens: 48879370240 | elapsed time per iteration (s): 0.45 | learning rate: 3.599E-05 | global batch size: 256 | lm loss: 2.240037E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.258 | TFLOPs: 29.92 | 7: iteration 93240/ 115203 | consumed samples: 23869440 | consumed tokens: 48884613120 | elapsed time per iteration (s): 0.44 | learning rate: 3.597E-05 | global batch size: 256 | lm loss: 2.230902E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.991 | TFLOPs: 30.59 | 7: iteration 93250/ 115203 | consumed samples: 23872000 | consumed tokens: 48889856000 | elapsed time per iteration (s): 0.44 | learning rate: 3.596E-05 | global batch size: 256 | lm loss: 2.218831E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.888 | TFLOPs: 30.64 | 7: iteration 93260/ 115203 | consumed samples: 23874560 | consumed tokens: 48895098880 | elapsed time per iteration (s): 0.44 | learning rate: 3.595E-05 | global batch size: 256 | lm loss: 2.232391E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
588.331 | TFLOPs: 30.87 | 7: iteration 93270/ 115203 | consumed samples: 23877120 | consumed tokens: 48900341760 | elapsed time per iteration (s): 0.43 | learning rate: 3.593E-05 | global batch size: 256 | lm loss: 2.255452E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.857 | TFLOPs: 31.42 | 7: iteration 93280/ 115203 | consumed samples: 23879680 | consumed tokens: 48905584640 | elapsed time per iteration (s): 0.44 | learning rate: 3.592E-05 | global batch size: 256 | lm loss: 2.210181E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.998 | TFLOPs: 30.59 | 7: iteration 93290/ 115203 | consumed samples: 23882240 | consumed tokens: 48910827520 | elapsed time per iteration (s): 0.44 | learning rate: 3.590E-05 | global batch size: 256 | lm loss: 2.231753E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.877 | TFLOPs: 30.32 | 7: iteration 93300/ 115203 | consumed samples: 23884800 | consumed tokens: 48916070400 | elapsed time per iteration (s): 0.45 | learning rate: 3.589E-05 | global batch size: 256 | lm loss: 2.205306E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.460 | TFLOPs: 29.93 | 7: iteration 93310/ 115203 | consumed samples: 23887360 | consumed tokens: 48921313280 | elapsed time per iteration (s): 0.45 | learning rate: 3.588E-05 | global batch size: 256 | lm loss: 2.252591E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.961 | TFLOPs: 29.80 | 7: iteration 93320/ 115203 | consumed samples: 23889920 | consumed tokens: 48926556160 | elapsed time per iteration (s): 0.43 | learning rate: 3.586E-05 | global batch size: 256 | lm loss: 2.213116E+00 | grad norm: 0.255 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.418 | TFLOPs: 31.29 | 7: iteration 93330/ 115203 | consumed samples: 23892480 | consumed tokens: 48931799040 | elapsed time per iteration (s): 0.44 | learning rate: 3.585E-05 | global batch size: 256 | lm loss: 2.233891E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.180 | TFLOPs: 30.39 | 7: iteration 93340/ 115203 | consumed samples: 23895040 | consumed tokens: 48937041920 | elapsed time per iteration (s): 0.45 | learning rate: 3.583E-05 | global batch size: 256 | lm loss: 2.209003E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.360 | TFLOPs: 29.66 | 7: iteration 93350/ 115203 | consumed samples: 23897600 | consumed tokens: 48942284800 | elapsed time per iteration (s): 0.43 | learning rate: 3.582E-05 | global batch size: 256 | lm loss: 2.231971E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.988 | TFLOPs: 31.11 | 7: iteration 93360/ 115203 | consumed samples: 23900160 | consumed tokens: 48947527680 | elapsed time per iteration (s): 0.43 | learning rate: 3.581E-05 | global batch size: 256 | lm loss: 2.240773E+00 | grad norm: 0.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.534 | TFLOPs: 31.56 | 7: iteration 93370/ 115203 | consumed samples: 23902720 | consumed tokens: 48952770560 | elapsed time per iteration (s): 0.44 | learning rate: 3.579E-05 | global batch size: 256 | lm loss: 2.236846E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.876 | TFLOPs: 30.79 | 7: iteration 93380/ 115203 | consumed samples: 23905280 | consumed tokens: 48958013440 | elapsed time per iteration (s): 
0.46 | learning rate: 3.578E-05 | global batch size: 256 | lm loss: 2.240875E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 553.996 | TFLOPs: 29.07 | 7: iteration 93390/ 115203 | consumed samples: 23907840 | consumed tokens: 48963256320 | elapsed time per iteration (s): 0.45 | learning rate: 3.576E-05 | global batch size: 256 | lm loss: 2.220412E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.102 | TFLOPs: 29.70 | 7: iteration 93400/ 115203 | consumed samples: 23910400 | consumed tokens: 48968499200 | elapsed time per iteration (s): 0.43 | learning rate: 3.575E-05 | global batch size: 256 | lm loss: 2.218655E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.639 | TFLOPs: 31.36 | 7: iteration 93410/ 115203 | consumed samples: 23912960 | consumed tokens: 48973742080 | elapsed time per iteration (s): 0.43 | learning rate: 3.574E-05 | global batch size: 256 | lm loss: 2.242534E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.023 | TFLOPs: 31.27 | 7: iteration 93420/ 115203 | consumed samples: 23915520 | consumed tokens: 48978984960 | elapsed time per iteration (s): 0.43 | learning rate: 3.572E-05 | global batch size: 256 | lm loss: 2.220856E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.802 | TFLOPs: 30.95 | 7: iteration 93430/ 115203 | consumed samples: 23918080 | consumed tokens: 48984227840 | elapsed time per iteration (s): 0.44 | learning rate: 3.571E-05 | global batch size: 256 | lm loss: 2.232516E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.532 | TFLOPs: 30.62 | 7: iteration 93440/ 
115203 | consumed samples: 23920640 | consumed tokens: 48989470720 | elapsed time per iteration (s): 0.43 | learning rate: 3.569E-05 | global batch size: 256 | lm loss: 2.237249E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.822 | TFLOPs: 30.95 | 7: iteration 93450/ 115203 | consumed samples: 23923200 | consumed tokens: 48994713600 | elapsed time per iteration (s): 0.44 | learning rate: 3.568E-05 | global batch size: 256 | lm loss: 2.216320E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.629 | TFLOPs: 30.67 | 7: iteration 93460/ 115203 | consumed samples: 23925760 | consumed tokens: 48999956480 | elapsed time per iteration (s): 0.43 | learning rate: 3.567E-05 | global batch size: 256 | lm loss: 2.226314E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.840 | TFLOPs: 31.11 | 7: iteration 93470/ 115203 | consumed samples: 23928320 | consumed tokens: 49005199360 | elapsed time per iteration (s): 0.45 | learning rate: 3.565E-05 | global batch size: 256 | lm loss: 2.200200E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.830 | TFLOPs: 30.06 | 7: iteration 93480/ 115203 | consumed samples: 23930880 | consumed tokens: 49010442240 | elapsed time per iteration (s): 0.43 | learning rate: 3.564E-05 | global batch size: 256 | lm loss: 2.241631E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.712 | TFLOPs: 31.05 | 7: iteration 93490/ 115203 | consumed samples: 23933440 | consumed tokens: 49015685120 | elapsed time per iteration (s): 0.43 | learning rate: 3.562E-05 | global batch size: 256 | lm loss: 2.238789E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 600.625 | TFLOPs: 31.51 | 7: iteration 93500/ 115203 | consumed samples: 23936000 | consumed tokens: 49020928000 | elapsed time per iteration (s): 0.44 | learning rate: 3.561E-05 | global batch size: 256 | lm loss: 2.258956E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.289 | TFLOPs: 30.45 | 7: iteration 93510/ 115203 | consumed samples: 23938560 | consumed tokens: 49026170880 | elapsed time per iteration (s): 0.45 | learning rate: 3.560E-05 | global batch size: 256 | lm loss: 2.199183E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.480 | TFLOPs: 29.88 | 7: iteration 93520/ 115203 | consumed samples: 23941120 | consumed tokens: 49031413760 | elapsed time per iteration (s): 0.44 | learning rate: 3.558E-05 | global batch size: 256 | lm loss: 2.212133E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.940 | TFLOPs: 30.85 | 7: iteration 93530/ 115203 | consumed samples: 23943680 | consumed tokens: 49036656640 | elapsed time per iteration (s): 0.43 | learning rate: 3.557E-05 | global batch size: 256 | lm loss: 2.231845E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.870 | TFLOPs: 31.11 | 7: iteration 93540/ 115203 | consumed samples: 23946240 | consumed tokens: 49041899520 | elapsed time per iteration (s): 0.43 | learning rate: 3.555E-05 | global batch size: 256 | lm loss: 2.225673E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.233 | TFLOPs: 31.55 | 7: iteration 93550/ 115203 | consumed samples: 23948800 | consumed tokens: 49047142400 | elapsed time per iteration (s): 0.45 | learning rate: 3.554E-05 | global batch 
size: 256 | lm loss: 2.262972E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.453 | TFLOPs: 30.14 | 7: iteration 93560/ 115203 | consumed samples: 23951360 | consumed tokens: 49052385280 | elapsed time per iteration (s): 0.45 | learning rate: 3.553E-05 | global batch size: 256 | lm loss: 2.275190E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.241 | TFLOPs: 29.60 | 7: iteration 93570/ 115203 | consumed samples: 23953920 | consumed tokens: 49057628160 | elapsed time per iteration (s): 0.44 | learning rate: 3.551E-05 | global batch size: 256 | lm loss: 2.231960E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.473 | TFLOPs: 30.25 | 7: iteration 93580/ 115203 | consumed samples: 23956480 | consumed tokens: 49062871040 | elapsed time per iteration (s): 0.44 | learning rate: 3.550E-05 | global batch size: 256 | lm loss: 2.243039E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.260 | TFLOPs: 30.39 | 7: iteration 93590/ 115203 | consumed samples: 23959040 | consumed tokens: 49068113920 | elapsed time per iteration (s): 0.43 | learning rate: 3.548E-05 | global batch size: 256 | lm loss: 2.267964E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.062 | TFLOPs: 31.27 | 7: iteration 93600/ 115203 | consumed samples: 23961600 | consumed tokens: 49073356800 | elapsed time per iteration (s): 0.44 | learning rate: 3.547E-05 | global batch size: 256 | lm loss: 2.232089E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.332 | TFLOPs: 30.87 | 7: iteration 93610/ 115203 | consumed samples: 23964160 | consumed 
tokens: 49078599680 | elapsed time per iteration (s): 0.44 | learning rate: 3.546E-05 | global batch size: 256 | lm loss: 2.203061E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.974 | TFLOPs: 30.48 | 7: iteration 93620/ 115203 | consumed samples: 23966720 | consumed tokens: 49083842560 | elapsed time per iteration (s): 0.44 | learning rate: 3.544E-05 | global batch size: 256 | lm loss: 2.207288E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.698 | TFLOPs: 30.78 | 7: iteration 93630/ 115203 | consumed samples: 23969280 | consumed tokens: 49089085440 | elapsed time per iteration (s): 0.44 | learning rate: 3.543E-05 | global batch size: 256 | lm loss: 2.218715E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.150 | TFLOPs: 30.65 | 7: iteration 93640/ 115203 | consumed samples: 23971840 | consumed tokens: 49094328320 | elapsed time per iteration (s): 0.43 | learning rate: 3.542E-05 | global batch size: 256 | lm loss: 2.213729E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.556 | TFLOPs: 30.93 | 7: iteration 93650/ 115203 | consumed samples: 23974400 | consumed tokens: 49099571200 | elapsed time per iteration (s): 0.43 | learning rate: 3.540E-05 | global batch size: 256 | lm loss: 2.219363E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.629 | TFLOPs: 31.09 | 7: iteration 93660/ 115203 | consumed samples: 23976960 | consumed tokens: 49104814080 | elapsed time per iteration (s): 0.43 | learning rate: 3.539E-05 | global batch size: 256 | lm loss: 2.201155E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 596.032 | TFLOPs: 31.27 | 7: iteration 93670/ 115203 | consumed samples: 23979520 | consumed tokens: 49110056960 | elapsed time per iteration (s): 0.44 | learning rate: 3.537E-05 | global batch size: 256 | lm loss: 2.195731E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.029 | TFLOPs: 30.70 | 7: iteration 93680/ 115203 | consumed samples: 23982080 | consumed tokens: 49115299840 | elapsed time per iteration (s): 0.44 | learning rate: 3.536E-05 | global batch size: 256 | lm loss: 2.231053E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.418 | TFLOPs: 30.30 | 7: iteration 93690/ 115203 | consumed samples: 23984640 | consumed tokens: 49120542720 | elapsed time per iteration (s): 0.44 | learning rate: 3.535E-05 | global batch size: 256 | lm loss: 2.228018E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.263 | TFLOPs: 30.76 | 7: iteration 93700/ 115203 | consumed samples: 23987200 | consumed tokens: 49125785600 | elapsed time per iteration (s): 0.44 | learning rate: 3.533E-05 | global batch size: 256 | lm loss: 2.224306E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.564 | TFLOPs: 30.62 | 7: iteration 93710/ 115203 | consumed samples: 23989760 | consumed tokens: 49131028480 | elapsed time per iteration (s): 0.45 | learning rate: 3.532E-05 | global batch size: 256 | lm loss: 2.238458E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.397 | TFLOPs: 29.88 | 7: iteration 93720/ 115203 | consumed samples: 23992320 | consumed tokens: 49136271360 | elapsed time per iteration (s): 0.44 | learning rate: 3.530E-05 | global batch size: 256 | lm loss: 2.241799E+00 | grad norm: 
0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.128 | TFLOPs: 30.70 | 7: iteration 93730/ 115203 | consumed samples: 23994880 | consumed tokens: 49141514240 | elapsed time per iteration (s): 0.47 | learning rate: 3.529E-05 | global batch size: 256 | lm loss: 2.232005E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 545.731 | TFLOPs: 28.63 | 7: iteration 93740/ 115203 | consumed samples: 23997440 | consumed tokens: 49146757120 | elapsed time per iteration (s): 0.46 | learning rate: 3.528E-05 | global batch size: 256 | lm loss: 2.236793E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 552.356 | TFLOPs: 28.98 | 7: iteration 93750/ 115203 | consumed samples: 24000000 | consumed tokens: 49152000000 | elapsed time per iteration (s): 0.43 | learning rate: 3.526E-05 | global batch size: 256 | lm loss: 2.221581E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.022 | TFLOPs: 31.27 | 7: iteration 93760/ 115203 | consumed samples: 24002560 | consumed tokens: 49157242880 | elapsed time per iteration (s): 0.44 | learning rate: 3.525E-05 | global batch size: 256 | lm loss: 2.216871E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.982 | TFLOPs: 30.54 | 7: iteration 93770/ 115203 | consumed samples: 24005120 | consumed tokens: 49162485760 | elapsed time per iteration (s): 0.43 | learning rate: 3.524E-05 | global batch size: 256 | lm loss: 2.192371E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.940 | TFLOPs: 31.32 | 7: iteration 93780/ 115203 | consumed samples: 24007680 | consumed tokens: 49167728640 | elapsed time per 
iteration (s): 0.43 | learning rate: 3.522E-05 | global batch size: 256 | lm loss: 2.247248E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.393 | TFLOPs: 31.03 | 7: iteration 93790/ 115203 | consumed samples: 24010240 | consumed tokens: 49172971520 | elapsed time per iteration (s): 0.44 | learning rate: 3.521E-05 | global batch size: 256 | lm loss: 2.235183E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.525 | TFLOPs: 30.62 | 7: iteration 93800/ 115203 | consumed samples: 24012800 | consumed tokens: 49178214400 | elapsed time per iteration (s): 0.43 | learning rate: 3.519E-05 | global batch size: 256 | lm loss: 2.214826E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.501 | TFLOPs: 31.14 | 7: iteration 93810/ 115203 | consumed samples: 24015360 | consumed tokens: 49183457280 | elapsed time per iteration (s): 0.43 | learning rate: 3.518E-05 | global batch size: 256 | lm loss: 2.228979E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.970 | TFLOPs: 30.95 | 7: iteration 93820/ 115203 | consumed samples: 24017920 | consumed tokens: 49188700160 | elapsed time per iteration (s): 0.43 | learning rate: 3.517E-05 | global batch size: 256 | lm loss: 2.242247E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.886 | TFLOPs: 31.42 | 7: iteration 93830/ 115203 | consumed samples: 24020480 | consumed tokens: 49193943040 | elapsed time per iteration (s): 0.43 | learning rate: 3.515E-05 | global batch size: 256 | lm loss: 2.266705E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.169 | TFLOPs: 31.44 | 7: 
iteration 93840/ 115203 | consumed samples: 24023040 | consumed tokens: 49199185920 | elapsed time per iteration (s): 0.43 | learning rate: 3.514E-05 | global batch size: 256 | lm loss: 2.210312E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.614 | TFLOPs: 31.04 | 7: iteration 93850/ 115203 | consumed samples: 24025600 | consumed tokens: 49204428800 | elapsed time per iteration (s): 0.44 | learning rate: 3.513E-05 | global batch size: 256 | lm loss: 2.215727E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.626 | TFLOPs: 30.73 | 7: iteration 93860/ 115203 | consumed samples: 24028160 | consumed tokens: 49209671680 | elapsed time per iteration (s): 0.43 | learning rate: 3.511E-05 | global batch size: 256 | lm loss: 2.237545E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.467 | TFLOPs: 31.19 | 7: iteration 93870/ 115203 | consumed samples: 24030720 | consumed tokens: 49214914560 | elapsed time per iteration (s): 0.43 | learning rate: 3.510E-05 | global batch size: 256 | lm loss: 2.218295E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.247 | TFLOPs: 30.97 | 7: iteration 93880/ 115203 | consumed samples: 24033280 | consumed tokens: 49220157440 | elapsed time per iteration (s): 0.43 | learning rate: 3.508E-05 | global batch size: 256 | lm loss: 2.240431E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.473 | TFLOPs: 31.19 | 7: iteration 93890/ 115203 | consumed samples: 24035840 | consumed tokens: 49225400320 | elapsed time per iteration (s): 0.43 | learning rate: 3.507E-05 | global batch size: 256 | lm loss: 2.204094E+00 | grad norm: 0.234 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.589 | TFLOPs: 31.04 | 7: iteration 93900/ 115203 | consumed samples: 24038400 | consumed tokens: 49230643200 | elapsed time per iteration (s): 0.44 | learning rate: 3.506E-05 | global batch size: 256 | lm loss: 2.220794E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.969 | TFLOPs: 30.59 | 7: iteration 93910/ 115203 | consumed samples: 24040960 | consumed tokens: 49235886080 | elapsed time per iteration (s): 0.44 | learning rate: 3.504E-05 | global batch size: 256 | lm loss: 2.227225E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.248 | TFLOPs: 30.39 | 7: iteration 93920/ 115203 | consumed samples: 24043520 | consumed tokens: 49241128960 | elapsed time per iteration (s): 0.45 | learning rate: 3.503E-05 | global batch size: 256 | lm loss: 2.230932E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.992 | TFLOPs: 30.12 | 7: iteration 93930/ 115203 | consumed samples: 24046080 | consumed tokens: 49246371840 | elapsed time per iteration (s): 0.43 | learning rate: 3.502E-05 | global batch size: 256 | lm loss: 2.224831E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.143 | TFLOPs: 31.12 | 7: iteration 93940/ 115203 | consumed samples: 24048640 | consumed tokens: 49251614720 | elapsed time per iteration (s): 0.45 | learning rate: 3.500E-05 | global batch size: 256 | lm loss: 2.224686E+00 | grad norm: 0.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.856 | TFLOPs: 30.16 | 7: iteration 93950/ 115203 | consumed samples: 24051200 | consumed tokens: 49256857600 | elapsed time per iteration (s): 0.43 | learning rate: 
3.499E-05 | global batch size: 256 | lm loss: 2.198692E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.579 | TFLOPs: 31.35 | 7: iteration 93960/ 115203 | consumed samples: 24053760 | consumed tokens: 49262100480 | elapsed time per iteration (s): 0.42 | learning rate: 3.497E-05 | global batch size: 256 | lm loss: 2.214328E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.126 | TFLOPs: 31.91 | 7: iteration 93970/ 115203 | consumed samples: 24056320 | consumed tokens: 49267343360 | elapsed time per iteration (s): 0.43 | learning rate: 3.496E-05 | global batch size: 256 | lm loss: 2.253754E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.909 | TFLOPs: 31.11 | 7: iteration 93980/ 115203 | consumed samples: 24058880 | consumed tokens: 49272586240 | elapsed time per iteration (s): 0.44 | learning rate: 3.495E-05 | global batch size: 256 | lm loss: 2.225294E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.860 | TFLOPs: 30.37 | 7: iteration 93990/ 115203 | consumed samples: 24061440 | consumed tokens: 49277829120 | elapsed time per iteration (s): 0.43 | learning rate: 3.493E-05 | global batch size: 256 | lm loss: 2.226518E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.828 | TFLOPs: 31.42 | 0: [2022-11-29 00:17:17,732] [INFO] [logging.py:68:log_dist] [Rank 0] step=94000, skipped=0, lr=[3.4919569923835e-05, 3.4919569923835e-05, 3.4919569923835e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 94000/ 115203 | consumed samples: 24064000 | consumed tokens: 49283072000 | elapsed time per iteration (s): 0.42 | learning rate: 3.492E-05 | global batch size: 256 | lm 
loss: 2.234475E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.278 | TFLOPs: 31.81 | 0: steps: 94000 loss: 2.2458 iter time (s): 0.434 samples/sec: 589.871 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 94000 | lm loss value: 2.181554E+00 | lm loss PPL: 8.860061E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 94000 to checkpoints_221m 0: [2022-11-29 00:17:17,891] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step94000 is begin to save! 0: [2022-11-29 00:17:17,894] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_01-model_00-model_states.pt... 0: [2022-11-29 00:17:18,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_01-model_00-model_states.pt. 0: [2022-11-29 00:17:18,006] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_03-model_00-model_states.pt... 0: [2022-11-29 00:17:18,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_03-model_00-model_states.pt. 0: [2022-11-29 00:17:18,038] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_04-model_00-model_states.pt... 0: [2022-11-29 00:17:18,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_04-model_00-model_states.pt. 0: [2022-11-29 00:17:18,070] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_05-model_00-model_states.pt... 0: [2022-11-29 00:17:18,101] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_05-model_00-model_states.pt. 
0: [2022-11-29 00:17:18,101] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_06-model_00-model_states.pt... 0: [2022-11-29 00:17:18,133] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_06-model_00-model_states.pt. 0: [2022-11-29 00:17:18,134] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_07-model_00-model_states.pt... 0: [2022-11-29 00:17:18,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_07-model_00-model_states.pt. 0: [2022-11-29 00:17:18,168] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_08-model_00-model_states.pt... 0: [2022-11-29 00:17:18,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_08-model_00-model_states.pt. 0: [2022-11-29 00:17:18,201] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_09-model_00-model_states.pt... 0: [2022-11-29 00:17:18,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_09-model_00-model_states.pt. 0: [2022-11-29 00:17:18,225] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_10-model_00-model_states.pt... 0: [2022-11-29 00:17:18,250] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_10-model_00-model_states.pt. 0: [2022-11-29 00:17:18,250] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_11-model_00-model_states.pt... 0: [2022-11-29 00:17:18,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_11-model_00-model_states.pt. 
0: [2022-11-29 00:17:18,276] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_12-model_00-model_states.pt... 0: [2022-11-29 00:17:18,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_12-model_00-model_states.pt. 0: [2022-11-29 00:17:18,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_13-model_00-model_states.pt... 0: [2022-11-29 00:17:18,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_13-model_00-model_states.pt. 0: [2022-11-29 00:17:18,332] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_14-model_00-model_states.pt... 0: [2022-11-29 00:17:18,356] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_14-model_00-model_states.pt. 0: [2022-11-29 00:17:18,357] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_15-model_00-model_states.pt... 0: [2022-11-29 00:17:18,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_15-model_00-model_states.pt. 0: [2022-11-29 00:17:18,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_16-model_00-model_states.pt... 0: [2022-11-29 00:17:18,413] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_16-model_00-model_states.pt. 0: [2022-11-29 00:17:18,413] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_17-model_00-model_states.pt... 0: [2022-11-29 00:17:18,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_17-model_00-model_states.pt. 
0: [2022-11-29 00:17:18,438] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_18-model_00-model_states.pt... 0: [2022-11-29 00:17:18,461] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_18-model_00-model_states.pt. 0: [2022-11-29 00:17:18,461] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_19-model_00-model_states.pt... 0: [2022-11-29 00:17:18,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_19-model_00-model_states.pt. 0: [2022-11-29 00:17:18,485] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_20-model_00-model_states.pt... 0: [2022-11-29 00:17:18,509] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_20-model_00-model_states.pt. 0: [2022-11-29 00:17:18,509] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/layer_22-model_00-model_states.pt... 0: [2022-11-29 00:17:18,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/layer_22-model_00-model_states.pt. 0: [2022-11-29 00:17:18,514] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step94000/mp_rank_00_model_states.pt 0: [2022-11-29 00:17:18,514] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/mp_rank_00_model_states.pt... 0: [2022-11-29 00:17:18,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/mp_rank_00_model_states.pt. 0: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
0: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
7: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 
5: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:17:18,534] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:17:18,535] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step94000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:17:18,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:17:18,583] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 00:17:18,583] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
2: [2022-11-29 00:17:18,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:17:18,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 00:17:18,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2022-11-29 00:17:18,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:17:18,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 00:17:18,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2022-11-29 00:17:18,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:17:18,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 00:17:18,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2022-11-29 00:17:18,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:17:18,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-29 00:17:18,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 00:17:18,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 00:17:18,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2022-11-29 00:17:18,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2022-11-29 00:17:18,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:17:18,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 00:17:18,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2022-11-29 00:17:18,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:17:18,588] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 00:17:18,588] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2022-11-29 00:17:18,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-29 00:17:18,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 00:17:18,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2022-11-29 00:17:18,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:17:18,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 00:17:18,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2022-11-29 00:17:18,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:17:18,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 00:17:18,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2022-11-29 00:17:18,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:17:18,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 7: [2022-11-29 00:17:18,589] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:17:18,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
7: [2022-11-29 00:17:18,589] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 00:17:18,589] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 7: [2022-11-29 00:17:18,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:17:18,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 00:17:18,590] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2022-11-29 00:17:18,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:17:18,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 00:17:18,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 7: [2022-11-29 00:17:18,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:17:18,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 00:17:18,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2022-11-29 00:17:18,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-29 00:17:18,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 00:17:18,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2022-11-29 00:17:18,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:17:18,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 00:17:18,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2022-11-29 00:17:18,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:17:18,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 00:17:18,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2022-11-29 00:17:18,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:17:18,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 1: [2022-11-29 00:17:18,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:17:18,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
1: [2022-11-29 00:17:18,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:17:18,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2022-11-29 00:17:18,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 00:17:18,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 6: [2022-11-29 00:17:18,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 1: [2022-11-29 00:17:18,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2022-11-29 00:17:18,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2022-11-29 00:17:18,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2022-11-29 00:17:18,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:17:18,594] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
1: [2022-11-29 00:17:18,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 6: [2022-11-29 00:17:18,594] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 1: [2022-11-29 00:17:18,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2022-11-29 00:17:18,594] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2022-11-29 00:17:18,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:17:18,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 00:17:18,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2022-11-29 00:17:18,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:17:18,596] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 00:17:18,596] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 1: [2022-11-29 00:17:18,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-29 00:17:18,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 00:17:18,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2022-11-29 00:17:18,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:17:18,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:17:18,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2022-11-29 00:17:18,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
6: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:17:18,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2022-11-29 00:17:18,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 00:17:18,601] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 6: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 7: [2022-11-29 00:17:18,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:17:18,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 00:17:18,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
2: [2022-11-29 00:17:18,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:17:18,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:17:18,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 00:17:18,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 00:17:18,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 2: [2022-11-29 00:17:18,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2022-11-29 00:17:18,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:17:18,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 00:17:18,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 7: [2022-11-29 00:17:18,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:17:18,602] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 00:17:18,602] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
7: [2022-11-29 00:17:18,602] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:17:18,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 00:17:18,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 7: [2022-11-29 00:17:18,603] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:17:18,603] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 00:17:18,603] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 4: [2022-11-29 00:17:18,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:17:18,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 00:17:18,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 5: [2022-11-29 00:17:18,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:17:18,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 00:17:18,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
5: [2022-11-29 00:17:18,598] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:17:18,598] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 00:17:18,598] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 5: [2022-11-29 00:17:18,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:17:18,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 00:17:18,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 5: [2022-11-29 00:17:18,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:17:18,599] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 00:17:18,599] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 5: [2022-11-29 00:17:18,599] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:17:18,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 00:17:18,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
5: [2022-11-29 00:17:18,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:17:18,604] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 00:17:18,604] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 5: [2022-11-29 00:17:18,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:17:18,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 5: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:17:18,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 
3: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:17:18,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2022-11-29 00:17:18,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 00:17:18,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 00:17:18,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 00:17:18,605] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2022-11-29 00:17:18,605] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
3: [2022-11-29 00:17:18,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:17:18,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 00:17:18,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2022-11-29 00:17:18,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:17:18,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 00:17:18,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 3: [2022-11-29 00:17:18,606] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:17:18,606] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 00:17:18,606] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 7: [2022-11-29 00:17:18,607] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:17:18,607] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 00:17:18,607] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
0: [2022-11-29 00:17:18,611] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:17:18,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 00:17:18,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2022-11-29 00:17:18,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:17:18,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 00:17:18,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:17:18,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2022-11-29 00:17:18,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 00:17:18,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: [2022-11-29 00:17:18,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:17:18,612] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 00:17:18,612] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 
0: [2022-11-29 00:17:18,650] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step94000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 00:17:18,650] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step94000 is ready now! 0: successfully saved checkpoint at iteration 94000 to checkpoints_221m 7: time (ms) | save-checkpoint: 783.79 7: iteration 94010/ 115203 | consumed samples: 24066560 | consumed tokens: 49288314880 | elapsed time per iteration (s): 0.52 | learning rate: 3.491E-05 | global batch size: 256 | lm loss: 2.234749E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 489.259 | TFLOPs: 25.67 | 7: iteration 94020/ 115203 | consumed samples: 24069120 | consumed tokens: 49293557760 | elapsed time per iteration (s): 0.51 | learning rate: 3.489E-05 | global batch size: 256 | lm loss: 2.217634E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 501.313 | TFLOPs: 26.30 | 7: iteration 94030/ 115203 | consumed samples: 24071680 | consumed tokens: 49298800640 | elapsed time per iteration (s): 0.44 | learning rate: 3.488E-05 | global batch size: 256 | lm loss: 2.256668E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.323 | TFLOPs: 30.87 | 7: iteration 94040/ 115203 | consumed samples: 24074240 | consumed tokens: 49304043520 | elapsed time per iteration (s): 0.44 | learning rate: 3.486E-05 | global batch size: 256 | lm loss: 2.247461E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.504 | TFLOPs: 30.88 | 7: iteration 94050/ 115203 | consumed samples: 24076800 | consumed tokens: 49309286400 | elapsed time per iteration (s): 0.43 | learning rate: 3.485E-05 | global batch size: 256 | 
lm loss: 2.203985E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.832 | TFLOPs: 31.26 | 7: iteration 94060/ 115203 | consumed samples: 24079360 | consumed tokens: 49314529280 | elapsed time per iteration (s): 0.43 | learning rate: 3.484E-05 | global batch size: 256 | lm loss: 2.198592E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.219 | TFLOPs: 30.97 | 7: iteration 94070/ 115203 | consumed samples: 24081920 | consumed tokens: 49319772160 | elapsed time per iteration (s): 0.43 | learning rate: 3.482E-05 | global batch size: 256 | lm loss: 2.241710E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.885 | TFLOPs: 30.95 | 7: iteration 94080/ 115203 | consumed samples: 24084480 | consumed tokens: 49325015040 | elapsed time per iteration (s): 0.44 | learning rate: 3.481E-05 | global batch size: 256 | lm loss: 2.210588E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.459 | TFLOPs: 30.67 | 7: iteration 94090/ 115203 | consumed samples: 24087040 | consumed tokens: 49330257920 | elapsed time per iteration (s): 0.46 | learning rate: 3.480E-05 | global batch size: 256 | lm loss: 2.241322E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.450 | TFLOPs: 29.30 | 7: iteration 94100/ 115203 | consumed samples: 24089600 | consumed tokens: 49335500800 | elapsed time per iteration (s): 0.43 | learning rate: 3.478E-05 | global batch size: 256 | lm loss: 2.239812E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.402 | TFLOPs: 31.40 | 7: iteration 94110/ 115203 | consumed samples: 24092160 | consumed tokens: 
49340743680 | elapsed time per iteration (s): 0.43 | learning rate: 3.477E-05 | global batch size: 256 | lm loss: 2.209975E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.912 | TFLOPs: 31.00 | 7: iteration 94120/ 115203 | consumed samples: 24094720 | consumed tokens: 49345986560 | elapsed time per iteration (s): 0.44 | learning rate: 3.476E-05 | global batch size: 256 | lm loss: 2.209868E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.857 | TFLOPs: 30.63 | 7: iteration 94130/ 115203 | consumed samples: 24097280 | consumed tokens: 49351229440 | elapsed time per iteration (s): 0.43 | learning rate: 3.474E-05 | global batch size: 256 | lm loss: 2.227860E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.065 | TFLOPs: 31.06 | 7: iteration 94140/ 115203 | consumed samples: 24099840 | consumed tokens: 49356472320 | elapsed time per iteration (s): 0.43 | learning rate: 3.473E-05 | global batch size: 256 | lm loss: 2.224971E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.568 | TFLOPs: 31.09 | 7: iteration 94150/ 115203 | consumed samples: 24102400 | consumed tokens: 49361715200 | elapsed time per iteration (s): 0.43 | learning rate: 3.472E-05 | global batch size: 256 | lm loss: 2.221817E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.834 | TFLOPs: 31.52 | 7: iteration 94160/ 115203 | consumed samples: 24104960 | consumed tokens: 49366958080 | elapsed time per iteration (s): 0.44 | learning rate: 3.470E-05 | global batch size: 256 | lm loss: 2.214154E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
579.232 | TFLOPs: 30.39 | 7: iteration 94170/ 115203 | consumed samples: 24107520 | consumed tokens: 49372200960 | elapsed time per iteration (s): 0.44 | learning rate: 3.469E-05 | global batch size: 256 | lm loss: 2.239966E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.698 | TFLOPs: 30.42 | 7: iteration 94180/ 115203 | consumed samples: 24110080 | consumed tokens: 49377443840 | elapsed time per iteration (s): 0.44 | learning rate: 3.467E-05 | global batch size: 256 | lm loss: 2.227329E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.302 | TFLOPs: 30.66 | 7: iteration 94190/ 115203 | consumed samples: 24112640 | consumed tokens: 49382686720 | elapsed time per iteration (s): 0.43 | learning rate: 3.466E-05 | global batch size: 256 | lm loss: 2.229725E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.277 | TFLOPs: 31.08 | 7: iteration 94200/ 115203 | consumed samples: 24115200 | consumed tokens: 49387929600 | elapsed time per iteration (s): 0.45 | learning rate: 3.465E-05 | global batch size: 256 | lm loss: 2.218959E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.122 | TFLOPs: 30.07 | 7: iteration 94210/ 115203 | consumed samples: 24117760 | consumed tokens: 49393172480 | elapsed time per iteration (s): 0.43 | learning rate: 3.463E-05 | global batch size: 256 | lm loss: 2.231446E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.505 | TFLOPs: 31.51 | 7: iteration 94220/ 115203 | consumed samples: 24120320 | consumed tokens: 49398415360 | elapsed time per iteration (s): 0.42 | learning rate: 3.462E-05 | global batch size: 256 | lm loss: 2.223028E+00 | grad norm: 0.239 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.650 | TFLOPs: 31.67 | 7: iteration 94230/ 115203 | consumed samples: 24122880 | consumed tokens: 49403658240 | elapsed time per iteration (s): 0.43 | learning rate: 3.461E-05 | global batch size: 256 | lm loss: 2.208861E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.607 | TFLOPs: 31.04 | 7: iteration 94240/ 115203 | consumed samples: 24125440 | consumed tokens: 49408901120 | elapsed time per iteration (s): 0.43 | learning rate: 3.459E-05 | global batch size: 256 | lm loss: 2.238694E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.299 | TFLOPs: 31.44 | 7: iteration 94250/ 115203 | consumed samples: 24128000 | consumed tokens: 49414144000 | elapsed time per iteration (s): 0.43 | learning rate: 3.458E-05 | global batch size: 256 | lm loss: 2.212897E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.482 | TFLOPs: 31.19 | 7: iteration 94260/ 115203 | consumed samples: 24130560 | consumed tokens: 49419386880 | elapsed time per iteration (s): 0.44 | learning rate: 3.457E-05 | global batch size: 256 | lm loss: 2.255222E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.923 | TFLOPs: 30.32 | 7: iteration 94270/ 115203 | consumed samples: 24133120 | consumed tokens: 49424629760 | elapsed time per iteration (s): 0.43 | learning rate: 3.455E-05 | global batch size: 256 | lm loss: 2.213231E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.557 | TFLOPs: 30.93 | 7: iteration 94280/ 115203 | consumed samples: 24135680 | consumed tokens: 49429872640 | elapsed time per iteration (s): 
0.44 | learning rate: 3.454E-05 | global batch size: 256 | lm loss: 2.208617E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.390 | TFLOPs: 30.45 | 7: iteration 94290/ 115203 | consumed samples: 24138240 | consumed tokens: 49435115520 | elapsed time per iteration (s): 0.43 | learning rate: 3.453E-05 | global batch size: 256 | lm loss: 2.215023E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.882 | TFLOPs: 31.16 | 7: iteration 94300/ 115203 | consumed samples: 24140800 | consumed tokens: 49440358400 | elapsed time per iteration (s): 0.43 | learning rate: 3.451E-05 | global batch size: 256 | lm loss: 2.222941E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.314 | TFLOPs: 31.08 | 7: iteration 94310/ 115203 | consumed samples: 24143360 | consumed tokens: 49445601280 | elapsed time per iteration (s): 0.43 | learning rate: 3.450E-05 | global batch size: 256 | lm loss: 2.231450E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.027 | TFLOPs: 31.27 | 7: iteration 94320/ 115203 | consumed samples: 24145920 | consumed tokens: 49450844160 | elapsed time per iteration (s): 0.42 | learning rate: 3.449E-05 | global batch size: 256 | lm loss: 2.217981E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.156 | TFLOPs: 31.70 | 7: iteration 94330/ 115203 | consumed samples: 24148480 | consumed tokens: 49456087040 | elapsed time per iteration (s): 0.43 | learning rate: 3.447E-05 | global batch size: 256 | lm loss: 2.237057E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.793 | TFLOPs: 31.05 | 7: iteration 94340/ 
115203 | consumed samples: 24151040 | consumed tokens: 49461329920 | elapsed time per iteration (s): 0.43 | learning rate: 3.446E-05 | global batch size: 256 | lm loss: 2.227225E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.780 | TFLOPs: 31.00 | 7: iteration 94350/ 115203 | consumed samples: 24153600 | consumed tokens: 49466572800 | elapsed time per iteration (s): 0.43 | learning rate: 3.444E-05 | global batch size: 256 | lm loss: 2.202474E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.646 | TFLOPs: 31.20 | 7: iteration 94360/ 115203 | consumed samples: 24156160 | consumed tokens: 49471815680 | elapsed time per iteration (s): 0.43 | learning rate: 3.443E-05 | global batch size: 256 | lm loss: 2.212650E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.826 | TFLOPs: 31.52 | 7: iteration 94370/ 115203 | consumed samples: 24158720 | consumed tokens: 49477058560 | elapsed time per iteration (s): 0.43 | learning rate: 3.442E-05 | global batch size: 256 | lm loss: 2.239333E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.987 | TFLOPs: 31.17 | 7: iteration 94380/ 115203 | consumed samples: 24161280 | consumed tokens: 49482301440 | elapsed time per iteration (s): 0.43 | learning rate: 3.440E-05 | global batch size: 256 | lm loss: 2.227924E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.285 | TFLOPs: 31.60 | 7: iteration 94390/ 115203 | consumed samples: 24163840 | consumed tokens: 49487544320 | elapsed time per iteration (s): 0.43 | learning rate: 3.439E-05 | global batch size: 256 | lm loss: 2.245535E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 590.669 | TFLOPs: 30.99 | 7: iteration 94400/ 115203 | consumed samples: 24166400 | consumed tokens: 49492787200 | elapsed time per iteration (s): 0.44 | learning rate: 3.438E-05 | global batch size: 256 | lm loss: 2.218589E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.743 | TFLOPs: 30.79 | 7: iteration 94410/ 115203 | consumed samples: 24168960 | consumed tokens: 49498030080 | elapsed time per iteration (s): 0.44 | learning rate: 3.436E-05 | global batch size: 256 | lm loss: 2.217073E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.658 | TFLOPs: 30.57 | 7: iteration 94420/ 115203 | consumed samples: 24171520 | consumed tokens: 49503272960 | elapsed time per iteration (s): 0.44 | learning rate: 3.435E-05 | global batch size: 256 | lm loss: 2.252330E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.915 | TFLOPs: 30.69 | 7: iteration 94430/ 115203 | consumed samples: 24174080 | consumed tokens: 49508515840 | elapsed time per iteration (s): 0.43 | learning rate: 3.434E-05 | global batch size: 256 | lm loss: 2.230546E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.965 | TFLOPs: 31.32 | 7: iteration 94440/ 115203 | consumed samples: 24176640 | consumed tokens: 49513758720 | elapsed time per iteration (s): 0.43 | learning rate: 3.432E-05 | global batch size: 256 | lm loss: 2.224239E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.974 | TFLOPs: 31.11 | 7: iteration 94450/ 115203 | consumed samples: 24179200 | consumed tokens: 49519001600 | elapsed time per iteration (s): 0.43 | learning rate: 3.431E-05 | global batch 
size: 256 | lm loss: 2.234649E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.271 | TFLOPs: 31.29 | 7: iteration 94460/ 115203 | consumed samples: 24181760 | consumed tokens: 49524244480 | elapsed time per iteration (s): 0.43 | learning rate: 3.430E-05 | global batch size: 256 | lm loss: 2.283200E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.000 | TFLOPs: 31.06 | 7: iteration 94470/ 115203 | consumed samples: 24184320 | consumed tokens: 49529487360 | elapsed time per iteration (s): 0.43 | learning rate: 3.428E-05 | global batch size: 256 | lm loss: 2.239106E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.574 | TFLOPs: 31.14 | 7: iteration 94480/ 115203 | consumed samples: 24186880 | consumed tokens: 49534730240 | elapsed time per iteration (s): 0.43 | learning rate: 3.427E-05 | global batch size: 256 | lm loss: 2.210793E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.297 | TFLOPs: 31.13 | 7: iteration 94490/ 115203 | consumed samples: 24189440 | consumed tokens: 49539973120 | elapsed time per iteration (s): 0.43 | learning rate: 3.426E-05 | global batch size: 256 | lm loss: 2.238239E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.904 | TFLOPs: 31.53 | 7: iteration 94500/ 115203 | consumed samples: 24192000 | consumed tokens: 49545216000 | elapsed time per iteration (s): 0.43 | learning rate: 3.424E-05 | global batch size: 256 | lm loss: 2.213033E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.725 | TFLOPs: 31.47 | 7: iteration 94510/ 115203 | consumed samples: 24194560 | consumed 
tokens: 49550458880 | elapsed time per iteration (s): 0.42 | learning rate: 3.423E-05 | global batch size: 256 | lm loss: 2.208116E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.153 | TFLOPs: 31.91 | 7: iteration 94520/ 115203 | consumed samples: 24197120 | consumed tokens: 49555701760 | elapsed time per iteration (s): 0.43 | learning rate: 3.422E-05 | global batch size: 256 | lm loss: 2.192064E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.388 | TFLOPs: 31.19 | 7: iteration 94530/ 115203 | consumed samples: 24199680 | consumed tokens: 49560944640 | elapsed time per iteration (s): 0.44 | learning rate: 3.420E-05 | global batch size: 256 | lm loss: 2.266280E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.618 | TFLOPs: 30.62 | 7: iteration 94540/ 115203 | consumed samples: 24202240 | consumed tokens: 49566187520 | elapsed time per iteration (s): 0.43 | learning rate: 3.419E-05 | global batch size: 256 | lm loss: 2.206330E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.467 | TFLOPs: 31.24 | 7: iteration 94550/ 115203 | consumed samples: 24204800 | consumed tokens: 49571430400 | elapsed time per iteration (s): 0.44 | learning rate: 3.418E-05 | global batch size: 256 | lm loss: 2.244191E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.575 | TFLOPs: 30.57 | 7: iteration 94560/ 115203 | consumed samples: 24207360 | consumed tokens: 49576673280 | elapsed time per iteration (s): 0.43 | learning rate: 3.416E-05 | global batch size: 256 | lm loss: 2.206444E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 596.834 | TFLOPs: 31.31 | 7: iteration 94570/ 115203 | consumed samples: 24209920 | consumed tokens: 49581916160 | elapsed time per iteration (s): 0.43 | learning rate: 3.415E-05 | global batch size: 256 | lm loss: 2.225317E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.318 | TFLOPs: 31.08 | 7: iteration 94580/ 115203 | consumed samples: 24212480 | consumed tokens: 49587159040 | elapsed time per iteration (s): 0.44 | learning rate: 3.414E-05 | global batch size: 256 | lm loss: 2.233039E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.359 | TFLOPs: 30.71 | 7: iteration 94590/ 115203 | consumed samples: 24215040 | consumed tokens: 49592401920 | elapsed time per iteration (s): 0.44 | learning rate: 3.412E-05 | global batch size: 256 | lm loss: 2.199010E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.825 | TFLOPs: 30.68 | 7: iteration 94600/ 115203 | consumed samples: 24217600 | consumed tokens: 49597644800 | elapsed time per iteration (s): 0.44 | learning rate: 3.411E-05 | global batch size: 256 | lm loss: 2.242571E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.590 | TFLOPs: 30.83 | 7: iteration 94610/ 115203 | consumed samples: 24220160 | consumed tokens: 49602887680 | elapsed time per iteration (s): 0.43 | learning rate: 3.410E-05 | global batch size: 256 | lm loss: 2.217638E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.783 | TFLOPs: 31.36 | 7: iteration 94620/ 115203 | consumed samples: 24222720 | consumed tokens: 49608130560 | elapsed time per iteration (s): 0.44 | learning rate: 3.408E-05 | global batch size: 256 | lm loss: 2.256136E+00 | grad norm: 
0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.768 | TFLOPs: 30.73 | 7: iteration 94630/ 115203 | consumed samples: 24225280 | consumed tokens: 49613373440 | elapsed time per iteration (s): 0.44 | learning rate: 3.407E-05 | global batch size: 256 | lm loss: 2.234878E+00 | grad norm: 0.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.752 | TFLOPs: 30.73 | 7: iteration 94640/ 115203 | consumed samples: 24227840 | consumed tokens: 49618616320 | elapsed time per iteration (s): 0.43 | learning rate: 3.406E-05 | global batch size: 256 | lm loss: 2.225951E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.856 | TFLOPs: 31.26 | 7: iteration 94650/ 115203 | consumed samples: 24230400 | consumed tokens: 49623859200 | elapsed time per iteration (s): 0.43 | learning rate: 3.404E-05 | global batch size: 256 | lm loss: 2.222440E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.174 | TFLOPs: 31.49 | 7: iteration 94660/ 115203 | consumed samples: 24232960 | consumed tokens: 49629102080 | elapsed time per iteration (s): 0.43 | learning rate: 3.403E-05 | global batch size: 256 | lm loss: 2.224046E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.842 | TFLOPs: 31.37 | 7: iteration 94670/ 115203 | consumed samples: 24235520 | consumed tokens: 49634344960 | elapsed time per iteration (s): 0.43 | learning rate: 3.402E-05 | global batch size: 256 | lm loss: 2.219477E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.194 | TFLOPs: 31.12 | 7: iteration 94680/ 115203 | consumed samples: 24238080 | consumed tokens: 49639587840 | elapsed time per 
iteration (s): 0.42 | learning rate: 3.400E-05 | global batch size: 256 | lm loss: 2.176889E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.737 | TFLOPs: 31.78 | 7: iteration 94690/ 115203 | consumed samples: 24240640 | consumed tokens: 49644830720 | elapsed time per iteration (s): 0.43 | learning rate: 3.399E-05 | global batch size: 256 | lm loss: 2.213803E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.526 | TFLOPs: 31.46 | 7: iteration 94700/ 115203 | consumed samples: 24243200 | consumed tokens: 49650073600 | elapsed time per iteration (s): 0.44 | learning rate: 3.398E-05 | global batch size: 256 | lm loss: 2.203866E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.199 | TFLOPs: 30.81 | 7: iteration 94710/ 115203 | consumed samples: 24245760 | consumed tokens: 49655316480 | elapsed time per iteration (s): 0.43 | learning rate: 3.396E-05 | global batch size: 256 | lm loss: 2.202622E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.572 | TFLOPs: 30.88 | 7: iteration 94720/ 115203 | consumed samples: 24248320 | consumed tokens: 49660559360 | elapsed time per iteration (s): 0.43 | learning rate: 3.395E-05 | global batch size: 256 | lm loss: 2.260299E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.339 | TFLOPs: 31.13 | 7: iteration 94730/ 115203 | consumed samples: 24250880 | consumed tokens: 49665802240 | elapsed time per iteration (s): 0.44 | learning rate: 3.394E-05 | global batch size: 256 | lm loss: 2.202673E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.650 | TFLOPs: 30.57 | 7: 
iteration 94740/ 115203 | consumed samples: 24253440 | consumed tokens: 49671045120 | elapsed time per iteration (s): 0.44 | learning rate: 3.392E-05 | global batch size: 256 | lm loss: 2.221139E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.851 | TFLOPs: 30.37 | 7: iteration 94750/ 115203 | consumed samples: 24256000 | consumed tokens: 49676288000 | elapsed time per iteration (s): 0.44 | learning rate: 3.391E-05 | global batch size: 256 | lm loss: 2.224535E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.945 | TFLOPs: 30.59 | 7: iteration 94760/ 115203 | consumed samples: 24258560 | consumed tokens: 49681530880 | elapsed time per iteration (s): 0.45 | learning rate: 3.390E-05 | global batch size: 256 | lm loss: 2.248570E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.224 | TFLOPs: 30.18 | 7: iteration 94770/ 115203 | consumed samples: 24261120 | consumed tokens: 49686773760 | elapsed time per iteration (s): 0.43 | learning rate: 3.388E-05 | global batch size: 256 | lm loss: 2.229933E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.254 | TFLOPs: 30.97 | 7: iteration 94780/ 115203 | consumed samples: 24263680 | consumed tokens: 49692016640 | elapsed time per iteration (s): 0.44 | learning rate: 3.387E-05 | global batch size: 256 | lm loss: 2.229520E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.519 | TFLOPs: 30.46 | 7: iteration 94790/ 115203 | consumed samples: 24266240 | consumed tokens: 49697259520 | elapsed time per iteration (s): 0.43 | learning rate: 3.386E-05 | global batch size: 256 | lm loss: 2.244807E+00 | grad norm: 0.254 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.727 | TFLOPs: 31.52 | 7: iteration 94800/ 115203 | consumed samples: 24268800 | consumed tokens: 49702502400 | elapsed time per iteration (s): 0.43 | learning rate: 3.384E-05 | global batch size: 256 | lm loss: 2.221468E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.951 | TFLOPs: 31.27 | 7: iteration 94810/ 115203 | consumed samples: 24271360 | consumed tokens: 49707745280 | elapsed time per iteration (s): 0.42 | learning rate: 3.383E-05 | global batch size: 256 | lm loss: 2.230234E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.455 | TFLOPs: 31.66 | 7: iteration 94820/ 115203 | consumed samples: 24273920 | consumed tokens: 49712988160 | elapsed time per iteration (s): 0.43 | learning rate: 3.382E-05 | global batch size: 256 | lm loss: 2.237869E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.728 | TFLOPs: 31.31 | 7: iteration 94830/ 115203 | consumed samples: 24276480 | consumed tokens: 49718231040 | elapsed time per iteration (s): 0.44 | learning rate: 3.380E-05 | global batch size: 256 | lm loss: 2.220042E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.137 | TFLOPs: 30.86 | 7: iteration 94840/ 115203 | consumed samples: 24279040 | consumed tokens: 49723473920 | elapsed time per iteration (s): 0.43 | learning rate: 3.379E-05 | global batch size: 256 | lm loss: 2.216083E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.454 | TFLOPs: 31.19 | 7: iteration 94850/ 115203 | consumed samples: 24281600 | consumed tokens: 49728716800 | elapsed time per iteration (s): 0.44 | learning rate: 
3.378E-05 | global batch size: 256 | lm loss: 2.210226E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.273 | TFLOPs: 30.87 | 7: iteration 94860/ 115203 | consumed samples: 24284160 | consumed tokens: 49733959680 | elapsed time per iteration (s): 0.44 | learning rate: 3.377E-05 | global batch size: 256 | lm loss: 2.217215E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.590 | TFLOPs: 30.83 | 7: iteration 94870/ 115203 | consumed samples: 24286720 | consumed tokens: 49739202560 | elapsed time per iteration (s): 0.43 | learning rate: 3.375E-05 | global batch size: 256 | lm loss: 2.218663E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.094 | TFLOPs: 31.33 | 7: iteration 94880/ 115203 | consumed samples: 24289280 | consumed tokens: 49744445440 | elapsed time per iteration (s): 0.43 | learning rate: 3.374E-05 | global batch size: 256 | lm loss: 2.255140E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.935 | TFLOPs: 31.43 | 7: iteration 94890/ 115203 | consumed samples: 24291840 | consumed tokens: 49749688320 | elapsed time per iteration (s): 0.44 | learning rate: 3.373E-05 | global batch size: 256 | lm loss: 2.275573E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.075 | TFLOPs: 30.28 | 7: iteration 94900/ 115203 | consumed samples: 24294400 | consumed tokens: 49754931200 | elapsed time per iteration (s): 0.43 | learning rate: 3.371E-05 | global batch size: 256 | lm loss: 2.252017E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.319 | TFLOPs: 30.97 | 7: iteration 94910/ 115203 | consumed 
samples: 24296960 | consumed tokens: 49760174080 | elapsed time per iteration (s): 0.43 | learning rate: 3.370E-05 | global batch size: 256 | lm loss: 2.231317E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.237 | TFLOPs: 31.60 | 7: iteration 94920/ 115203 | consumed samples: 24299520 | consumed tokens: 49765416960 | elapsed time per iteration (s): 0.43 | learning rate: 3.369E-05 | global batch size: 256 | lm loss: 2.246658E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.437 | TFLOPs: 31.35 | 7: iteration 94930/ 115203 | consumed samples: 24302080 | consumed tokens: 49770659840 | elapsed time per iteration (s): 0.43 | learning rate: 3.367E-05 | global batch size: 256 | lm loss: 2.247982E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.161 | TFLOPs: 31.12 | 7: iteration 94940/ 115203 | consumed samples: 24304640 | consumed tokens: 49775902720 | elapsed time per iteration (s): 0.48 | learning rate: 3.366E-05 | global batch size: 256 | lm loss: 2.217604E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 530.063 | TFLOPs: 27.81 | 7: iteration 94950/ 115203 | consumed samples: 24307200 | consumed tokens: 49781145600 | elapsed time per iteration (s): 0.44 | learning rate: 3.365E-05 | global batch size: 256 | lm loss: 2.241949E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.321 | TFLOPs: 30.40 | 7: iteration 94960/ 115203 | consumed samples: 24309760 | consumed tokens: 49786388480 | elapsed time per iteration (s): 0.42 | learning rate: 3.363E-05 | global batch size: 256 | lm loss: 2.243971E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 604.701 | TFLOPs: 31.73 |
7: iteration 94970/ 115203 | consumed samples: 24312320 | consumed tokens: 49791631360 | elapsed time per iteration (s): 0.43 | learning rate: 3.362E-05 | global batch size: 256 | lm loss: 2.224541E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.875 | TFLOPs: 31.05 |
7: iteration 94980/ 115203 | consumed samples: 24314880 | consumed tokens: 49796874240 | elapsed time per iteration (s): 0.44 | learning rate: 3.361E-05 | global batch size: 256 | lm loss: 2.215870E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.606 | TFLOPs: 30.83 |
7: iteration 94990/ 115203 | consumed samples: 24317440 | consumed tokens: 49802117120 | elapsed time per iteration (s): 0.43 | learning rate: 3.359E-05 | global batch size: 256 | lm loss: 2.216827E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.970 | TFLOPs: 31.37 |
7: iteration 95000/ 115203 | consumed samples: 24320000 | consumed tokens: 49807360000 | elapsed time per iteration (s): 0.43 | learning rate: 3.358E-05 | global batch size: 256 | lm loss: 2.229644E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.994 | TFLOPs: 31.11 |
7: -------------------------------------------------------------------------------------------
7: valid loss at iteration 95000 | lm loss value: 2.108446E+00 | lm loss PPL: 8.235431E+00 |
7: -------------------------------------------------------------------------------------------
0: saving checkpoint at iteration 95000 to checkpoints_221m
0: [2022-11-29 00:24:33,012] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step95000 is begin to save!
0: [2022-11-29 00:24:33,020] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_01-model_00-model_states.pt... 0: [2022-11-29 00:24:33,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_01-model_00-model_states.pt. 0: [2022-11-29 00:24:33,150] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_03-model_00-model_states.pt... 0: [2022-11-29 00:24:33,176] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_03-model_00-model_states.pt. 0: [2022-11-29 00:24:33,177] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_04-model_00-model_states.pt... 0: [2022-11-29 00:24:33,199] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_04-model_00-model_states.pt. 0: [2022-11-29 00:24:33,199] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_05-model_00-model_states.pt... 0: [2022-11-29 00:24:33,222] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_05-model_00-model_states.pt. 0: [2022-11-29 00:24:33,223] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_06-model_00-model_states.pt... 0: [2022-11-29 00:24:33,247] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_06-model_00-model_states.pt. 0: [2022-11-29 00:24:33,247] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_07-model_00-model_states.pt... 0: [2022-11-29 00:24:33,272] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_07-model_00-model_states.pt. 
0: [2022-11-29 00:24:33,272] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_08-model_00-model_states.pt... 0: [2022-11-29 00:24:33,294] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_08-model_00-model_states.pt. 0: [2022-11-29 00:24:33,294] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_09-model_00-model_states.pt... 0: [2022-11-29 00:24:33,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_09-model_00-model_states.pt. 0: [2022-11-29 00:24:33,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_10-model_00-model_states.pt... 0: [2022-11-29 00:24:33,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_10-model_00-model_states.pt. 0: [2022-11-29 00:24:33,344] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_11-model_00-model_states.pt... 0: [2022-11-29 00:24:33,368] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_11-model_00-model_states.pt. 0: [2022-11-29 00:24:33,369] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_12-model_00-model_states.pt... 0: [2022-11-29 00:24:33,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_12-model_00-model_states.pt. 0: [2022-11-29 00:24:33,393] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_13-model_00-model_states.pt... 0: [2022-11-29 00:24:33,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_13-model_00-model_states.pt. 
0: [2022-11-29 00:24:33,416] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_14-model_00-model_states.pt... 0: [2022-11-29 00:24:33,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_14-model_00-model_states.pt. 0: [2022-11-29 00:24:33,441] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_15-model_00-model_states.pt... 0: [2022-11-29 00:24:33,466] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_15-model_00-model_states.pt. 0: [2022-11-29 00:24:33,466] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_16-model_00-model_states.pt... 0: [2022-11-29 00:24:33,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_16-model_00-model_states.pt. 0: [2022-11-29 00:24:33,494] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_17-model_00-model_states.pt... 0: [2022-11-29 00:24:33,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_17-model_00-model_states.pt. 0: [2022-11-29 00:24:33,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_18-model_00-model_states.pt... 0: [2022-11-29 00:24:33,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_18-model_00-model_states.pt. 0: [2022-11-29 00:24:33,538] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_19-model_00-model_states.pt... 0: [2022-11-29 00:24:33,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_19-model_00-model_states.pt. 
0: [2022-11-29 00:24:33,562] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_20-model_00-model_states.pt... 0: [2022-11-29 00:24:33,586] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_20-model_00-model_states.pt. 0: [2022-11-29 00:24:33,586] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/layer_22-model_00-model_states.pt... 0: [2022-11-29 00:24:33,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/layer_22-model_00-model_states.pt. 0: [2022-11-29 00:24:33,591] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step95000/mp_rank_00_model_states.pt 0: [2022-11-29 00:24:33,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/mp_rank_00_model_states.pt... 0: [2022-11-29 00:24:33,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/mp_rank_00_model_states.pt. 0: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 
7: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 
1: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
1: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
7: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:24:33,611] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step95000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:24:33,665] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:24:33,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 00:24:33,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2022-11-29 00:24:33,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:24:33,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 00:24:33,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2022-11-29 00:24:33,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-29 00:24:33,666] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 00:24:33,666] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2022-11-29 00:24:33,667] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:24:33,667] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 00:24:33,667] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2022-11-29 00:24:33,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:24:33,668] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 00:24:33,668] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2022-11-29 00:24:33,670] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:24:33,670] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 00:24:33,670] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2022-11-29 00:24:33,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2022-11-29 00:24:33,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:24:33,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 00:24:33,671] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2022-11-29 00:24:33,671] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:24:33,671] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 00:24:33,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2022-11-29 00:24:33,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:24:33,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 00:24:33,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2022-11-29 00:24:33,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:24:33,672] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 00:24:33,672] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
7: [2022-11-29 00:24:33,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:24:33,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 00:24:33,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2022-11-29 00:24:33,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:24:33,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 00:24:33,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2022-11-29 00:24:33,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:24:33,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:24:33,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 00:24:33,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 7: [2022-11-29 00:24:33,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:24:33,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
1: [2022-11-29 00:24:33,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2022-11-29 00:24:33,673] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 00:24:33,673] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2022-11-29 00:24:33,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:24:33,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 00:24:33,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2022-11-29 00:24:33,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:24:33,674] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 00:24:33,674] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2022-11-29 00:24:33,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:24:33,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 00:24:33,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
2: [2022-11-29 00:24:33,676] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:24:33,676] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 00:24:33,676] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 2: [2022-11-29 00:24:33,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:24:33,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 00:24:33,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2022-11-29 00:24:33,677] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:24:33,677] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 00:24:33,677] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2022-11-29 00:24:33,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:24:33,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 00:24:33,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
5: [2022-11-29 00:24:33,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:24:33,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 00:24:33,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:24:33,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2022-11-29 00:24:33,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 00:24:33,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2022-11-29 00:24:33,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:24:33,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 00:24:33,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2022-11-29 00:24:33,675] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:24:33,675] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 00:24:33,675] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
4: [2022-11-29 00:24:33,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:24:33,678] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 00:24:33,678] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2022-11-29 00:24:33,678] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:24:33,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 00:24:33,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2022-11-29 00:24:33,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:24:33,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 00:24:33,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2022-11-29 00:24:33,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:24:33,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 00:24:33,679] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
0: [2022-11-29 00:24:33,679] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:24:33,679] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 00:24:33,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2022-11-29 00:24:33,680] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:24:33,680] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 00:24:33,680] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2022-11-29 00:24:33,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:24:33,681] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:24:33,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 00:24:33,681] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 00:24:33,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2022-11-29 00:24:33,681] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
5: [2022-11-29 00:24:33,682] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:24:33,682] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 00:24:33,682] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2022-11-29 00:24:33,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:24:33,683] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 00:24:33,683] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2022-11-29 00:24:33,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:24:33,686] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 00:24:33,687] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 7: [2022-11-29 00:24:33,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:24:33,687] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 00:24:33,687] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
7: [2022-11-29 00:24:33,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:24:33,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 00:24:33,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2022-11-29 00:24:33,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:24:33,688] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:24:33,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 00:24:33,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2022-11-29 00:24:33,688] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 00:24:33,688] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2022-11-29 00:24:33,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:24:33,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:24:33,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-29 00:24:33,689] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:24:33,689] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 00:24:33,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 00:24:33,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 00:24:33,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 00:24:33,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2022-11-29 00:24:33,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2022-11-29 00:24:33,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 4: [2022-11-29 00:24:33,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2022-11-29 00:24:33,692] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:24:33,692] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2022-11-29 00:24:33,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
6: [2022-11-29 00:24:33,692] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2022-11-29 00:24:33,690] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 00:24:33,690] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 5: [2022-11-29 00:24:33,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:24:33,691] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 00:24:33,691] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 1: [2022-11-29 00:24:33,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:24:33,695] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 00:24:33,695] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 0: [2022-11-29 00:24:33,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:24:33,697] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 00:24:33,697] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
0: [2022-11-29 00:24:33,703] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 00:24:33,703] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2022-11-29 00:24:33,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:24:33,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 00:24:33,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2022-11-29 00:24:33,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:24:33,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:24:33,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:24:33,707] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-29 00:24:33,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 00:24:33,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 00:24:33,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 00:24:33,707] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 00:24:33,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2022-11-29 00:24:33,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2022-11-29 00:24:33,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 6: [2022-11-29 00:24:33,707] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:24:33,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 00:24:33,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 00:24:33,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 00:24:33,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 00:24:33,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 00:24:33,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 00:24:33,787] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 3: [2022-11-29 00:24:33,787] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:24:33,788] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step95000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 00:24:33,788] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step95000 is ready now! 
0: successfully saved checkpoint at iteration 95000 to checkpoints_221m 7: time (ms) | save-checkpoint: 797.97 7: iteration 95010/ 115203 | consumed samples: 24322560 | consumed tokens: 49812602880 | elapsed time per iteration (s): 0.53 | learning rate: 3.357E-05 | global batch size: 256 | lm loss: 2.233144E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 483.141 | TFLOPs: 25.35 | 7: iteration 95020/ 115203 | consumed samples: 24325120 | consumed tokens: 49817845760 | elapsed time per iteration (s): 0.47 | learning rate: 3.356E-05 | global batch size: 256 | lm loss: 2.215175E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 539.594 | TFLOPs: 28.31 | 7: iteration 95030/ 115203 | consumed samples: 24327680 | consumed tokens: 49823088640 | elapsed time per iteration (s): 0.43 | learning rate: 3.354E-05 | global batch size: 256 | lm loss: 2.241215E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.389 | TFLOPs: 31.24 | 7: iteration 95040/ 115203 | consumed samples: 24330240 | consumed tokens: 49828331520 | elapsed time per iteration (s): 0.42 | learning rate: 3.353E-05 | global batch size: 256 | lm loss: 2.251921E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.551 | TFLOPs: 31.82 | 7: iteration 95050/ 115203 | consumed samples: 24332800 | consumed tokens: 49833574400 | elapsed time per iteration (s): 0.43 | learning rate: 3.352E-05 | global batch size: 256 | lm loss: 2.209064E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.548 | TFLOPs: 31.25 | 7: iteration 95060/ 115203 | consumed samples: 24335360 | consumed tokens: 49838817280 | elapsed time per iteration (s): 0.43 | learning 
rate: 3.350E-05 | global batch size: 256 | lm loss: 2.187604E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.345 | TFLOPs: 31.60 | 7: iteration 95070/ 115203 | consumed samples: 24337920 | consumed tokens: 49844060160 | elapsed time per iteration (s): 0.43 | learning rate: 3.349E-05 | global batch size: 256 | lm loss: 2.208418E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.028 | TFLOPs: 31.22 | 7: iteration 95080/ 115203 | consumed samples: 24340480 | consumed tokens: 49849303040 | elapsed time per iteration (s): 0.43 | learning rate: 3.348E-05 | global batch size: 256 | lm loss: 2.227366E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.057 | TFLOPs: 31.22 | 7: iteration 95090/ 115203 | consumed samples: 24343040 | consumed tokens: 49854545920 | elapsed time per iteration (s): 0.43 | learning rate: 3.346E-05 | global batch size: 256 | lm loss: 2.222858E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.295 | TFLOPs: 31.08 | 7: iteration 95100/ 115203 | consumed samples: 24345600 | consumed tokens: 49859788800 | elapsed time per iteration (s): 0.43 | learning rate: 3.345E-05 | global batch size: 256 | lm loss: 2.258954E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.368 | TFLOPs: 31.13 | 7: iteration 95110/ 115203 | consumed samples: 24348160 | consumed tokens: 49865031680 | elapsed time per iteration (s): 0.45 | learning rate: 3.344E-05 | global batch size: 256 | lm loss: 2.261415E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.867 | TFLOPs: 30.11 | 7: iteration 95120/ 115203 | 
consumed samples: 24350720 | consumed tokens: 49870274560 | elapsed time per iteration (s): 0.43 | learning rate: 3.342E-05 | global batch size: 256 | lm loss: 2.257224E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.751 | TFLOPs: 31.47 | 7: iteration 95130/ 115203 | consumed samples: 24353280 | consumed tokens: 49875517440 | elapsed time per iteration (s): 0.43 | learning rate: 3.341E-05 | global batch size: 256 | lm loss: 2.244746E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.676 | TFLOPs: 31.31 | 7: iteration 95140/ 115203 | consumed samples: 24355840 | consumed tokens: 49880760320 | elapsed time per iteration (s): 0.43 | learning rate: 3.340E-05 | global batch size: 256 | lm loss: 2.222569E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.282 | TFLOPs: 31.23 | 7: iteration 95150/ 115203 | consumed samples: 24358400 | consumed tokens: 49886003200 | elapsed time per iteration (s): 0.43 | learning rate: 3.339E-05 | global batch size: 256 | lm loss: 2.215710E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.403 | TFLOPs: 31.13 | 7: iteration 95160/ 115203 | consumed samples: 24360960 | consumed tokens: 49891246080 | elapsed time per iteration (s): 0.42 | learning rate: 3.337E-05 | global batch size: 256 | lm loss: 2.221403E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.731 | TFLOPs: 31.83 | 7: iteration 95170/ 115203 | consumed samples: 24363520 | consumed tokens: 49896488960 | elapsed time per iteration (s): 0.44 | learning rate: 3.336E-05 | global batch size: 256 | lm loss: 2.242614E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 585.734 | TFLOPs: 30.73 | 7: iteration 95180/ 115203 | consumed samples: 24366080 | consumed tokens: 49901731840 | elapsed time per iteration (s): 0.44 | learning rate: 3.335E-05 | global batch size: 256 | lm loss: 2.227316E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.232 | TFLOPs: 30.44 | 7: iteration 95190/ 115203 | consumed samples: 24368640 | consumed tokens: 49906974720 | elapsed time per iteration (s): 0.43 | learning rate: 3.333E-05 | global batch size: 256 | lm loss: 2.241735E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.432 | TFLOPs: 31.35 | 7: iteration 95200/ 115203 | consumed samples: 24371200 | consumed tokens: 49912217600 | elapsed time per iteration (s): 0.43 | learning rate: 3.332E-05 | global batch size: 256 | lm loss: 2.217259E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.269 | TFLOPs: 31.18 | 7: iteration 95210/ 115203 | consumed samples: 24373760 | consumed tokens: 49917460480 | elapsed time per iteration (s): 0.42 | learning rate: 3.331E-05 | global batch size: 256 | lm loss: 2.239403E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.810 | TFLOPs: 31.84 | 7: iteration 95220/ 115203 | consumed samples: 24376320 | consumed tokens: 49922703360 | elapsed time per iteration (s): 0.43 | learning rate: 3.329E-05 | global batch size: 256 | lm loss: 2.239010E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.674 | TFLOPs: 30.94 | 7: iteration 95230/ 115203 | consumed samples: 24378880 | consumed tokens: 49927946240 | elapsed time per iteration (s): 0.42 | learning rate: 3.328E-05 | global batch size: 
256 | lm loss: 2.211238E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.639 | TFLOPs: 31.93 | 7: iteration 95240/ 115203 | consumed samples: 24381440 | consumed tokens: 49933189120 | elapsed time per iteration (s): 0.43 | learning rate: 3.327E-05 | global batch size: 256 | lm loss: 2.229400E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.060 | TFLOPs: 30.96 | 7: iteration 95250/ 115203 | consumed samples: 24384000 | consumed tokens: 49938432000 | elapsed time per iteration (s): 0.43 | learning rate: 3.326E-05 | global batch size: 256 | lm loss: 2.234686E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.945 | TFLOPs: 31.11 | 7: iteration 95260/ 115203 | consumed samples: 24386560 | consumed tokens: 49943674880 | elapsed time per iteration (s): 0.44 | learning rate: 3.324E-05 | global batch size: 256 | lm loss: 2.214948E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.304 | TFLOPs: 30.19 | 7: iteration 95270/ 115203 | consumed samples: 24389120 | consumed tokens: 49948917760 | elapsed time per iteration (s): 0.45 | learning rate: 3.323E-05 | global batch size: 256 | lm loss: 2.203470E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.204 | TFLOPs: 29.60 | 7: iteration 95280/ 115203 | consumed samples: 24391680 | consumed tokens: 49954160640 | elapsed time per iteration (s): 0.44 | learning rate: 3.322E-05 | global batch size: 256 | lm loss: 2.256606E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.388 | TFLOPs: 30.66 | 7: iteration 95290/ 115203 | consumed samples: 24394240 | consumed 
tokens: 49959403520 | elapsed time per iteration (s): 0.44 | learning rate: 3.320E-05 | global batch size: 256 | lm loss: 2.195088E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.209 | TFLOPs: 30.55 | 7: iteration 95300/ 115203 | consumed samples: 24396800 | consumed tokens: 49964646400 | elapsed time per iteration (s): 0.43 | learning rate: 3.319E-05 | global batch size: 256 | lm loss: 2.246600E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.692 | TFLOPs: 31.26 | 7: iteration 95310/ 115203 | consumed samples: 24399360 | consumed tokens: 49969889280 | elapsed time per iteration (s): 0.43 | learning rate: 3.318E-05 | global batch size: 256 | lm loss: 2.216388E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.959 | TFLOPs: 31.53 | 7: iteration 95320/ 115203 | consumed samples: 24401920 | consumed tokens: 49975132160 | elapsed time per iteration (s): 0.43 | learning rate: 3.317E-05 | global batch size: 256 | lm loss: 2.207426E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.014 | TFLOPs: 31.53 | 7: iteration 95330/ 115203 | consumed samples: 24404480 | consumed tokens: 49980375040 | elapsed time per iteration (s): 0.43 | learning rate: 3.315E-05 | global batch size: 256 | lm loss: 2.237234E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.035 | TFLOPs: 31.22 | 7: iteration 95340/ 115203 | consumed samples: 24407040 | consumed tokens: 49985617920 | elapsed time per iteration (s): 0.43 | learning rate: 3.314E-05 | global batch size: 256 | lm loss: 2.247136E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 591.068 | TFLOPs: 31.01 | 7: iteration 95350/ 115203 | consumed samples: 24409600 | consumed tokens: 49990860800 | elapsed time per iteration (s): 0.43 | learning rate: 3.313E-05 | global batch size: 256 | lm loss: 2.196636E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.992 | TFLOPs: 31.01 | 7: iteration 95360/ 115203 | consumed samples: 24412160 | consumed tokens: 49996103680 | elapsed time per iteration (s): 0.43 | learning rate: 3.311E-05 | global batch size: 256 | lm loss: 2.187312E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.620 | TFLOPs: 31.51 | 7: iteration 95370/ 115203 | consumed samples: 24414720 | consumed tokens: 50001346560 | elapsed time per iteration (s): 0.44 | learning rate: 3.310E-05 | global batch size: 256 | lm loss: 2.233142E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.693 | TFLOPs: 30.31 | 7: iteration 95380/ 115203 | consumed samples: 24417280 | consumed tokens: 50006589440 | elapsed time per iteration (s): 0.44 | learning rate: 3.309E-05 | global batch size: 256 | lm loss: 2.218104E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.321 | TFLOPs: 30.61 | 7: iteration 95390/ 115203 | consumed samples: 24419840 | consumed tokens: 50011832320 | elapsed time per iteration (s): 0.43 | learning rate: 3.307E-05 | global batch size: 256 | lm loss: 2.244225E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.731 | TFLOPs: 31.15 | 7: iteration 95400/ 115203 | consumed samples: 24422400 | consumed tokens: 50017075200 | elapsed time per iteration (s): 0.43 | learning rate: 3.306E-05 | global batch size: 256 | lm loss: 2.216846E+00 | grad norm: 
0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.744 | TFLOPs: 31.05 | 7: iteration 95410/ 115203 | consumed samples: 24424960 | consumed tokens: 50022318080 | elapsed time per iteration (s): 0.44 | learning rate: 3.305E-05 | global batch size: 256 | lm loss: 2.200220E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.310 | TFLOPs: 30.66 | 7: iteration 95420/ 115203 | consumed samples: 24427520 | consumed tokens: 50027560960 | elapsed time per iteration (s): 0.44 | learning rate: 3.304E-05 | global batch size: 256 | lm loss: 2.221163E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.481 | TFLOPs: 30.82 | 7: iteration 95430/ 115203 | consumed samples: 24430080 | consumed tokens: 50032803840 | elapsed time per iteration (s): 0.42 | learning rate: 3.302E-05 | global batch size: 256 | lm loss: 2.199128E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.698 | TFLOPs: 31.62 | 7: iteration 95440/ 115203 | consumed samples: 24432640 | consumed tokens: 50038046720 | elapsed time per iteration (s): 0.43 | learning rate: 3.301E-05 | global batch size: 256 | lm loss: 2.238345E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.478 | TFLOPs: 31.30 | 7: iteration 95450/ 115203 | consumed samples: 24435200 | consumed tokens: 50043289600 | elapsed time per iteration (s): 0.42 | learning rate: 3.300E-05 | global batch size: 256 | lm loss: 2.222552E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.610 | TFLOPs: 31.83 | 7: iteration 95460/ 115203 | consumed samples: 24437760 | consumed tokens: 50048532480 | elapsed time per 
iteration (s): 0.44 | learning rate: 3.298E-05 | global batch size: 256 | lm loss: 2.247487E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.589 | TFLOPs: 30.20 | 7: iteration 95470/ 115203 | consumed samples: 24440320 | consumed tokens: 50053775360 | elapsed time per iteration (s): 0.43 | learning rate: 3.297E-05 | global batch size: 256 | lm loss: 2.247437E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.360 | TFLOPs: 30.92 | 7: iteration 95480/ 115203 | consumed samples: 24442880 | consumed tokens: 50059018240 | elapsed time per iteration (s): 0.43 | learning rate: 3.296E-05 | global batch size: 256 | lm loss: 2.215572E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.041 | TFLOPs: 30.91 | 7: iteration 95490/ 115203 | consumed samples: 24445440 | consumed tokens: 50064261120 | elapsed time per iteration (s): 0.42 | learning rate: 3.295E-05 | global batch size: 256 | lm loss: 2.211113E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.860 | TFLOPs: 31.79 | 7: iteration 95500/ 115203 | consumed samples: 24448000 | consumed tokens: 50069504000 | elapsed time per iteration (s): 0.44 | learning rate: 3.293E-05 | global batch size: 256 | lm loss: 2.249934E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.582 | TFLOPs: 30.46 | 7: iteration 95510/ 115203 | consumed samples: 24450560 | consumed tokens: 50074746880 | elapsed time per iteration (s): 0.43 | learning rate: 3.292E-05 | global batch size: 256 | lm loss: 2.236520E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.421 | TFLOPs: 31.35 | 7: 
iteration 95520/ 115203 | consumed samples: 24453120 | consumed tokens: 50079989760 | elapsed time per iteration (s): 0.43 | learning rate: 3.291E-05 | global batch size: 256 | lm loss: 2.268287E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.123 | TFLOPs: 31.23 | 7: iteration 95530/ 115203 | consumed samples: 24455680 | consumed tokens: 50085232640 | elapsed time per iteration (s): 0.43 | learning rate: 3.290E-05 | global batch size: 256 | lm loss: 2.250847E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.862 | TFLOPs: 31.16 | 7: iteration 95540/ 115203 | consumed samples: 24458240 | consumed tokens: 50090475520 | elapsed time per iteration (s): 0.44 | learning rate: 3.288E-05 | global batch size: 256 | lm loss: 2.210138E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.360 | TFLOPs: 30.82 | 7: iteration 95550/ 115203 | consumed samples: 24460800 | consumed tokens: 50095718400 | elapsed time per iteration (s): 0.43 | learning rate: 3.287E-05 | global batch size: 256 | lm loss: 2.212849E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.192 | TFLOPs: 31.33 | 7: iteration 95560/ 115203 | consumed samples: 24463360 | consumed tokens: 50100961280 | elapsed time per iteration (s): 0.43 | learning rate: 3.286E-05 | global batch size: 256 | lm loss: 2.228405E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.609 | TFLOPs: 31.04 | 7: iteration 95570/ 115203 | consumed samples: 24465920 | consumed tokens: 50106204160 | elapsed time per iteration (s): 0.44 | learning rate: 3.284E-05 | global batch size: 256 | lm loss: 2.232143E+00 | grad norm: 0.253 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.674 | TFLOPs: 30.36 | 7: iteration 95580/ 115203 | consumed samples: 24468480 | consumed tokens: 50111447040 | elapsed time per iteration (s): 0.43 | learning rate: 3.283E-05 | global batch size: 256 | lm loss: 2.212247E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.929 | TFLOPs: 31.32 | 7: iteration 95590/ 115203 | consumed samples: 24471040 | consumed tokens: 50116689920 | elapsed time per iteration (s): 0.44 | learning rate: 3.282E-05 | global batch size: 256 | lm loss: 2.222664E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.925 | TFLOPs: 30.79 | 7: iteration 95600/ 115203 | consumed samples: 24473600 | consumed tokens: 50121932800 | elapsed time per iteration (s): 0.43 | learning rate: 3.281E-05 | global batch size: 256 | lm loss: 2.201526E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.114 | TFLOPs: 31.12 | 7: iteration 95610/ 115203 | consumed samples: 24476160 | consumed tokens: 50127175680 | elapsed time per iteration (s): 0.42 | learning rate: 3.279E-05 | global batch size: 256 | lm loss: 2.220130E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.461 | TFLOPs: 31.72 | 7: iteration 95620/ 115203 | consumed samples: 24478720 | consumed tokens: 50132418560 | elapsed time per iteration (s): 0.43 | learning rate: 3.278E-05 | global batch size: 256 | lm loss: 2.228282E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.816 | TFLOPs: 31.58 | 7: iteration 95630/ 115203 | consumed samples: 24481280 | consumed tokens: 50137661440 | elapsed time per iteration (s): 0.45 | learning rate: 
3.277E-05 | global batch size: 256 | lm loss: 2.236117E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.389 | TFLOPs: 29.93 | 7: iteration 95640/ 115203 | consumed samples: 24483840 | consumed tokens: 50142904320 | elapsed time per iteration (s): 0.43 | learning rate: 3.276E-05 | global batch size: 256 | lm loss: 2.231908E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.339 | TFLOPs: 31.13 | 7: iteration 95650/ 115203 | consumed samples: 24486400 | consumed tokens: 50148147200 | elapsed time per iteration (s): 0.43 | learning rate: 3.274E-05 | global batch size: 256 | lm loss: 2.238091E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.212 | TFLOPs: 30.91 | 7: iteration 95660/ 115203 | consumed samples: 24488960 | consumed tokens: 50153390080 | elapsed time per iteration (s): 0.43 | learning rate: 3.273E-05 | global batch size: 256 | lm loss: 2.195487E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.972 | TFLOPs: 31.06 | 7: iteration 95670/ 115203 | consumed samples: 24491520 | consumed tokens: 50158632960 | elapsed time per iteration (s): 0.43 | learning rate: 3.272E-05 | global batch size: 256 | lm loss: 2.204515E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.954 | TFLOPs: 31.58 | 7: iteration 95680/ 115203 | consumed samples: 24494080 | consumed tokens: 50163875840 | elapsed time per iteration (s): 0.43 | learning rate: 3.270E-05 | global batch size: 256 | lm loss: 2.242380E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.948 | TFLOPs: 30.90 | 7: iteration 95690/ 115203 | consumed 
samples: 24496640 | consumed tokens: 50169118720 | elapsed time per iteration (s): 0.43 | learning rate: 3.269E-05 | global batch size: 256 | lm loss: 2.222521E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.992 | TFLOPs: 31.17 | 7: iteration 95700/ 115203 | consumed samples: 24499200 | consumed tokens: 50174361600 | elapsed time per iteration (s): 0.44 | learning rate: 3.268E-05 | global batch size: 256 | lm loss: 2.265393E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.166 | TFLOPs: 30.81 | 7: iteration 95710/ 115203 | consumed samples: 24501760 | consumed tokens: 50179604480 | elapsed time per iteration (s): 0.44 | learning rate: 3.267E-05 | global batch size: 256 | lm loss: 2.225883E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.531 | TFLOPs: 30.46 | 7: iteration 95720/ 115203 | consumed samples: 24504320 | consumed tokens: 50184847360 | elapsed time per iteration (s): 0.43 | learning rate: 3.265E-05 | global batch size: 256 | lm loss: 2.223896E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.264 | TFLOPs: 31.34 | 7: iteration 95730/ 115203 | consumed samples: 24506880 | consumed tokens: 50190090240 | elapsed time per iteration (s): 0.43 | learning rate: 3.264E-05 | global batch size: 256 | lm loss: 2.192060E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.919 | TFLOPs: 31.42 | 7: iteration 95740/ 115203 | consumed samples: 24509440 | consumed tokens: 50195333120 | elapsed time per iteration (s): 0.45 | learning rate: 3.263E-05 | global batch size: 256 | lm loss: 2.226251E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 571.634 | TFLOPs: 29.99 | 7: iteration 95750/ 115203 | consumed samples: 24512000 | consumed tokens: 50200576000 | elapsed time per iteration (s): 0.43 | learning rate: 3.262E-05 | global batch size: 256 | lm loss: 2.199022E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.788 | TFLOPs: 30.95 | 7: iteration 95760/ 115203 | consumed samples: 24514560 | consumed tokens: 50205818880 | elapsed time per iteration (s): 0.42 | learning rate: 3.260E-05 | global batch size: 256 | lm loss: 2.201804E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.418 | TFLOPs: 31.71 | 7: iteration 95770/ 115203 | consumed samples: 24517120 | consumed tokens: 50211061760 | elapsed time per iteration (s): 0.43 | learning rate: 3.259E-05 | global batch size: 256 | lm loss: 2.219060E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.905 | TFLOPs: 31.06 | 7: iteration 95780/ 115203 | consumed samples: 24519680 | consumed tokens: 50216304640 | elapsed time per iteration (s): 0.43 | learning rate: 3.258E-05 | global batch size: 256 | lm loss: 2.221462E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.958 | TFLOPs: 31.16 | 7: iteration 95790/ 115203 | consumed samples: 24522240 | consumed tokens: 50221547520 | elapsed time per iteration (s): 0.44 | learning rate: 3.256E-05 | global batch size: 256 | lm loss: 2.237242E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.394 | TFLOPs: 30.45 | 7: iteration 95800/ 115203 | consumed samples: 24524800 | consumed tokens: 50226790400 | elapsed time per iteration (s): 0.45 | learning rate: 3.255E-05 | global batch size: 256 | lm 
loss: 2.235322E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.417 | TFLOPs: 29.61 | 7: iteration 95810/ 115203 | consumed samples: 24527360 | consumed tokens: 50232033280 | elapsed time per iteration (s): 0.44 | learning rate: 3.254E-05 | global batch size: 256 | lm loss: 2.230061E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.554 | TFLOPs: 30.41 | 7: iteration 95820/ 115203 | consumed samples: 24529920 | consumed tokens: 50237276160 | elapsed time per iteration (s): 0.43 | learning rate: 3.253E-05 | global batch size: 256 | lm loss: 2.240209E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.844 | TFLOPs: 31.37 | 7: iteration 95830/ 115203 | consumed samples: 24532480 | consumed tokens: 50242519040 | elapsed time per iteration (s): 0.43 | learning rate: 3.251E-05 | global batch size: 256 | lm loss: 2.251342E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.389 | TFLOPs: 31.29 | 7: iteration 95840/ 115203 | consumed samples: 24535040 | consumed tokens: 50247761920 | elapsed time per iteration (s): 0.43 | learning rate: 3.250E-05 | global batch size: 256 | lm loss: 2.224372E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.333 | TFLOPs: 31.18 | 7: iteration 95850/ 115203 | consumed samples: 24537600 | consumed tokens: 50253004800 | elapsed time per iteration (s): 0.43 | learning rate: 3.249E-05 | global batch size: 256 | lm loss: 2.235420E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.195 | TFLOPs: 31.44 | 7: iteration 95860/ 115203 | consumed samples: 24540160 | consumed tokens: 
50258247680 | elapsed time per iteration (s): 0.43 | learning rate: 3.248E-05 | global batch size: 256 | lm loss: 2.241315E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.768 | TFLOPs: 31.36 | 7: iteration 95870/ 115203 | consumed samples: 24542720 | consumed tokens: 50263490560 | elapsed time per iteration (s): 0.44 | learning rate: 3.246E-05 | global batch size: 256 | lm loss: 2.259533E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.665 | TFLOPs: 30.52 | 7: iteration 95880/ 115203 | consumed samples: 24545280 | consumed tokens: 50268733440 | elapsed time per iteration (s): 0.42 | learning rate: 3.245E-05 | global batch size: 256 | lm loss: 2.203119E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.707 | TFLOPs: 31.73 | 7: iteration 95890/ 115203 | consumed samples: 24547840 | consumed tokens: 50273976320 | elapsed time per iteration (s): 0.42 | learning rate: 3.244E-05 | global batch size: 256 | lm loss: 2.236412E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.148 | TFLOPs: 31.80 | 7: iteration 95900/ 115203 | consumed samples: 24550400 | consumed tokens: 50279219200 | elapsed time per iteration (s): 0.43 | learning rate: 3.243E-05 | global batch size: 256 | lm loss: 2.234944E+00 | grad norm: 0.232 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.854 | TFLOPs: 31.11 | 7: iteration 95910/ 115203 | consumed samples: 24552960 | consumed tokens: 50284462080 | elapsed time per iteration (s): 0.42 | learning rate: 3.241E-05 | global batch size: 256 | lm loss: 2.230798E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
606.724 | TFLOPs: 31.83 | 7: iteration 95920/ 115203 | consumed samples: 24555520 | consumed tokens: 50289704960 | elapsed time per iteration (s): 0.43 | learning rate: 3.240E-05 | global batch size: 256 | lm loss: 2.207610E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.998 | TFLOPs: 31.48 | 7: iteration 95930/ 115203 | consumed samples: 24558080 | consumed tokens: 50294947840 | elapsed time per iteration (s): 0.42 | learning rate: 3.239E-05 | global batch size: 256 | lm loss: 2.218600E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.456 | TFLOPs: 31.66 | 7: iteration 95940/ 115203 | consumed samples: 24560640 | consumed tokens: 50300190720 | elapsed time per iteration (s): 0.44 | learning rate: 3.238E-05 | global batch size: 256 | lm loss: 2.201853E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.741 | TFLOPs: 30.42 | 7: iteration 95950/ 115203 | consumed samples: 24563200 | consumed tokens: 50305433600 | elapsed time per iteration (s): 0.43 | learning rate: 3.236E-05 | global batch size: 256 | lm loss: 2.217512E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.861 | TFLOPs: 31.47 | 7: iteration 95960/ 115203 | consumed samples: 24565760 | consumed tokens: 50310676480 | elapsed time per iteration (s): 0.43 | learning rate: 3.235E-05 | global batch size: 256 | lm loss: 2.208511E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.226 | TFLOPs: 31.34 | 7: iteration 95970/ 115203 | consumed samples: 24568320 | consumed tokens: 50315919360 | elapsed time per iteration (s): 0.42 | learning rate: 3.234E-05 | global batch size: 256 | lm loss: 2.189583E+00 | grad norm: 0.244 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.528 | TFLOPs: 31.61 | 7: iteration 95980/ 115203 | consumed samples: 24570880 | consumed tokens: 50321162240 | elapsed time per iteration (s): 0.42 | learning rate: 3.233E-05 | global batch size: 256 | lm loss: 2.207440E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.982 | TFLOPs: 31.69 | 7: iteration 95990/ 115203 | consumed samples: 24573440 | consumed tokens: 50326405120 | elapsed time per iteration (s): 0.43 | learning rate: 3.231E-05 | global batch size: 256 | lm loss: 2.249693E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.962 | TFLOPs: 31.06 | 0: [2022-11-29 00:31:46,269] [INFO] [logging.py:68:log_dist] [Rank 0] step=96000, skipped=0, lr=[3.230082550465275e-05, 3.230082550465275e-05, 3.230082550465275e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 96000/ 115203 | consumed samples: 24576000 | consumed tokens: 50331648000 | elapsed time per iteration (s): 0.43 | learning rate: 3.230E-05 | global batch size: 256 | lm loss: 2.254568E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.252 | TFLOPs: 31.55 | 0: steps: 96000 loss: 2.2389 iter time (s): 0.432 samples/sec: 593.206 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 96000 | lm loss value: 2.140287E+00 | lm loss PPL: 8.501879E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 96000 to checkpoints_221m 0: [2022-11-29 00:31:46,468] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step96000 is begin to save! 
0: [2022-11-29 00:31:46,489] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_01-model_00-model_states.pt... 0: [2022-11-29 00:31:46,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_01-model_00-model_states.pt. 0: [2022-11-29 00:31:46,602] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_03-model_00-model_states.pt... 0: [2022-11-29 00:31:46,623] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_03-model_00-model_states.pt. 0: [2022-11-29 00:31:46,624] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_04-model_00-model_states.pt... 0: [2022-11-29 00:31:46,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_04-model_00-model_states.pt. 0: [2022-11-29 00:31:46,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_05-model_00-model_states.pt... 0: [2022-11-29 00:31:46,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_05-model_00-model_states.pt. 0: [2022-11-29 00:31:46,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_06-model_00-model_states.pt... 0: [2022-11-29 00:31:46,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_06-model_00-model_states.pt. 0: [2022-11-29 00:31:46,696] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_07-model_00-model_states.pt... 0: [2022-11-29 00:31:46,720] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_07-model_00-model_states.pt. 
0: [2022-11-29 00:31:46,720] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_08-model_00-model_states.pt... 0: [2022-11-29 00:31:46,743] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_08-model_00-model_states.pt. 0: [2022-11-29 00:31:46,743] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_09-model_00-model_states.pt... 0: [2022-11-29 00:31:46,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_09-model_00-model_states.pt. 0: [2022-11-29 00:31:46,768] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_10-model_00-model_states.pt... 0: [2022-11-29 00:31:46,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_10-model_00-model_states.pt. 0: [2022-11-29 00:31:46,792] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_11-model_00-model_states.pt... 0: [2022-11-29 00:31:46,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_11-model_00-model_states.pt. 0: [2022-11-29 00:31:46,815] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_12-model_00-model_states.pt... 0: [2022-11-29 00:31:46,838] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_12-model_00-model_states.pt. 0: [2022-11-29 00:31:46,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_13-model_00-model_states.pt... 0: [2022-11-29 00:31:46,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_13-model_00-model_states.pt. 
0: [2022-11-29 00:31:46,863] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_14-model_00-model_states.pt... 0: [2022-11-29 00:31:46,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_14-model_00-model_states.pt. 0: [2022-11-29 00:31:46,888] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_15-model_00-model_states.pt... 0: [2022-11-29 00:31:46,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_15-model_00-model_states.pt. 0: [2022-11-29 00:31:46,912] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_16-model_00-model_states.pt... 0: [2022-11-29 00:31:46,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_16-model_00-model_states.pt. 0: [2022-11-29 00:31:46,935] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_17-model_00-model_states.pt... 0: [2022-11-29 00:31:46,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_17-model_00-model_states.pt. 0: [2022-11-29 00:31:46,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_18-model_00-model_states.pt... 0: [2022-11-29 00:31:46,984] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_18-model_00-model_states.pt. 0: [2022-11-29 00:31:46,984] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_19-model_00-model_states.pt... 0: [2022-11-29 00:31:47,008] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_19-model_00-model_states.pt. 
0: [2022-11-29 00:31:47,008] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_20-model_00-model_states.pt... 0: [2022-11-29 00:31:47,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_20-model_00-model_states.pt. 0: [2022-11-29 00:31:47,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/layer_22-model_00-model_states.pt... 0: [2022-11-29 00:31:47,036] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/layer_22-model_00-model_states.pt. 0: [2022-11-29 00:31:47,036] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step96000/mp_rank_00_model_states.pt 0: [2022-11-29 00:31:47,037] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/mp_rank_00_model_states.pt... 0: [2022-11-29 00:31:47,039] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/mp_rank_00_model_states.pt. 0: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
0: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
1: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
6: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:31:47,060] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step96000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:31:47,107] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:31:47,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 00:31:47,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 7: [2022-11-29 00:31:47,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:31:47,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 00:31:47,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2022-11-29 00:31:47,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-29 00:31:47,108] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 00:31:47,108] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2022-11-29 00:31:47,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:31:47,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 00:31:47,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2022-11-29 00:31:47,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:31:47,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 00:31:47,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 7: [2022-11-29 00:31:47,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:31:47,110] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 00:31:47,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2022-11-29 00:31:47,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-29 00:31:47,111] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 00:31:47,111] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2022-11-29 00:31:47,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:31:47,111] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 00:31:47,111] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2022-11-29 00:31:47,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:31:47,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:31:47,111] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 7: [2022-11-29 00:31:47,111] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 1: [2022-11-29 00:31:47,111] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 7: [2022-11-29 00:31:47,111] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2022-11-29 00:31:47,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-29 00:31:47,112] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:31:47,112] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 4: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2022-11-29 00:31:47,112] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:31:47,112] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 2: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-29 00:31:47,113] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 00:31:47,113] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2022-11-29 00:31:47,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:31:47,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:31:47,113] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 00:31:47,113] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 00:31:47,113] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2022-11-29 00:31:47,113] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 7: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:31:47,112] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 7: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
4: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:31:47,112] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2022-11-29 00:31:47,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:31:47,114] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 00:31:47,114] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2022-11-29 00:31:47,110] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:31:47,110] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 00:31:47,110] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2022-11-29 00:31:47,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:31:47,112] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 00:31:47,112] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
3: [2022-11-29 00:31:47,113] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:31:47,113] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 00:31:47,113] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2022-11-29 00:31:47,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:31:47,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:31:47,114] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 00:31:47,114] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 00:31:47,114] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2022-11-29 00:31:47,114] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2022-11-29 00:31:47,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:31:47,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 00:31:47,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
1: [2022-11-29 00:31:47,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:31:47,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:31:47,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 00:31:47,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 00:31:47,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 1: [2022-11-29 00:31:47,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 7: [2022-11-29 00:31:47,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:31:47,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 00:31:47,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2022-11-29 00:31:47,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:31:47,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 00:31:47,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
5: [2022-11-29 00:31:47,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:31:47,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:31:47,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-29 00:31:47,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-29 00:31:47,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 5: [2022-11-29 00:31:47,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 5: [2022-11-29 00:31:47,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:31:47,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:31:47,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 00:31:47,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 00:31:47,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 5: [2022-11-29 00:31:47,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
5: [2022-11-29 00:31:47,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:31:47,119] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 00:31:47,119] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2022-11-29 00:31:47,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:31:47,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 00:31:47,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 3: [2022-11-29 00:31:47,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:31:47,121] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 00:31:47,121] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2022-11-29 00:31:47,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:31:47,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 00:31:47,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
0: [2022-11-29 00:31:47,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:31:47,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 00:31:47,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 2: [2022-11-29 00:31:47,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:31:47,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2022-11-29 00:31:47,123] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:31:47,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2022-11-29 00:31:47,123] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 00:31:47,123] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 7: [2022-11-29 00:31:47,131] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:31:47,131] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 00:31:47,131] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
7: [2022-11-29 00:31:47,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:31:47,132] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 00:31:47,132] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2022-11-29 00:31:47,134] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:31:47,134] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 00:31:47,134] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2022-11-29 00:31:47,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:31:47,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 00:31:47,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 5: [2022-11-29 00:31:47,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:31:47,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:31:47,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2022-11-29 00:31:47,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:31:47,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 00:31:47,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 00:31:47,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 00:31:47,127] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 00:31:47,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 5: [2022-11-29 00:31:47,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 5: [2022-11-29 00:31:47,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 5: [2022-11-29 00:31:47,127] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2022-11-29 00:31:47,135] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:31:47,135] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 00:31:47,135] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
0: [2022-11-29 00:31:47,136] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:31:47,136] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 00:31:47,136] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2022-11-29 00:31:47,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:31:47,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:31:47,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:31:47,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 00:31:47,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 00:31:47,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:31:47,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2022-11-29 00:31:47,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
0: [2022-11-29 00:31:47,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 00:31:47,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 4: [2022-11-29 00:31:47,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:31:47,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 00:31:47,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 0: [2022-11-29 00:31:47,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 00:31:47,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2022-11-29 00:31:47,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:31:47,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:31:47,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:31:47,214] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-29 00:31:47,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 00:31:47,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 00:31:47,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 00:31:47,214] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 00:31:47,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2022-11-29 00:31:47,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2022-11-29 00:31:47,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2022-11-29 00:31:47,214] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2022-11-29 00:31:47,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:31:47,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:31:47,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:31:47,217] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-29 00:31:47,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 00:31:47,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 00:31:47,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 00:31:47,217] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step96000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 00:31:47,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2022-11-29 00:31:47,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2022-11-29 00:31:47,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 6: [2022-11-29 00:31:47,217] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step96000 is ready now! 
0: successfully saved checkpoint at iteration 96000 to checkpoints_221m 7: time (ms) | save-checkpoint: 776.32 7: iteration 96010/ 115203 | consumed samples: 24578560 | consumed tokens: 50336890880 | elapsed time per iteration (s): 0.52 | learning rate: 3.229E-05 | global batch size: 256 | lm loss: 2.218381E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 490.345 | TFLOPs: 25.73 | 7: iteration 96020/ 115203 | consumed samples: 24581120 | consumed tokens: 50342133760 | elapsed time per iteration (s): 0.47 | learning rate: 3.228E-05 | global batch size: 256 | lm loss: 2.247283E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 541.999 | TFLOPs: 28.44 | 7: iteration 96030/ 115203 | consumed samples: 24583680 | consumed tokens: 50347376640 | elapsed time per iteration (s): 0.43 | learning rate: 3.226E-05 | global batch size: 256 | lm loss: 2.248930E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.397 | TFLOPs: 31.03 | 7: iteration 96040/ 115203 | consumed samples: 24586240 | consumed tokens: 50352619520 | elapsed time per iteration (s): 0.42 | learning rate: 3.225E-05 | global batch size: 256 | lm loss: 2.223028E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.324 | TFLOPs: 31.81 | 7: iteration 96050/ 115203 | consumed samples: 24588800 | consumed tokens: 50357862400 | elapsed time per iteration (s): 0.43 | learning rate: 3.224E-05 | global batch size: 256 | lm loss: 2.231358E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.797 | TFLOPs: 30.89 | 7: iteration 96060/ 115203 | consumed samples: 24591360 | consumed tokens: 50363105280 | elapsed time per iteration (s): 0.43 | learning 
rate: 3.223E-05 | global batch size: 256 | lm loss: 2.197212E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.749 | TFLOPs: 31.52 | 7: iteration 96070/ 115203 | consumed samples: 24593920 | consumed tokens: 50368348160 | elapsed time per iteration (s): 0.43 | learning rate: 3.221E-05 | global batch size: 256 | lm loss: 2.229311E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.875 | TFLOPs: 31.00 | 7: iteration 96080/ 115203 | consumed samples: 24596480 | consumed tokens: 50373591040 | elapsed time per iteration (s): 0.43 | learning rate: 3.220E-05 | global batch size: 256 | lm loss: 2.224820E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.918 | TFLOPs: 31.11 | 7: iteration 96090/ 115203 | consumed samples: 24599040 | consumed tokens: 50378833920 | elapsed time per iteration (s): 0.43 | learning rate: 3.219E-05 | global batch size: 256 | lm loss: 2.225625E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.608 | TFLOPs: 31.25 | 7: iteration 96100/ 115203 | consumed samples: 24601600 | consumed tokens: 50384076800 | elapsed time per iteration (s): 0.43 | learning rate: 3.218E-05 | global batch size: 256 | lm loss: 2.240854E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.271 | TFLOPs: 31.29 | 7: iteration 96110/ 115203 | consumed samples: 24604160 | consumed tokens: 50389319680 | elapsed time per iteration (s): 0.43 | learning rate: 3.216E-05 | global batch size: 256 | lm loss: 2.228354E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.237 | TFLOPs: 30.92 | 7: iteration 96120/ 115203 | 
consumed samples: 24606720 | consumed tokens: 50394562560 | elapsed time per iteration (s): 0.44 | learning rate: 3.215E-05 | global batch size: 256 | lm loss: 2.254210E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.998 | TFLOPs: 30.75 | 7: iteration 96130/ 115203 | consumed samples: 24609280 | consumed tokens: 50399805440 | elapsed time per iteration (s): 0.43 | learning rate: 3.214E-05 | global batch size: 256 | lm loss: 2.230101E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.968 | TFLOPs: 30.90 | 7: iteration 96140/ 115203 | consumed samples: 24611840 | consumed tokens: 50405048320 | elapsed time per iteration (s): 0.43 | learning rate: 3.213E-05 | global batch size: 256 | lm loss: 2.208706E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.384 | TFLOPs: 31.55 | 7: iteration 96150/ 115203 | consumed samples: 24614400 | consumed tokens: 50410291200 | elapsed time per iteration (s): 0.45 | learning rate: 3.211E-05 | global batch size: 256 | lm loss: 2.232839E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.720 | TFLOPs: 29.84 | 7: iteration 96160/ 115203 | consumed samples: 24616960 | consumed tokens: 50415534080 | elapsed time per iteration (s): 0.45 | learning rate: 3.210E-05 | global batch size: 256 | lm loss: 2.221355E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.517 | TFLOPs: 29.83 | 7: iteration 96170/ 115203 | consumed samples: 24619520 | consumed tokens: 50420776960 | elapsed time per iteration (s): 0.43 | learning rate: 3.209E-05 | global batch size: 256 | lm loss: 2.224154E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 597.775 | TFLOPs: 31.36 | 7: iteration 96180/ 115203 | consumed samples: 24622080 | consumed tokens: 50426019840 | elapsed time per iteration (s): 0.42 | learning rate: 3.208E-05 | global batch size: 256 | lm loss: 2.203088E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.282 | TFLOPs: 31.71 | 7: iteration 96190/ 115203 | consumed samples: 24624640 | consumed tokens: 50431262720 | elapsed time per iteration (s): 0.43 | learning rate: 3.206E-05 | global batch size: 256 | lm loss: 2.196016E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.293 | TFLOPs: 31.23 | 7: iteration 96200/ 115203 | consumed samples: 24627200 | consumed tokens: 50436505600 | elapsed time per iteration (s): 0.43 | learning rate: 3.205E-05 | global batch size: 256 | lm loss: 2.205986E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.914 | TFLOPs: 31.00 | 7: iteration 96210/ 115203 | consumed samples: 24629760 | consumed tokens: 50441748480 | elapsed time per iteration (s): 0.44 | learning rate: 3.204E-05 | global batch size: 256 | lm loss: 2.248982E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.140 | TFLOPs: 30.60 | 7: iteration 96220/ 115203 | consumed samples: 24632320 | consumed tokens: 50446991360 | elapsed time per iteration (s): 0.44 | learning rate: 3.203E-05 | global batch size: 256 | lm loss: 2.211356E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.057 | TFLOPs: 30.22 | 7: iteration 96230/ 115203 | consumed samples: 24634880 | consumed tokens: 50452234240 | elapsed time per iteration (s): 0.43 | learning rate: 3.201E-05 | global batch size: 
256 | lm loss: 2.230947E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.649 | TFLOPs: 31.52 | 7: iteration 96240/ 115203 | consumed samples: 24637440 | consumed tokens: 50457477120 | elapsed time per iteration (s): 0.44 | learning rate: 3.200E-05 | global batch size: 256 | lm loss: 2.219179E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.279 | TFLOPs: 30.55 | 7: iteration 96250/ 115203 | consumed samples: 24640000 | consumed tokens: 50462720000 | elapsed time per iteration (s): 0.42 | learning rate: 3.199E-05 | global batch size: 256 | lm loss: 2.214977E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.214 | TFLOPs: 31.81 | 7: iteration 96260/ 115203 | consumed samples: 24642560 | consumed tokens: 50467962880 | elapsed time per iteration (s): 0.43 | learning rate: 3.198E-05 | global batch size: 256 | lm loss: 2.232438E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.552 | TFLOPs: 31.51 | 7: iteration 96270/ 115203 | consumed samples: 24645120 | consumed tokens: 50473205760 | elapsed time per iteration (s): 0.44 | learning rate: 3.197E-05 | global batch size: 256 | lm loss: 2.210927E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.809 | TFLOPs: 30.68 | 7: iteration 96280/ 115203 | consumed samples: 24647680 | consumed tokens: 50478448640 | elapsed time per iteration (s): 0.43 | learning rate: 3.195E-05 | global batch size: 256 | lm loss: 2.225467E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.162 | TFLOPs: 31.49 | 7: iteration 96290/ 115203 | consumed samples: 24650240 | consumed 
tokens: 50483691520 | elapsed time per iteration (s): 0.44 | learning rate: 3.194E-05 | global batch size: 256 | lm loss: 2.200746E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.432 | TFLOPs: 30.87 | 7: iteration 96300/ 115203 | consumed samples: 24652800 | consumed tokens: 50488934400 | elapsed time per iteration (s): 0.44 | learning rate: 3.193E-05 | global batch size: 256 | lm loss: 2.207676E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.789 | TFLOPs: 30.79 | 7: iteration 96310/ 115203 | consumed samples: 24655360 | consumed tokens: 50494177280 | elapsed time per iteration (s): 0.43 | learning rate: 3.192E-05 | global batch size: 256 | lm loss: 2.227986E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.840 | TFLOPs: 31.05 | 7: iteration 96320/ 115203 | consumed samples: 24657920 | consumed tokens: 50499420160 | elapsed time per iteration (s): 0.43 | learning rate: 3.190E-05 | global batch size: 256 | lm loss: 2.229626E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.487 | TFLOPs: 31.19 | 7: iteration 96330/ 115203 | consumed samples: 24660480 | consumed tokens: 50504663040 | elapsed time per iteration (s): 0.43 | learning rate: 3.189E-05 | global batch size: 256 | lm loss: 2.246446E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.655 | TFLOPs: 31.46 | 7: iteration 96340/ 115203 | consumed samples: 24663040 | consumed tokens: 50509905920 | elapsed time per iteration (s): 0.44 | learning rate: 3.188E-05 | global batch size: 256 | lm loss: 2.219609E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 587.510 | TFLOPs: 30.83 | 7: iteration 96350/ 115203 | consumed samples: 24665600 | consumed tokens: 50515148800 | elapsed time per iteration (s): 0.44 | learning rate: 3.187E-05 | global batch size: 256 | lm loss: 2.213319E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.238 | TFLOPs: 30.39 | 7: iteration 96360/ 115203 | consumed samples: 24668160 | consumed tokens: 50520391680 | elapsed time per iteration (s): 0.43 | learning rate: 3.185E-05 | global batch size: 256 | lm loss: 2.228176E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.987 | TFLOPs: 30.96 | 7: iteration 96370/ 115203 | consumed samples: 24670720 | consumed tokens: 50525634560 | elapsed time per iteration (s): 0.43 | learning rate: 3.184E-05 | global batch size: 256 | lm loss: 2.246512E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.576 | TFLOPs: 31.25 | 7: iteration 96380/ 115203 | consumed samples: 24673280 | consumed tokens: 50530877440 | elapsed time per iteration (s): 0.43 | learning rate: 3.183E-05 | global batch size: 256 | lm loss: 2.200981E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.728 | TFLOPs: 31.20 | 7: iteration 96390/ 115203 | consumed samples: 24675840 | consumed tokens: 50536120320 | elapsed time per iteration (s): 0.44 | learning rate: 3.182E-05 | global batch size: 256 | lm loss: 2.236891E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.160 | TFLOPs: 30.28 | 7: iteration 96400/ 115203 | consumed samples: 24678400 | consumed tokens: 50541363200 | elapsed time per iteration (s): 0.43 | learning rate: 3.181E-05 | global batch size: 256 | lm loss: 2.227795E+00 | grad norm: 
0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.937 | TFLOPs: 31.16 | 7: iteration 96410/ 115203 | consumed samples: 24680960 | consumed tokens: 50546606080 | elapsed time per iteration (s): 0.44 | learning rate: 3.179E-05 | global batch size: 256 | lm loss: 2.228475E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.769 | TFLOPs: 30.79 | 7: iteration 96420/ 115203 | consumed samples: 24683520 | consumed tokens: 50551848960 | elapsed time per iteration (s): 0.45 | learning rate: 3.178E-05 | global batch size: 256 | lm loss: 2.205219E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.174 | TFLOPs: 29.81 | 7: iteration 96430/ 115203 | consumed samples: 24686080 | consumed tokens: 50557091840 | elapsed time per iteration (s): 0.43 | learning rate: 3.177E-05 | global batch size: 256 | lm loss: 2.254445E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.532 | TFLOPs: 31.46 | 7: iteration 96440/ 115203 | consumed samples: 24688640 | consumed tokens: 50562334720 | elapsed time per iteration (s): 0.43 | learning rate: 3.176E-05 | global batch size: 256 | lm loss: 2.205751E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.188 | TFLOPs: 31.60 | 7: iteration 96450/ 115203 | consumed samples: 24691200 | consumed tokens: 50567577600 | elapsed time per iteration (s): 0.43 | learning rate: 3.174E-05 | global batch size: 256 | lm loss: 2.238210E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.951 | TFLOPs: 31.01 | 7: iteration 96460/ 115203 | consumed samples: 24693760 | consumed tokens: 50572820480 | elapsed time per 
iteration (s): 0.44 | learning rate: 3.173E-05 | global batch size: 256 | lm loss: 2.230309E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.933 | TFLOPs: 30.64 | 7: iteration 96470/ 115203 | consumed samples: 24696320 | consumed tokens: 50578063360 | elapsed time per iteration (s): 0.43 | learning rate: 3.172E-05 | global batch size: 256 | lm loss: 2.213170E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.605 | TFLOPs: 31.15 | 7: iteration 96480/ 115203 | consumed samples: 24698880 | consumed tokens: 50583306240 | elapsed time per iteration (s): 0.45 | learning rate: 3.171E-05 | global batch size: 256 | lm loss: 2.219769E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.399 | TFLOPs: 29.61 | 7: iteration 96490/ 115203 | consumed samples: 24701440 | consumed tokens: 50588549120 | elapsed time per iteration (s): 0.43 | learning rate: 3.169E-05 | global batch size: 256 | lm loss: 2.218474E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.991 | TFLOPs: 30.90 | 7: iteration 96500/ 115203 | consumed samples: 24704000 | consumed tokens: 50593792000 | elapsed time per iteration (s): 0.43 | learning rate: 3.168E-05 | global batch size: 256 | lm loss: 2.262877E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.245 | TFLOPs: 31.18 | 7: iteration 96510/ 115203 | consumed samples: 24706560 | consumed tokens: 50599034880 | elapsed time per iteration (s): 0.44 | learning rate: 3.167E-05 | global batch size: 256 | lm loss: 2.242162E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.330 | TFLOPs: 30.50 | 7: 
iteration 96520/ 115203 | consumed samples: 24709120 | consumed tokens: 50604277760 | elapsed time per iteration (s): 0.42 | learning rate: 3.166E-05 | global batch size: 256 | lm loss: 2.232122E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.062 | TFLOPs: 31.69 | 7: iteration 96530/ 115203 | consumed samples: 24711680 | consumed tokens: 50609520640 | elapsed time per iteration (s): 0.43 | learning rate: 3.165E-05 | global batch size: 256 | lm loss: 2.230037E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.325 | TFLOPs: 31.18 | 7: iteration 96540/ 115203 | consumed samples: 24714240 | consumed tokens: 50614763520 | elapsed time per iteration (s): 0.44 | learning rate: 3.163E-05 | global batch size: 256 | lm loss: 2.233812E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.940 | TFLOPs: 30.74 | 7: iteration 96550/ 115203 | consumed samples: 24716800 | consumed tokens: 50620006400 | elapsed time per iteration (s): 0.44 | learning rate: 3.162E-05 | global batch size: 256 | lm loss: 2.229019E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.385 | TFLOPs: 30.56 | 7: iteration 96560/ 115203 | consumed samples: 24719360 | consumed tokens: 50625249280 | elapsed time per iteration (s): 0.44 | learning rate: 3.161E-05 | global batch size: 256 | lm loss: 2.240331E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.147 | TFLOPs: 30.60 | 7: iteration 96570/ 115203 | consumed samples: 24721920 | consumed tokens: 50630492160 | elapsed time per iteration (s): 0.43 | learning rate: 3.160E-05 | global batch size: 256 | lm loss: 2.221890E+00 | grad norm: 0.247 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.918 | TFLOPs: 31.11 | 7: iteration 96580/ 115203 | consumed samples: 24724480 | consumed tokens: 50635735040 | elapsed time per iteration (s): 0.44 | learning rate: 3.159E-05 | global batch size: 256 | lm loss: 2.210022E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.878 | TFLOPs: 30.43 | 7: iteration 96590/ 115203 | consumed samples: 24727040 | consumed tokens: 50640977920 | elapsed time per iteration (s): 0.44 | learning rate: 3.157E-05 | global batch size: 256 | lm loss: 2.196520E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.792 | TFLOPs: 30.74 | 7: iteration 96600/ 115203 | consumed samples: 24729600 | consumed tokens: 50646220800 | elapsed time per iteration (s): 0.43 | learning rate: 3.156E-05 | global batch size: 256 | lm loss: 2.231188E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.860 | TFLOPs: 30.90 | 7: iteration 96610/ 115203 | consumed samples: 24732160 | consumed tokens: 50651463680 | elapsed time per iteration (s): 0.43 | learning rate: 3.155E-05 | global batch size: 256 | lm loss: 2.260472E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.033 | TFLOPs: 31.06 | 7: iteration 96620/ 115203 | consumed samples: 24734720 | consumed tokens: 50656706560 | elapsed time per iteration (s): 0.43 | learning rate: 3.154E-05 | global batch size: 256 | lm loss: 2.243319E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.960 | TFLOPs: 31.32 | 7: iteration 96630/ 115203 | consumed samples: 24737280 | consumed tokens: 50661949440 | elapsed time per iteration (s): 0.44 | learning rate: 
3.152E-05 | global batch size: 256 | lm loss: 2.213819E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.209 | TFLOPs: 30.81 | 7: iteration 96640/ 115203 | consumed samples: 24739840 | consumed tokens: 50667192320 | elapsed time per iteration (s): 0.43 | learning rate: 3.151E-05 | global batch size: 256 | lm loss: 2.216989E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.783 | TFLOPs: 31.05 | 7: iteration 96650/ 115203 | consumed samples: 24742400 | consumed tokens: 50672435200 | elapsed time per iteration (s): 0.44 | learning rate: 3.150E-05 | global batch size: 256 | lm loss: 2.230582E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.060 | TFLOPs: 30.43 | 7: iteration 96660/ 115203 | consumed samples: 24744960 | consumed tokens: 50677678080 | elapsed time per iteration (s): 0.43 | learning rate: 3.149E-05 | global batch size: 256 | lm loss: 2.251409E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.738 | TFLOPs: 30.89 | 7: iteration 96670/ 115203 | consumed samples: 24747520 | consumed tokens: 50682920960 | elapsed time per iteration (s): 0.43 | learning rate: 3.148E-05 | global batch size: 256 | lm loss: 2.210780E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.120 | TFLOPs: 30.96 | 7: iteration 96680/ 115203 | consumed samples: 24750080 | consumed tokens: 50688163840 | elapsed time per iteration (s): 0.44 | learning rate: 3.146E-05 | global batch size: 256 | lm loss: 2.239886E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.548 | TFLOPs: 30.72 | 7: iteration 96690/ 115203 | consumed 
samples: 24752640 | consumed tokens: 50693406720 | elapsed time per iteration (s): 0.44 | learning rate: 3.145E-05 | global batch size: 256 | lm loss: 2.219567E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.008 | TFLOPs: 30.22 | 7: iteration 96700/ 115203 | consumed samples: 24755200 | consumed tokens: 50698649600 | elapsed time per iteration (s): 0.43 | learning rate: 3.144E-05 | global batch size: 256 | lm loss: 2.234111E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.442 | TFLOPs: 31.24 | 7: iteration 96710/ 115203 | consumed samples: 24757760 | consumed tokens: 50703892480 | elapsed time per iteration (s): 0.46 | learning rate: 3.143E-05 | global batch size: 256 | lm loss: 2.237984E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.855 | TFLOPs: 29.11 | 7: iteration 96720/ 115203 | consumed samples: 24760320 | consumed tokens: 50709135360 | elapsed time per iteration (s): 0.43 | learning rate: 3.142E-05 | global batch size: 256 | lm loss: 2.219070E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.104 | TFLOPs: 31.59 | 7: iteration 96730/ 115203 | consumed samples: 24762880 | consumed tokens: 50714378240 | elapsed time per iteration (s): 0.43 | learning rate: 3.140E-05 | global batch size: 256 | lm loss: 2.223312E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.271 | TFLOPs: 31.18 | 7: iteration 96740/ 115203 | consumed samples: 24765440 | consumed tokens: 50719621120 | elapsed time per iteration (s): 0.43 | learning rate: 3.139E-05 | global batch size: 256 | lm loss: 2.195983E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 591.725 | TFLOPs: 31.05 | 7: iteration 96750/ 115203 | consumed samples: 24768000 | consumed tokens: 50724864000 | elapsed time per iteration (s): 0.43 | learning rate: 3.138E-05 | global batch size: 256 | lm loss: 2.235909E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.582 | TFLOPs: 30.99 | 7: iteration 96760/ 115203 | consumed samples: 24770560 | consumed tokens: 50730106880 | elapsed time per iteration (s): 0.43 | learning rate: 3.137E-05 | global batch size: 256 | lm loss: 2.167693E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.569 | TFLOPs: 31.51 | 7: iteration 96770/ 115203 | consumed samples: 24773120 | consumed tokens: 50735349760 | elapsed time per iteration (s): 0.42 | learning rate: 3.136E-05 | global batch size: 256 | lm loss: 2.239643E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.534 | TFLOPs: 31.98 | 7: iteration 96780/ 115203 | consumed samples: 24775680 | consumed tokens: 50740592640 | elapsed time per iteration (s): 0.43 | learning rate: 3.134E-05 | global batch size: 256 | lm loss: 2.213217E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.332 | TFLOPs: 31.13 | 7: iteration 96790/ 115203 | consumed samples: 24778240 | consumed tokens: 50745835520 | elapsed time per iteration (s): 0.44 | learning rate: 3.133E-05 | global batch size: 256 | lm loss: 2.230999E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.260 | TFLOPs: 30.81 | 7: iteration 96800/ 115203 | consumed samples: 24780800 | consumed tokens: 50751078400 | elapsed time per iteration (s): 0.43 | learning rate: 3.132E-05 | global batch size: 256 | lm 
loss: 2.208123E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.922 | TFLOPs: 30.95 | 7: iteration 96810/ 115203 | consumed samples: 24783360 | consumed tokens: 50756321280 | elapsed time per iteration (s): 0.43 | learning rate: 3.131E-05 | global batch size: 256 | lm loss: 2.212429E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.473 | TFLOPs: 30.98 | 7: iteration 96820/ 115203 | consumed samples: 24785920 | consumed tokens: 50761564160 | elapsed time per iteration (s): 0.42 | learning rate: 3.129E-05 | global batch size: 256 | lm loss: 2.214850E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.061 | TFLOPs: 31.85 | 7: iteration 96830/ 115203 | consumed samples: 24788480 | consumed tokens: 50766807040 | elapsed time per iteration (s): 0.51 | learning rate: 3.128E-05 | global batch size: 256 | lm loss: 2.226728E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 503.162 | TFLOPs: 26.40 | 7: iteration 96840/ 115203 | consumed samples: 24791040 | consumed tokens: 50772049920 | elapsed time per iteration (s): 0.43 | learning rate: 3.127E-05 | global batch size: 256 | lm loss: 2.233965E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.513 | TFLOPs: 31.56 | 7: iteration 96850/ 115203 | consumed samples: 24793600 | consumed tokens: 50777292800 | elapsed time per iteration (s): 0.43 | learning rate: 3.126E-05 | global batch size: 256 | lm loss: 2.250793E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.304 | TFLOPs: 31.02 | 7: iteration 96860/ 115203 | consumed samples: 24796160 | consumed tokens: 
50782535680 | elapsed time per iteration (s): 0.43 | learning rate: 3.125E-05 | global batch size: 256 | lm loss: 2.216080E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.883 | TFLOPs: 31.06 | 7: iteration 96870/ 115203 | consumed samples: 24798720 | consumed tokens: 50787778560 | elapsed time per iteration (s): 0.43 | learning rate: 3.123E-05 | global batch size: 256 | lm loss: 2.212596E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.709 | TFLOPs: 31.20 | 7: iteration 96880/ 115203 | consumed samples: 24801280 | consumed tokens: 50793021440 | elapsed time per iteration (s): 0.43 | learning rate: 3.122E-05 | global batch size: 256 | lm loss: 2.208502E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.426 | TFLOPs: 31.24 | 7: iteration 96890/ 115203 | consumed samples: 24803840 | consumed tokens: 50798264320 | elapsed time per iteration (s): 0.43 | learning rate: 3.121E-05 | global batch size: 256 | lm loss: 2.214948E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.173 | TFLOPs: 31.44 | 7: iteration 96900/ 115203 | consumed samples: 24806400 | consumed tokens: 50803507200 | elapsed time per iteration (s): 0.44 | learning rate: 3.120E-05 | global batch size: 256 | lm loss: 2.205263E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.094 | TFLOPs: 30.86 | 7: iteration 96910/ 115203 | consumed samples: 24808960 | consumed tokens: 50808750080 | elapsed time per iteration (s): 0.43 | learning rate: 3.119E-05 | global batch size: 256 | lm loss: 2.249553E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
596.542 | TFLOPs: 31.30 | 7: iteration 96920/ 115203 | consumed samples: 24811520 | consumed tokens: 50813992960 | elapsed time per iteration (s): 0.43 | learning rate: 3.117E-05 | global batch size: 256 | lm loss: 2.215313E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.438 | TFLOPs: 31.24 | 7: iteration 96930/ 115203 | consumed samples: 24814080 | consumed tokens: 50819235840 | elapsed time per iteration (s): 0.45 | learning rate: 3.116E-05 | global batch size: 256 | lm loss: 2.230536E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.194 | TFLOPs: 29.71 | 7: iteration 96940/ 115203 | consumed samples: 24816640 | consumed tokens: 50824478720 | elapsed time per iteration (s): 0.43 | learning rate: 3.115E-05 | global batch size: 256 | lm loss: 2.208679E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.643 | TFLOPs: 31.41 | 7: iteration 96950/ 115203 | consumed samples: 24819200 | consumed tokens: 50829721600 | elapsed time per iteration (s): 0.44 | learning rate: 3.114E-05 | global batch size: 256 | lm loss: 2.210591E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.456 | TFLOPs: 30.82 | 7: iteration 96960/ 115203 | consumed samples: 24821760 | consumed tokens: 50834964480 | elapsed time per iteration (s): 0.43 | learning rate: 3.113E-05 | global batch size: 256 | lm loss: 2.224718E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.474 | TFLOPs: 30.93 | 7: iteration 96970/ 115203 | consumed samples: 24824320 | consumed tokens: 50840207360 | elapsed time per iteration (s): 0.43 | learning rate: 3.112E-05 | global batch size: 256 | lm loss: 2.232762E+00 | grad norm: 0.248 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.411 | TFLOPs: 30.98 | 7: iteration 96980/ 115203 | consumed samples: 24826880 | consumed tokens: 50845450240 | elapsed time per iteration (s): 0.44 | learning rate: 3.110E-05 | global batch size: 256 | lm loss: 2.242699E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.430 | TFLOPs: 30.56 | 7: iteration 96990/ 115203 | consumed samples: 24829440 | consumed tokens: 50850693120 | elapsed time per iteration (s): 0.44 | learning rate: 3.109E-05 | global batch size: 256 | lm loss: 2.216284E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.294 | TFLOPs: 30.66 | 7: iteration 97000/ 115203 | consumed samples: 24832000 | consumed tokens: 50855936000 | elapsed time per iteration (s): 0.43 | learning rate: 3.108E-05 | global batch size: 256 | lm loss: 2.253568E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.298 | TFLOPs: 31.29 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 97000 | lm loss value: 2.179934E+00 | lm loss PPL: 8.845723E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 97000 to checkpoints_221m 0: [2022-11-29 00:39:02,224] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step97000 is begin to save! 0: [2022-11-29 00:39:02,230] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_01-model_00-model_states.pt... 0: [2022-11-29 00:39:02,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_01-model_00-model_states.pt. 
0: [2022-11-29 00:39:02,347] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_03-model_00-model_states.pt... 0: [2022-11-29 00:39:02,370] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_03-model_00-model_states.pt. 0: [2022-11-29 00:39:02,370] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_04-model_00-model_states.pt... 0: [2022-11-29 00:39:02,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_04-model_00-model_states.pt. 0: [2022-11-29 00:39:02,394] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_05-model_00-model_states.pt... 0: [2022-11-29 00:39:02,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_05-model_00-model_states.pt. 0: [2022-11-29 00:39:02,420] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_06-model_00-model_states.pt... 0: [2022-11-29 00:39:02,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_06-model_00-model_states.pt. 0: [2022-11-29 00:39:02,445] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_07-model_00-model_states.pt... 0: [2022-11-29 00:39:02,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_07-model_00-model_states.pt. 0: [2022-11-29 00:39:02,469] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_08-model_00-model_states.pt... 0: [2022-11-29 00:39:02,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_08-model_00-model_states.pt. 
0: [2022-11-29 00:39:02,494] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_09-model_00-model_states.pt... 0: [2022-11-29 00:39:02,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_09-model_00-model_states.pt. 0: [2022-11-29 00:39:02,518] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_10-model_00-model_states.pt... 0: [2022-11-29 00:39:02,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_10-model_00-model_states.pt. 0: [2022-11-29 00:39:02,540] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_11-model_00-model_states.pt... 0: [2022-11-29 00:39:02,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_11-model_00-model_states.pt. 0: [2022-11-29 00:39:02,564] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_12-model_00-model_states.pt... 0: [2022-11-29 00:39:02,588] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_12-model_00-model_states.pt. 0: [2022-11-29 00:39:02,588] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_13-model_00-model_states.pt... 0: [2022-11-29 00:39:02,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_13-model_00-model_states.pt. 0: [2022-11-29 00:39:02,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_14-model_00-model_states.pt... 0: [2022-11-29 00:39:02,637] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_14-model_00-model_states.pt. 
0: [2022-11-29 00:39:02,637] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_15-model_00-model_states.pt... 0: [2022-11-29 00:39:02,661] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_15-model_00-model_states.pt. 0: [2022-11-29 00:39:02,661] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_16-model_00-model_states.pt... 0: [2022-11-29 00:39:02,685] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_16-model_00-model_states.pt. 0: [2022-11-29 00:39:02,685] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_17-model_00-model_states.pt... 0: [2022-11-29 00:39:02,708] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_17-model_00-model_states.pt. 0: [2022-11-29 00:39:02,708] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_18-model_00-model_states.pt... 0: [2022-11-29 00:39:02,733] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_18-model_00-model_states.pt. 0: [2022-11-29 00:39:02,733] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_19-model_00-model_states.pt... 0: [2022-11-29 00:39:02,757] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_19-model_00-model_states.pt. 0: [2022-11-29 00:39:02,757] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_20-model_00-model_states.pt... 0: [2022-11-29 00:39:02,781] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_20-model_00-model_states.pt. 
0: [2022-11-29 00:39:02,782] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/layer_22-model_00-model_states.pt... 0: [2022-11-29 00:39:02,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/layer_22-model_00-model_states.pt. 0: [2022-11-29 00:39:02,787] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step97000/mp_rank_00_model_states.pt 0: [2022-11-29 00:39:02,787] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/mp_rank_00_model_states.pt... 0: [2022-11-29 00:39:02,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/mp_rank_00_model_states.pt. 0: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
7: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
3: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:39:02,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step97000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:39:02,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:39:02,861] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:39:02,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 00:39:02,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 00:39:02,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2022-11-29 00:39:02,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 0: [2022-11-29 00:39:02,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:39:02,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:39:02,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-29 00:39:02,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 00:39:02,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 00:39:02,868] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 00:39:02,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 0: [2022-11-29 00:39:02,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 0: [2022-11-29 00:39:02,868] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2022-11-29 00:39:02,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:39:02,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 00:39:02,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2022-11-29 00:39:02,872] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:39:02,872] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 00:39:02,872] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
4: [2022-11-29 00:39:02,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:39:02,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 00:39:02,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2022-11-29 00:39:02,873] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:39:02,873] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 00:39:02,873] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2022-11-29 00:39:02,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:39:02,878] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 00:39:02,878] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2022-11-29 00:39:02,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:39:02,878] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 00:39:02,878] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
1: [2022-11-29 00:39:02,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:39:02,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 00:39:02,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2022-11-29 00:39:02,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:39:02,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 00:39:02,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2022-11-29 00:39:02,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:39:02,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:39:02,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 1: [2022-11-29 00:39:02,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 7: [2022-11-29 00:39:02,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2022-11-29 00:39:02,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
7: [2022-11-29 00:39:02,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:39:02,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 00:39:02,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2022-11-29 00:39:02,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:39:02,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 00:39:02,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2022-11-29 00:39:02,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:39:02,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:39:02,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 00:39:02,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 00:39:02,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 7: [2022-11-29 00:39:02,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
0: [2022-11-29 00:39:02,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:39:02,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 00:39:02,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 2: [2022-11-29 00:39:02,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:39:02,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 00:39:02,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 2: [2022-11-29 00:39:02,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:39:02,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:39:02,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:39:02,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 00:39:02,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 00:39:02,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
2: [2022-11-29 00:39:02,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 00:39:02,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 2: [2022-11-29 00:39:02,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2022-11-29 00:39:02,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:39:02,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 00:39:02,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2022-11-29 00:39:02,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:39:02,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 00:39:02,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2022-11-29 00:39:02,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:39:02,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 00:39:02,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
1: [2022-11-29 00:39:02,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:39:02,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 00:39:02,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 1: [2022-11-29 00:39:02,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:39:02,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 00:39:02,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 2: [2022-11-29 00:39:02,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:39:02,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 00:39:02,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2022-11-29 00:39:02,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:39:02,865] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:39:02,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
2: [2022-11-29 00:39:02,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:39:02,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 5: [2022-11-29 00:39:02,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-29 00:39:02,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-29 00:39:02,865] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-29 00:39:02,860] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2022-11-29 00:39:02,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 2: [2022-11-29 00:39:02,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2022-11-29 00:39:02,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:39:02,865] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2022-11-29 00:39:02,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2022-11-29 00:39:02,860] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-29 00:39:02,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:39:02,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2022-11-29 00:39:02,861] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-29 00:39:02,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 2: [2022-11-29 00:39:02,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:39:02,861] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2022-11-29 00:39:02,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2022-11-29 00:39:02,864] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:39:02,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:39:02,864] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-29 00:39:02,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2022-11-29 00:39:02,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 2: [2022-11-29 00:39:02,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:39:02,864] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2022-11-29 00:39:02,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2022-11-29 00:39:02,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-29 00:39:02,867] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:39:02,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:39:02,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 3: [2022-11-29 00:39:02,867] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-29 00:39:02,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
5: [2022-11-29 00:39:02,871] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 2: [2022-11-29 00:39:02,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2022-11-29 00:39:02,867] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2022-11-29 00:39:02,871] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2022-11-29 00:39:02,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:39:02,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:39:02,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:39:02,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
3: [2022-11-29 00:39:02,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 00:39:02,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-29 00:39:02,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-29 00:39:02,877] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-29 00:39:02,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 2: [2022-11-29 00:39:02,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2022-11-29 00:39:02,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2022-11-29 00:39:02,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2022-11-29 00:39:02,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:39:02,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 3: [2022-11-29 00:39:02,877] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 5: [2022-11-29 00:39:02,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 00:39:02,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
0: [2022-11-29 00:39:02,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:39:02,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:39:02,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:39:02,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:39:02,906] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 00:39:02,906] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 00:39:02,906] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 00:39:02,906] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 0: [2022-11-29 00:39:02,906] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 0: [2022-11-29 00:39:02,906] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2022-11-29 00:39:02,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:39:02,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-29 00:39:02,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:39:02,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:39:02,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:39:02,906] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:39:02,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:39:02,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 00:39:02,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 00:39:02,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2022-11-29 00:39:02,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
6: [2022-11-29 00:39:02,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 00:39:02,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 00:39:02,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 00:39:02,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 00:39:02,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 00:39:02,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2022-11-29 00:39:02,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2022-11-29 00:39:02,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2022-11-29 00:39:02,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2022-11-29 00:39:02,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 6: [2022-11-29 00:39:02,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-29 00:39:02,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 00:39:02,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2022-11-29 00:39:02,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:39:02,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 00:39:02,909] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2022-11-29 00:39:02,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:39:02,910] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 00:39:02,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:39:02,910] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 4: [2022-11-29 00:39:02,910] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 00:39:02,910] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 
0: [2022-11-29 00:39:02,935] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step97000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 00:39:02,935] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step97000 is ready now! 0: successfully saved checkpoint at iteration 97000 to checkpoints_221m 7: time (ms) | save-checkpoint: 730.13 7: iteration 97010/ 115203 | consumed samples: 24834560 | consumed tokens: 50861178880 | elapsed time per iteration (s): 0.51 | learning rate: 3.107E-05 | global batch size: 256 | lm loss: 2.248880E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 498.929 | TFLOPs: 26.18 | 7: iteration 97020/ 115203 | consumed samples: 24837120 | consumed tokens: 50866421760 | elapsed time per iteration (s): 0.65 | learning rate: 3.106E-05 | global batch size: 256 | lm loss: 2.209886E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 393.139 | TFLOPs: 20.63 | 7: iteration 97030/ 115203 | consumed samples: 24839680 | consumed tokens: 50871664640 | elapsed time per iteration (s): 0.43 | learning rate: 3.104E-05 | global batch size: 256 | lm loss: 2.213331E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.388 | TFLOPs: 31.55 | 7: iteration 97040/ 115203 | consumed samples: 24842240 | consumed tokens: 50876907520 | elapsed time per iteration (s): 0.43 | learning rate: 3.103E-05 | global batch size: 256 | lm loss: 2.206372E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.196 | TFLOPs: 31.28 | 7: iteration 97050/ 115203 | consumed samples: 24844800 | consumed tokens: 50882150400 | elapsed time per iteration (s): 0.44 | learning rate: 3.102E-05 | global batch size: 256 | 
lm loss: 2.215923E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.608 | TFLOPs: 30.78 | 7: iteration 97060/ 115203 | consumed samples: 24847360 | consumed tokens: 50887393280 | elapsed time per iteration (s): 0.43 | learning rate: 3.101E-05 | global batch size: 256 | lm loss: 2.217892E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.818 | TFLOPs: 31.21 | 7: iteration 97070/ 115203 | consumed samples: 24849920 | consumed tokens: 50892636160 | elapsed time per iteration (s): 0.42 | learning rate: 3.100E-05 | global batch size: 256 | lm loss: 2.238199E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.589 | TFLOPs: 31.88 | 7: iteration 97080/ 115203 | consumed samples: 24852480 | consumed tokens: 50897879040 | elapsed time per iteration (s): 0.43 | learning rate: 3.098E-05 | global batch size: 256 | lm loss: 2.226614E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.473 | TFLOPs: 31.30 | 7: iteration 97090/ 115203 | consumed samples: 24855040 | consumed tokens: 50903121920 | elapsed time per iteration (s): 0.43 | learning rate: 3.097E-05 | global batch size: 256 | lm loss: 2.252418E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.111 | TFLOPs: 31.33 | 7: iteration 97100/ 115203 | consumed samples: 24857600 | consumed tokens: 50908364800 | elapsed time per iteration (s): 0.42 | learning rate: 3.096E-05 | global batch size: 256 | lm loss: 2.210186E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.117 | TFLOPs: 32.01 | 7: iteration 97110/ 115203 | consumed samples: 24860160 | consumed tokens: 
50913607680 | elapsed time per iteration (s): 0.42 | learning rate: 3.095E-05 | global batch size: 256 | lm loss: 2.218948E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.489 | TFLOPs: 31.61 | 7: iteration 97120/ 115203 | consumed samples: 24862720 | consumed tokens: 50918850560 | elapsed time per iteration (s): 0.43 | learning rate: 3.094E-05 | global batch size: 256 | lm loss: 2.227813E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.878 | TFLOPs: 31.37 | 7: iteration 97130/ 115203 | consumed samples: 24865280 | consumed tokens: 50924093440 | elapsed time per iteration (s): 0.43 | learning rate: 3.092E-05 | global batch size: 256 | lm loss: 2.246445E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.447 | TFLOPs: 31.40 | 7: iteration 97140/ 115203 | consumed samples: 24867840 | consumed tokens: 50929336320 | elapsed time per iteration (s): 0.43 | learning rate: 3.091E-05 | global batch size: 256 | lm loss: 2.227926E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.121 | TFLOPs: 31.02 | 7: iteration 97150/ 115203 | consumed samples: 24870400 | consumed tokens: 50934579200 | elapsed time per iteration (s): 0.42 | learning rate: 3.090E-05 | global batch size: 256 | lm loss: 2.256996E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.020 | TFLOPs: 31.69 | 7: iteration 97160/ 115203 | consumed samples: 24872960 | consumed tokens: 50939822080 | elapsed time per iteration (s): 0.43 | learning rate: 3.089E-05 | global batch size: 256 | lm loss: 2.257859E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
589.357 | TFLOPs: 30.92 | 7: iteration 97170/ 115203 | consumed samples: 24875520 | consumed tokens: 50945064960 | elapsed time per iteration (s): 0.43 | learning rate: 3.088E-05 | global batch size: 256 | lm loss: 2.248659E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.022 | TFLOPs: 31.11 | 7: iteration 97180/ 115203 | consumed samples: 24878080 | consumed tokens: 50950307840 | elapsed time per iteration (s): 0.44 | learning rate: 3.087E-05 | global batch size: 256 | lm loss: 2.259271E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.378 | TFLOPs: 30.71 | 7: iteration 97190/ 115203 | consumed samples: 24880640 | consumed tokens: 50955550720 | elapsed time per iteration (s): 0.44 | learning rate: 3.085E-05 | global batch size: 256 | lm loss: 2.199950E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.367 | TFLOPs: 30.71 | 7: iteration 97200/ 115203 | consumed samples: 24883200 | consumed tokens: 50960793600 | elapsed time per iteration (s): 0.43 | learning rate: 3.084E-05 | global batch size: 256 | lm loss: 2.217596E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.181 | TFLOPs: 31.07 | 7: iteration 97210/ 115203 | consumed samples: 24885760 | consumed tokens: 50966036480 | elapsed time per iteration (s): 0.42 | learning rate: 3.083E-05 | global batch size: 256 | lm loss: 2.231941E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.519 | TFLOPs: 31.67 | 7: iteration 97220/ 115203 | consumed samples: 24888320 | consumed tokens: 50971279360 | elapsed time per iteration (s): 0.43 | learning rate: 3.082E-05 | global batch size: 256 | lm loss: 2.177716E+00 | grad norm: 0.250 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.224 | TFLOPs: 31.02 | 7: iteration 97230/ 115203 | consumed samples: 24890880 | consumed tokens: 50976522240 | elapsed time per iteration (s): 0.43 | learning rate: 3.081E-05 | global batch size: 256 | lm loss: 2.207257E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.554 | TFLOPs: 31.51 | 7: iteration 97240/ 115203 | consumed samples: 24893440 | consumed tokens: 50981765120 | elapsed time per iteration (s): 0.43 | learning rate: 3.080E-05 | global batch size: 256 | lm loss: 2.210675E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.161 | TFLOPs: 31.28 | 7: iteration 97250/ 115203 | consumed samples: 24896000 | consumed tokens: 50987008000 | elapsed time per iteration (s): 0.42 | learning rate: 3.078E-05 | global batch size: 256 | lm loss: 2.235522E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.176 | TFLOPs: 31.70 | 7: iteration 97260/ 115203 | consumed samples: 24898560 | consumed tokens: 50992250880 | elapsed time per iteration (s): 0.43 | learning rate: 3.077E-05 | global batch size: 256 | lm loss: 2.196952E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.464 | TFLOPs: 31.30 | 7: iteration 97270/ 115203 | consumed samples: 24901120 | consumed tokens: 50997493760 | elapsed time per iteration (s): 0.43 | learning rate: 3.076E-05 | global batch size: 256 | lm loss: 2.231211E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.454 | TFLOPs: 31.19 | 7: iteration 97280/ 115203 | consumed samples: 24903680 | consumed tokens: 51002736640 | elapsed time per iteration (s): 
0.43 | learning rate: 3.075E-05 | global batch size: 256 | lm loss: 2.226014E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.866 | TFLOPs: 31.58 | 7: iteration 97290/ 115203 | consumed samples: 24906240 | consumed tokens: 51007979520 | elapsed time per iteration (s): 0.42 | learning rate: 3.074E-05 | global batch size: 256 | lm loss: 2.208308E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.233 | TFLOPs: 31.76 | 7: iteration 97300/ 115203 | consumed samples: 24908800 | consumed tokens: 51013222400 | elapsed time per iteration (s): 0.43 | learning rate: 3.072E-05 | global batch size: 256 | lm loss: 2.208672E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.890 | TFLOPs: 30.90 | 7: iteration 97310/ 115203 | consumed samples: 24911360 | consumed tokens: 51018465280 | elapsed time per iteration (s): 0.43 | learning rate: 3.071E-05 | global batch size: 256 | lm loss: 2.237531E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.591 | TFLOPs: 31.09 | 7: iteration 97320/ 115203 | consumed samples: 24913920 | consumed tokens: 51023708160 | elapsed time per iteration (s): 0.43 | learning rate: 3.070E-05 | global batch size: 256 | lm loss: 2.213967E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.811 | TFLOPs: 31.05 | 7: iteration 97330/ 115203 | consumed samples: 24916480 | consumed tokens: 51028951040 | elapsed time per iteration (s): 0.43 | learning rate: 3.069E-05 | global batch size: 256 | lm loss: 2.217129E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.505 | TFLOPs: 31.19 | 7: iteration 97340/ 
115203 | consumed samples: 24919040 | consumed tokens: 51034193920 | elapsed time per iteration (s): 0.42 | learning rate: 3.068E-05 | global batch size: 256 | lm loss: 2.225170E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.929 | TFLOPs: 31.84 | 7: iteration 97350/ 115203 | consumed samples: 24921600 | consumed tokens: 51039436800 | elapsed time per iteration (s): 0.42 | learning rate: 3.067E-05 | global batch size: 256 | lm loss: 2.231340E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.124 | TFLOPs: 31.70 | 7: iteration 97360/ 115203 | consumed samples: 24924160 | consumed tokens: 51044679680 | elapsed time per iteration (s): 0.43 | learning rate: 3.065E-05 | global batch size: 256 | lm loss: 2.222768E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.555 | TFLOPs: 31.20 | 7: iteration 97370/ 115203 | consumed samples: 24926720 | consumed tokens: 51049922560 | elapsed time per iteration (s): 0.43 | learning rate: 3.064E-05 | global batch size: 256 | lm loss: 2.211890E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.881 | TFLOPs: 31.00 | 7: iteration 97380/ 115203 | consumed samples: 24929280 | consumed tokens: 51055165440 | elapsed time per iteration (s): 0.43 | learning rate: 3.063E-05 | global batch size: 256 | lm loss: 2.220625E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.632 | TFLOPs: 31.15 | 7: iteration 97390/ 115203 | consumed samples: 24931840 | consumed tokens: 51060408320 | elapsed time per iteration (s): 0.42 | learning rate: 3.062E-05 | global batch size: 256 | lm loss: 2.239996E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | samples per second: 604.663 | TFLOPs: 31.73 | 7: iteration 97400/ 115203 | consumed samples: 24934400 | consumed tokens: 51065651200 | elapsed time per iteration (s): 0.42 | learning rate: 3.061E-05 | global batch size: 256 | lm loss: 2.259947E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.846 | TFLOPs: 31.74 | 7: iteration 97410/ 115203 | consumed samples: 24936960 | consumed tokens: 51070894080 | elapsed time per iteration (s): 0.43 | learning rate: 3.060E-05 | global batch size: 256 | lm loss: 2.244336E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.783 | TFLOPs: 31.15 | 7: iteration 97420/ 115203 | consumed samples: 24939520 | consumed tokens: 51076136960 | elapsed time per iteration (s): 0.42 | learning rate: 3.058E-05 | global batch size: 256 | lm loss: 2.216770E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.306 | TFLOPs: 31.86 | 7: iteration 97430/ 115203 | consumed samples: 24942080 | consumed tokens: 51081379840 | elapsed time per iteration (s): 0.43 | learning rate: 3.057E-05 | global batch size: 256 | lm loss: 2.208154E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.974 | TFLOPs: 31.48 | 7: iteration 97440/ 115203 | consumed samples: 24944640 | consumed tokens: 51086622720 | elapsed time per iteration (s): 0.42 | learning rate: 3.056E-05 | global batch size: 256 | lm loss: 2.219620E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.378 | TFLOPs: 31.66 | 7: iteration 97450/ 115203 | consumed samples: 24947200 | consumed tokens: 51091865600 | elapsed time per iteration (s): 0.43 | learning rate: 3.055E-05 | global batch 
size: 256 | lm loss: 2.224270E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.303 | TFLOPs: 31.39 | 7: iteration 97460/ 115203 | consumed samples: 24949760 | consumed tokens: 51097108480 | elapsed time per iteration (s): 0.43 | learning rate: 3.054E-05 | global batch size: 256 | lm loss: 2.228974E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.128 | TFLOPs: 31.54 | 7: iteration 97470/ 115203 | consumed samples: 24952320 | consumed tokens: 51102351360 | elapsed time per iteration (s): 0.43 | learning rate: 3.053E-05 | global batch size: 256 | lm loss: 2.229227E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.529 | TFLOPs: 31.40 | 7: iteration 97480/ 115203 | consumed samples: 24954880 | consumed tokens: 51107594240 | elapsed time per iteration (s): 0.42 | learning rate: 3.051E-05 | global batch size: 256 | lm loss: 2.191891E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.779 | TFLOPs: 31.68 | 7: iteration 97490/ 115203 | consumed samples: 24957440 | consumed tokens: 51112837120 | elapsed time per iteration (s): 0.42 | learning rate: 3.050E-05 | global batch size: 256 | lm loss: 2.216183E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.144 | TFLOPs: 31.70 | 7: iteration 97500/ 115203 | consumed samples: 24960000 | consumed tokens: 51118080000 | elapsed time per iteration (s): 0.44 | learning rate: 3.049E-05 | global batch size: 256 | lm loss: 2.208987E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.831 | TFLOPs: 30.84 | 7: iteration 97510/ 115203 | consumed samples: 24962560 | consumed 
tokens: 51123322880 | elapsed time per iteration (s): 0.43 | learning rate: 3.048E-05 | global batch size: 256 | lm loss: 2.203854E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.428 | TFLOPs: 30.93 | 7: iteration 97520/ 115203 | consumed samples: 24965120 | consumed tokens: 51128565760 | elapsed time per iteration (s): 0.42 | learning rate: 3.047E-05 | global batch size: 256 | lm loss: 2.230819E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.689 | TFLOPs: 31.67 | 7: iteration 97530/ 115203 | consumed samples: 24967680 | consumed tokens: 51133808640 | elapsed time per iteration (s): 0.42 | learning rate: 3.046E-05 | global batch size: 256 | lm loss: 2.234894E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.811 | TFLOPs: 31.79 | 7: iteration 97540/ 115203 | consumed samples: 24970240 | consumed tokens: 51139051520 | elapsed time per iteration (s): 0.43 | learning rate: 3.044E-05 | global batch size: 256 | lm loss: 2.243226E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.760 | TFLOPs: 31.52 | 7: iteration 97550/ 115203 | consumed samples: 24972800 | consumed tokens: 51144294400 | elapsed time per iteration (s): 0.43 | learning rate: 3.043E-05 | global batch size: 256 | lm loss: 2.225986E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.632 | TFLOPs: 31.41 | 7: iteration 97560/ 115203 | consumed samples: 24975360 | consumed tokens: 51149537280 | elapsed time per iteration (s): 0.43 | learning rate: 3.042E-05 | global batch size: 256 | lm loss: 2.203855E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 601.965 | TFLOPs: 31.58 | 7: iteration 97570/ 115203 | consumed samples: 24977920 | consumed tokens: 51154780160 | elapsed time per iteration (s): 0.43 | learning rate: 3.041E-05 | global batch size: 256 | lm loss: 2.239866E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.248 | TFLOPs: 31.49 | 7: iteration 97580/ 115203 | consumed samples: 24980480 | consumed tokens: 51160023040 | elapsed time per iteration (s): 0.43 | learning rate: 3.040E-05 | global batch size: 256 | lm loss: 2.224529E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.883 | TFLOPs: 31.42 | 7: iteration 97590/ 115203 | consumed samples: 24983040 | consumed tokens: 51165265920 | elapsed time per iteration (s): 0.42 | learning rate: 3.039E-05 | global batch size: 256 | lm loss: 2.223743E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.079 | TFLOPs: 31.70 | 7: iteration 97600/ 115203 | consumed samples: 24985600 | consumed tokens: 51170508800 | elapsed time per iteration (s): 0.43 | learning rate: 3.038E-05 | global batch size: 256 | lm loss: 2.224002E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.725 | TFLOPs: 31.41 | 7: iteration 97610/ 115203 | consumed samples: 24988160 | consumed tokens: 51175751680 | elapsed time per iteration (s): 0.43 | learning rate: 3.036E-05 | global batch size: 256 | lm loss: 2.211080E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.061 | TFLOPs: 31.54 | 7: iteration 97620/ 115203 | consumed samples: 24990720 | consumed tokens: 51180994560 | elapsed time per iteration (s): 0.43 | learning rate: 3.035E-05 | global batch size: 256 | lm loss: 2.267005E+00 | grad norm: 
0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.640 | TFLOPs: 31.41 | 7: iteration 97630/ 115203 | consumed samples: 24993280 | consumed tokens: 51186237440 | elapsed time per iteration (s): 0.42 | learning rate: 3.034E-05 | global batch size: 256 | lm loss: 2.255383E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.428 | TFLOPs: 31.77 | 7: iteration 97640/ 115203 | consumed samples: 24995840 | consumed tokens: 51191480320 | elapsed time per iteration (s): 0.44 | learning rate: 3.033E-05 | global batch size: 256 | lm loss: 2.235376E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.316 | TFLOPs: 30.40 | 7: iteration 97650/ 115203 | consumed samples: 24998400 | consumed tokens: 51196723200 | elapsed time per iteration (s): 0.43 | learning rate: 3.032E-05 | global batch size: 256 | lm loss: 2.236654E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.285 | TFLOPs: 30.97 | 7: iteration 97660/ 115203 | consumed samples: 25000960 | consumed tokens: 51201966080 | elapsed time per iteration (s): 0.43 | learning rate: 3.031E-05 | global batch size: 256 | lm loss: 2.254743E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.811 | TFLOPs: 31.31 | 7: iteration 97670/ 115203 | consumed samples: 25003520 | consumed tokens: 51207208960 | elapsed time per iteration (s): 0.44 | learning rate: 3.029E-05 | global batch size: 256 | lm loss: 2.235400E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.365 | TFLOPs: 30.45 | 7: iteration 97680/ 115203 | consumed samples: 25006080 | consumed tokens: 51212451840 | elapsed time per 
iteration (s): 0.43 | learning rate: 3.028E-05 | global batch size: 256 | lm loss: 2.235876E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.116 | TFLOPs: 31.59 | 7: iteration 97690/ 115203 | consumed samples: 25008640 | consumed tokens: 51217694720 | elapsed time per iteration (s): 0.43 | learning rate: 3.027E-05 | global batch size: 256 | lm loss: 2.222968E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.303 | TFLOPs: 30.97 | 7: iteration 97700/ 115203 | consumed samples: 25011200 | consumed tokens: 51222937600 | elapsed time per iteration (s): 0.42 | learning rate: 3.026E-05 | global batch size: 256 | lm loss: 2.262897E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.188 | TFLOPs: 31.70 | 7: iteration 97710/ 115203 | consumed samples: 25013760 | consumed tokens: 51228180480 | elapsed time per iteration (s): 0.43 | learning rate: 3.025E-05 | global batch size: 256 | lm loss: 2.206676E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.088 | TFLOPs: 31.01 | 7: iteration 97720/ 115203 | consumed samples: 25016320 | consumed tokens: 51233423360 | elapsed time per iteration (s): 0.43 | learning rate: 3.024E-05 | global batch size: 256 | lm loss: 2.235758E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.671 | TFLOPs: 31.52 | 7: iteration 97730/ 115203 | consumed samples: 25018880 | consumed tokens: 51238666240 | elapsed time per iteration (s): 0.43 | learning rate: 3.023E-05 | global batch size: 256 | lm loss: 2.208562E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.334 | TFLOPs: 31.60 | 7: 
iteration 97740/ 115203 | consumed samples: 25021440 | consumed tokens: 51243909120 | elapsed time per iteration (s): 0.43 | learning rate: 3.021E-05 | global batch size: 256 | lm loss: 2.206259E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.685 | TFLOPs: 30.99 | 7: iteration 97750/ 115203 | consumed samples: 25024000 | consumed tokens: 51249152000 | elapsed time per iteration (s): 0.46 | learning rate: 3.020E-05 | global batch size: 256 | lm loss: 2.243078E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.585 | TFLOPs: 29.31 | 7: iteration 97760/ 115203 | consumed samples: 25026560 | consumed tokens: 51254394880 | elapsed time per iteration (s): 0.43 | learning rate: 3.019E-05 | global batch size: 256 | lm loss: 2.236750E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.105 | TFLOPs: 31.38 | 7: iteration 97770/ 115203 | consumed samples: 25029120 | consumed tokens: 51259637760 | elapsed time per iteration (s): 0.44 | learning rate: 3.018E-05 | global batch size: 256 | lm loss: 2.238346E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.038 | TFLOPs: 30.85 | 7: iteration 97780/ 115203 | consumed samples: 25031680 | consumed tokens: 51264880640 | elapsed time per iteration (s): 0.43 | learning rate: 3.017E-05 | global batch size: 256 | lm loss: 2.215079E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.000 | TFLOPs: 31.43 | 7: iteration 97790/ 115203 | consumed samples: 25034240 | consumed tokens: 51270123520 | elapsed time per iteration (s): 0.66 | learning rate: 3.016E-05 | global batch size: 256 | lm loss: 2.237395E+00 | grad norm: 0.241 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 387.497 | TFLOPs: 20.33 | 7: iteration 97800/ 115203 | consumed samples: 25036800 | consumed tokens: 51275366400 | elapsed time per iteration (s): 0.44 | learning rate: 3.015E-05 | global batch size: 256 | lm loss: 2.215532E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.888 | TFLOPs: 30.79 | 7: iteration 97810/ 115203 | consumed samples: 25039360 | consumed tokens: 51280609280 | elapsed time per iteration (s): 0.42 | learning rate: 3.013E-05 | global batch size: 256 | lm loss: 2.241800E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.991 | TFLOPs: 31.74 | 7: iteration 97820/ 115203 | consumed samples: 25041920 | consumed tokens: 51285852160 | elapsed time per iteration (s): 0.44 | learning rate: 3.012E-05 | global batch size: 256 | lm loss: 2.215650E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.160 | TFLOPs: 30.70 | 7: iteration 97830/ 115203 | consumed samples: 25044480 | consumed tokens: 51291095040 | elapsed time per iteration (s): 0.42 | learning rate: 3.011E-05 | global batch size: 256 | lm loss: 2.222373E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.356 | TFLOPs: 31.71 | 7: iteration 97840/ 115203 | consumed samples: 25047040 | consumed tokens: 51296337920 | elapsed time per iteration (s): 0.42 | learning rate: 3.010E-05 | global batch size: 256 | lm loss: 2.232460E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.325 | TFLOPs: 31.97 | 7: iteration 97850/ 115203 | consumed samples: 25049600 | consumed tokens: 51301580800 | elapsed time per iteration (s): 0.43 | learning rate: 
3.009E-05 | global batch size: 256 | lm loss: 2.227425E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.532 | TFLOPs: 31.30 | 7: iteration 97860/ 115203 | consumed samples: 25052160 | consumed tokens: 51306823680 | elapsed time per iteration (s): 0.43 | learning rate: 3.008E-05 | global batch size: 256 | lm loss: 2.221748E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.752 | TFLOPs: 31.15 | 7: iteration 97870/ 115203 | consumed samples: 25054720 | consumed tokens: 51312066560 | elapsed time per iteration (s): 0.43 | learning rate: 3.007E-05 | global batch size: 256 | lm loss: 2.254335E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.908 | TFLOPs: 31.21 | 7: iteration 97880/ 115203 | consumed samples: 25057280 | consumed tokens: 51317309440 | elapsed time per iteration (s): 0.68 | learning rate: 3.005E-05 | global batch size: 256 | lm loss: 2.207210E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 378.990 | TFLOPs: 19.88 | 7: iteration 97890/ 115203 | consumed samples: 25059840 | consumed tokens: 51322552320 | elapsed time per iteration (s): 0.43 | learning rate: 3.004E-05 | global batch size: 256 | lm loss: 2.219269E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.041 | TFLOPs: 31.33 | 7: iteration 97900/ 115203 | consumed samples: 25062400 | consumed tokens: 51327795200 | elapsed time per iteration (s): 0.44 | learning rate: 3.003E-05 | global batch size: 256 | lm loss: 2.208642E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.992 | TFLOPs: 30.80 | 7: iteration 97910/ 115203 | consumed 
samples: 25064960 | consumed tokens: 51333038080 | elapsed time per iteration (s): 0.44 | learning rate: 3.002E-05 | global batch size: 256 | lm loss: 2.232592E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.784 | TFLOPs: 30.84 | 7: iteration 97920/ 115203 | consumed samples: 25067520 | consumed tokens: 51338280960 | elapsed time per iteration (s): 0.43 | learning rate: 3.001E-05 | global batch size: 256 | lm loss: 2.222382E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.148 | TFLOPs: 31.49 | 7: iteration 97930/ 115203 | consumed samples: 25070080 | consumed tokens: 51343523840 | elapsed time per iteration (s): 0.43 | learning rate: 3.000E-05 | global batch size: 256 | lm loss: 2.240832E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.100 | TFLOPs: 31.49 | 7: iteration 97940/ 115203 | consumed samples: 25072640 | consumed tokens: 51348766720 | elapsed time per iteration (s): 0.43 | learning rate: 2.999E-05 | global batch size: 256 | lm loss: 2.241635E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.331 | TFLOPs: 31.45 | 7: iteration 97950/ 115203 | consumed samples: 25075200 | consumed tokens: 51354009600 | elapsed time per iteration (s): 0.43 | learning rate: 2.997E-05 | global batch size: 256 | lm loss: 2.212891E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.646 | TFLOPs: 31.41 | 7: iteration 97960/ 115203 | consumed samples: 25077760 | consumed tokens: 51359252480 | elapsed time per iteration (s): 0.43 | learning rate: 2.996E-05 | global batch size: 256 | lm loss: 2.257449E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 598.658 | TFLOPs: 31.41 | 7: iteration 97970/ 115203 | consumed samples: 25080320 | consumed tokens: 51364495360 | elapsed time per iteration (s): 0.42 | learning rate: 2.995E-05 | global batch size: 256 | lm loss: 2.219669E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.641 | TFLOPs: 31.67 | 7: iteration 97980/ 115203 | consumed samples: 25082880 | consumed tokens: 51369738240 | elapsed time per iteration (s): 0.44 | learning rate: 2.994E-05 | global batch size: 256 | lm loss: 2.251142E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.464 | TFLOPs: 30.82 | 7: iteration 97990/ 115203 | consumed samples: 25085440 | consumed tokens: 51374981120 | elapsed time per iteration (s): 0.43 | learning rate: 2.993E-05 | global batch size: 256 | lm loss: 2.222410E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.036 | TFLOPs: 31.59 | 0: [2022-11-29 00:46:19,238] [INFO] [logging.py:68:log_dist] [Rank 0] step=98000, skipped=0, lr=[2.9917836598254863e-05, 2.9917836598254863e-05, 2.9917836598254863e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 98000/ 115203 | consumed samples: 25088000 | consumed tokens: 51380224000 | elapsed time per iteration (s): 0.47 | learning rate: 2.992E-05 | global batch size: 256 | lm loss: 2.192985E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 545.627 | TFLOPs: 28.63 | 0: steps: 98000 loss: 2.1472 iter time (s): 0.434 samples/sec: 590.133 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 98000 | lm loss value: 2.192342E+00 | lm loss PPL: 8.956162E+00 | 7: 
------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 98000 to checkpoints_221m 0: [2022-11-29 00:46:19,547] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step98000 is begin to save! 0: [2022-11-29 00:46:19,553] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_01-model_00-model_states.pt... 0: [2022-11-29 00:46:19,664] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_01-model_00-model_states.pt. 0: [2022-11-29 00:46:19,664] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_03-model_00-model_states.pt... 0: [2022-11-29 00:46:19,686] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_03-model_00-model_states.pt. 0: [2022-11-29 00:46:19,686] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_04-model_00-model_states.pt... 0: [2022-11-29 00:46:19,709] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_04-model_00-model_states.pt. 0: [2022-11-29 00:46:19,709] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_05-model_00-model_states.pt... 0: [2022-11-29 00:46:19,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_05-model_00-model_states.pt. 0: [2022-11-29 00:46:19,734] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_06-model_00-model_states.pt... 0: [2022-11-29 00:46:19,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_06-model_00-model_states.pt. 
0: [2022-11-29 00:46:19,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_07-model_00-model_states.pt... 0: [2022-11-29 00:46:19,784] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_07-model_00-model_states.pt. 0: [2022-11-29 00:46:19,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_08-model_00-model_states.pt... 0: [2022-11-29 00:46:19,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_08-model_00-model_states.pt. 0: [2022-11-29 00:46:19,809] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_09-model_00-model_states.pt... 0: [2022-11-29 00:46:19,833] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_09-model_00-model_states.pt. 0: [2022-11-29 00:46:19,833] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_10-model_00-model_states.pt... 0: [2022-11-29 00:46:19,858] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_10-model_00-model_states.pt. 0: [2022-11-29 00:46:19,858] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_11-model_00-model_states.pt... 0: [2022-11-29 00:46:19,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_11-model_00-model_states.pt. 0: [2022-11-29 00:46:19,882] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_12-model_00-model_states.pt... 0: [2022-11-29 00:46:19,912] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_12-model_00-model_states.pt. 
0: [2022-11-29 00:46:19,913] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_13-model_00-model_states.pt... 0: [2022-11-29 00:46:19,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_13-model_00-model_states.pt. 0: [2022-11-29 00:46:19,935] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_14-model_00-model_states.pt... 0: [2022-11-29 00:46:19,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_14-model_00-model_states.pt. 0: [2022-11-29 00:46:19,960] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_15-model_00-model_states.pt... 0: [2022-11-29 00:46:19,988] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_15-model_00-model_states.pt. 0: [2022-11-29 00:46:19,988] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_16-model_00-model_states.pt... 0: [2022-11-29 00:46:20,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_16-model_00-model_states.pt. 0: [2022-11-29 00:46:20,010] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_17-model_00-model_states.pt... 0: [2022-11-29 00:46:20,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_17-model_00-model_states.pt. 0: [2022-11-29 00:46:20,033] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_18-model_00-model_states.pt... 0: [2022-11-29 00:46:20,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_18-model_00-model_states.pt. 
0: [2022-11-29 00:46:20,057] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_19-model_00-model_states.pt... 0: [2022-11-29 00:46:20,082] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_19-model_00-model_states.pt. 0: [2022-11-29 00:46:20,082] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_20-model_00-model_states.pt... 0: [2022-11-29 00:46:20,106] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_20-model_00-model_states.pt. 0: [2022-11-29 00:46:20,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/layer_22-model_00-model_states.pt... 0: [2022-11-29 00:46:20,111] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/layer_22-model_00-model_states.pt. 0: [2022-11-29 00:46:20,112] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step98000/mp_rank_00_model_states.pt 0: [2022-11-29 00:46:20,112] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/mp_rank_00_model_states.pt... 0: [2022-11-29 00:46:20,114] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/mp_rank_00_model_states.pt. 0: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 
4: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
3: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
2: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:46:20,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step98000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:46:20,179] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:46:20,179] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 00:46:20,179] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 6: [2022-11-29 00:46:20,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:46:20,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 00:46:20,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 
0: [2022-11-29 00:46:20,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:46:20,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 00:46:20,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 3: [2022-11-29 00:46:20,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:46:20,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:46:20,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 00:46:20,185] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 00:46:20,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 3: [2022-11-29 00:46:20,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 0: [2022-11-29 00:46:20,185] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:46:20,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-29 00:46:20,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 00:46:20,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 6: [2022-11-29 00:46:20,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:46:20,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 00:46:20,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 0: [2022-11-29 00:46:20,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:46:20,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 00:46:20,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 0: [2022-11-29 00:46:20,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:46:20,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 00:46:20,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 0: [2022-11-29 00:46:20,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2022-11-29 00:46:20,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 00:46:20,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 6: [2022-11-29 00:46:20,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:46:20,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 00:46:20,190] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:46:20,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 6: [2022-11-29 00:46:20,190] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 00:46:20,190] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 6: [2022-11-29 00:46:20,194] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:46:20,194] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 00:46:20,194] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 6: [2022-11-29 00:46:20,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-29 00:46:20,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 00:46:20,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 6: [2022-11-29 00:46:20,195] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:46:20,195] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 00:46:20,195] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 5: [2022-11-29 00:46:20,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:46:20,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:46:20,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-29 00:46:20,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-29 00:46:20,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 3: [2022-11-29 00:46:20,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 5: [2022-11-29 00:46:20,186] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
3: [2022-11-29 00:46:20,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:46:20,186] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-29 00:46:20,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-29 00:46:20,186] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 3: [2022-11-29 00:46:20,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 5: [2022-11-29 00:46:20,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:46:20,196] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:46:20,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-29 00:46:20,196] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-29 00:46:20,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 3: [2022-11-29 00:46:20,196] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 5: [2022-11-29 00:46:20,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
3: [2022-11-29 00:46:20,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:46:20,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:46:20,187] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-29 00:46:20,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 00:46:20,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-29 00:46:20,187] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 3: [2022-11-29 00:46:20,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 3: [2022-11-29 00:46:20,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 5: [2022-11-29 00:46:20,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:46:20,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 00:46:20,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 5: [2022-11-29 00:46:20,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-29 00:46:20,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 00:46:20,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 5: [2022-11-29 00:46:20,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:46:20,189] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 00:46:20,189] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 5: [2022-11-29 00:46:20,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:46:20,197] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 00:46:20,197] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 0: [2022-11-29 00:46:20,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:46:20,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-29 00:46:20,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 00:46:20,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 00:46:20,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 0: [2022-11-29 00:46:20,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 6: [2022-11-29 00:46:20,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:46:20,200] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 00:46:20,200] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 7: [2022-11-29 00:46:20,201] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:46:20,201] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 00:46:20,201] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 7: [2022-11-29 00:46:20,205] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-29 00:46:20,205] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 00:46:20,205] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 7: [2022-11-29 00:46:20,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:46:20,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 00:46:20,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 7: [2022-11-29 00:46:20,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:46:20,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 00:46:20,206] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 7: [2022-11-29 00:46:20,206] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:46:20,206] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 00:46:20,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 7: [2022-11-29 00:46:20,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-29 00:46:20,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 00:46:20,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 7: [2022-11-29 00:46:20,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:46:20,211] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 00:46:20,211] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 7: [2022-11-29 00:46:20,213] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:46:20,213] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 00:46:20,213] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 0: [2022-11-29 00:46:20,233] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 00:46:20,233] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:46:20,273] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 00:46:20,273] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 00:46:20,273] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 00:46:20,273] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 00:46:20,273] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 00:46:20,273] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 00:46:20,273] [INFO] 
[engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 2: [2022-11-29 00:46:20,273] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:46:20,274] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 00:46:20,274] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 1: [2022-11-29 00:46:20,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:46:20,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-29 00:46:20,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:46:20,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:46:20,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 00:46:20,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 00:46:20,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:46:20,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 00:46:20,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 00:46:20,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:46:20,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:46:20,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 1: [2022-11-29 00:46:20,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 1: [2022-11-29 00:46:20,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 
1: [2022-11-29 00:46:20,278] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 1: [2022-11-29 00:46:20,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 00:46:20,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 00:46:20,278] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 00:46:20,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:46:20,279] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 1: [2022-11-29 00:46:20,279] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 1: [2022-11-29 00:46:20,279] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 1: [2022-11-29 00:46:20,279] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 00:46:20,279] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 4: [2022-11-29 00:46:20,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:46:20,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-29 00:46:20,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:46:20,280] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:46:20,280] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 00:46:20,280] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 00:46:20,280] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 00:46:20,280] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 00:46:20,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 4: [2022-11-29 00:46:20,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 4: [2022-11-29 00:46:20,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 4: [2022-11-29 00:46:20,280] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 4: [2022-11-29 00:46:20,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:46:20,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-29 00:46:20,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:46:20,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:46:20,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 00:46:20,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 00:46:20,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 00:46:20,283] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step98000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 00:46:20,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 4: [2022-11-29 00:46:20,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 4: [2022-11-29 00:46:20,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 4: [2022-11-29 00:46:20,283] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step98000 is ready now! 
0: successfully saved checkpoint at iteration 98000 to checkpoints_221m 7: time (ms) | save-checkpoint: 755.85 7: iteration 98010/ 115203 | consumed samples: 25090560 | consumed tokens: 51385466880 | elapsed time per iteration (s): 0.54 | learning rate: 2.991E-05 | global batch size: 256 | lm loss: 2.249775E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 476.799 | TFLOPs: 25.02 | 7: iteration 98020/ 115203 | consumed samples: 25093120 | consumed tokens: 51390709760 | elapsed time per iteration (s): 0.43 | learning rate: 2.990E-05 | global batch size: 256 | lm loss: 2.180065E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.710 | TFLOPs: 30.89 | 7: iteration 98030/ 115203 | consumed samples: 25095680 | consumed tokens: 51395952640 | elapsed time per iteration (s): 0.43 | learning rate: 2.988E-05 | global batch size: 256 | lm loss: 2.243063E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.558 | TFLOPs: 31.35 | 7: iteration 98040/ 115203 | consumed samples: 25098240 | consumed tokens: 51401195520 | elapsed time per iteration (s): 0.43 | learning rate: 2.987E-05 | global batch size: 256 | lm loss: 2.238460E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.138 | TFLOPs: 31.38 | 7: iteration 98050/ 115203 | consumed samples: 25100800 | consumed tokens: 51406438400 | elapsed time per iteration (s): 0.42 | learning rate: 2.986E-05 | global batch size: 256 | lm loss: 2.242221E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.392 | TFLOPs: 31.92 | 7: iteration 98060/ 115203 | consumed samples: 25103360 | consumed tokens: 51411681280 | elapsed time per iteration (s): 0.42 | learning 
rate: 2.985E-05 | global batch size: 256 | lm loss: 2.231311E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.892 | TFLOPs: 31.84 | 7: iteration 98070/ 115203 | consumed samples: 25105920 | consumed tokens: 51416924160 | elapsed time per iteration (s): 0.42 | learning rate: 2.984E-05 | global batch size: 256 | lm loss: 2.260043E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.065 | TFLOPs: 31.69 | 7: iteration 98080/ 115203 | consumed samples: 25108480 | consumed tokens: 51422167040 | elapsed time per iteration (s): 0.43 | learning rate: 2.983E-05 | global batch size: 256 | lm loss: 2.203087E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.128 | TFLOPs: 31.38 | 7: iteration 98090/ 115203 | consumed samples: 25111040 | consumed tokens: 51427409920 | elapsed time per iteration (s): 0.42 | learning rate: 2.982E-05 | global batch size: 256 | lm loss: 2.243219E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.474 | TFLOPs: 31.82 | 7: iteration 98100/ 115203 | consumed samples: 25113600 | consumed tokens: 51432652800 | elapsed time per iteration (s): 0.43 | learning rate: 2.981E-05 | global batch size: 256 | lm loss: 2.215381E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.857 | TFLOPs: 31.37 | 7: iteration 98110/ 115203 | consumed samples: 25116160 | consumed tokens: 51437895680 | elapsed time per iteration (s): 0.43 | learning rate: 2.979E-05 | global batch size: 256 | lm loss: 2.236855E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.762 | TFLOPs: 31.31 | 7: iteration 98120/ 115203 | 
consumed samples: 25118720 | consumed tokens: 51443138560 | elapsed time per iteration (s): 0.43 | learning rate: 2.978E-05 | global batch size: 256 | lm loss: 2.217303E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.211 | TFLOPs: 31.44 | 7: iteration 98130/ 115203 | consumed samples: 25121280 | consumed tokens: 51448381440 | elapsed time per iteration (s): 0.43 | learning rate: 2.977E-05 | global batch size: 256 | lm loss: 2.208471E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.387 | TFLOPs: 31.19 | 7: iteration 98140/ 115203 | consumed samples: 25123840 | consumed tokens: 51453624320 | elapsed time per iteration (s): 0.43 | learning rate: 2.976E-05 | global batch size: 256 | lm loss: 2.220659E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.299 | TFLOPs: 31.08 | 7: iteration 98150/ 115203 | consumed samples: 25126400 | consumed tokens: 51458867200 | elapsed time per iteration (s): 0.44 | learning rate: 2.975E-05 | global batch size: 256 | lm loss: 2.226117E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.133 | TFLOPs: 30.70 | 7: iteration 98160/ 115203 | consumed samples: 25128960 | consumed tokens: 51464110080 | elapsed time per iteration (s): 0.43 | learning rate: 2.974E-05 | global batch size: 256 | lm loss: 2.205798E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.037 | TFLOPs: 31.43 | 7: iteration 98170/ 115203 | consumed samples: 25131520 | consumed tokens: 51469352960 | elapsed time per iteration (s): 0.43 | learning rate: 2.973E-05 | global batch size: 256 | lm loss: 2.228125E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 595.133 | TFLOPs: 31.23 | 7: iteration 98180/ 115203 | consumed samples: 25134080 | consumed tokens: 51474595840 | elapsed time per iteration (s): 0.42 | learning rate: 2.972E-05 | global batch size: 256 | lm loss: 2.248970E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.064 | TFLOPs: 31.64 | 7: iteration 98190/ 115203 | consumed samples: 25136640 | consumed tokens: 51479838720 | elapsed time per iteration (s): 0.43 | learning rate: 2.970E-05 | global batch size: 256 | lm loss: 2.225050E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.599 | TFLOPs: 31.30 | 7: iteration 98200/ 115203 | consumed samples: 25139200 | consumed tokens: 51485081600 | elapsed time per iteration (s): 0.44 | learning rate: 2.969E-05 | global batch size: 256 | lm loss: 2.236659E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.523 | TFLOPs: 30.77 | 7: iteration 98210/ 115203 | consumed samples: 25141760 | consumed tokens: 51490324480 | elapsed time per iteration (s): 0.43 | learning rate: 2.968E-05 | global batch size: 256 | lm loss: 2.233371E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.412 | TFLOPs: 31.45 | 7: iteration 98220/ 115203 | consumed samples: 25144320 | consumed tokens: 51495567360 | elapsed time per iteration (s): 0.44 | learning rate: 2.967E-05 | global batch size: 256 | lm loss: 2.214544E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.569 | TFLOPs: 30.57 | 7: iteration 98230/ 115203 | consumed samples: 25146880 | consumed tokens: 51500810240 | elapsed time per iteration (s): 0.43 | learning rate: 2.966E-05 | global batch size: 
256 | lm loss: 2.233940E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.922 | TFLOPs: 31.53 | 7: iteration 98240/ 115203 | consumed samples: 25149440 | consumed tokens: 51506053120 | elapsed time per iteration (s): 0.43 | learning rate: 2.965E-05 | global batch size: 256 | lm loss: 2.214585E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.244 | TFLOPs: 31.13 | 7: iteration 98250/ 115203 | consumed samples: 25152000 | consumed tokens: 51511296000 | elapsed time per iteration (s): 0.43 | learning rate: 2.964E-05 | global batch size: 256 | lm loss: 2.219565E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.230 | TFLOPs: 31.60 | 7: iteration 98260/ 115203 | consumed samples: 25154560 | consumed tokens: 51516538880 | elapsed time per iteration (s): 0.44 | learning rate: 2.963E-05 | global batch size: 256 | lm loss: 2.247390E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.328 | TFLOPs: 30.55 | 7: iteration 98270/ 115203 | consumed samples: 25157120 | consumed tokens: 51521781760 | elapsed time per iteration (s): 0.42 | learning rate: 2.961E-05 | global batch size: 256 | lm loss: 2.219235E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.079 | TFLOPs: 31.75 | 7: iteration 98280/ 115203 | consumed samples: 25159680 | consumed tokens: 51527024640 | elapsed time per iteration (s): 0.43 | learning rate: 2.960E-05 | global batch size: 256 | lm loss: 2.255793E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.201 | TFLOPs: 31.60 | 7: iteration 98290/ 115203 | consumed samples: 25162240 | consumed 
tokens: 51532267520 | elapsed time per iteration (s): 0.42 | learning rate: 2.959E-05 | global batch size: 256 | lm loss: 2.199660E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.949 | TFLOPs: 32.06 | 7: iteration 98300/ 115203 | consumed samples: 25164800 | consumed tokens: 51537510400 | elapsed time per iteration (s): 0.43 | learning rate: 2.958E-05 | global batch size: 256 | lm loss: 2.204338E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.767 | TFLOPs: 31.31 | 7: iteration 98310/ 115203 | consumed samples: 25167360 | consumed tokens: 51542753280 | elapsed time per iteration (s): 0.43 | learning rate: 2.957E-05 | global batch size: 256 | lm loss: 2.237964E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.230 | TFLOPs: 31.18 | 7: iteration 98320/ 115203 | consumed samples: 25169920 | consumed tokens: 51547996160 | elapsed time per iteration (s): 0.46 | learning rate: 2.956E-05 | global batch size: 256 | lm loss: 2.221998E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 552.348 | TFLOPs: 28.98 | 7: iteration 98330/ 115203 | consumed samples: 25172480 | consumed tokens: 51553239040 | elapsed time per iteration (s): 0.44 | learning rate: 2.955E-05 | global batch size: 256 | lm loss: 2.235561E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.241 | TFLOPs: 30.39 | 7: iteration 98340/ 115203 | consumed samples: 25175040 | consumed tokens: 51558481920 | elapsed time per iteration (s): 0.43 | learning rate: 2.954E-05 | global batch size: 256 | lm loss: 2.205852E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 590.085 | TFLOPs: 30.96 | 7: iteration 98350/ 115203 | consumed samples: 25177600 | consumed tokens: 51563724800 | elapsed time per iteration (s): 0.44 | learning rate: 2.953E-05 | global batch size: 256 | lm loss: 2.229820E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.481 | TFLOPs: 30.88 | 7: iteration 98360/ 115203 | consumed samples: 25180160 | consumed tokens: 51568967680 | elapsed time per iteration (s): 0.42 | learning rate: 2.951E-05 | global batch size: 256 | lm loss: 2.223687E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.867 | TFLOPs: 31.84 | 7: iteration 98370/ 115203 | consumed samples: 25182720 | consumed tokens: 51574210560 | elapsed time per iteration (s): 0.43 | learning rate: 2.950E-05 | global batch size: 256 | lm loss: 2.221117E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.753 | TFLOPs: 31.26 | 7: iteration 98380/ 115203 | consumed samples: 25185280 | consumed tokens: 51579453440 | elapsed time per iteration (s): 0.42 | learning rate: 2.949E-05 | global batch size: 256 | lm loss: 2.262995E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.813 | TFLOPs: 31.84 | 7: iteration 98390/ 115203 | consumed samples: 25187840 | consumed tokens: 51584696320 | elapsed time per iteration (s): 1.31 | learning rate: 2.948E-05 | global batch size: 256 | lm loss: 2.222404E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 195.985 | TFLOPs: 10.28 | 7: iteration 98400/ 115203 | consumed samples: 25190400 | consumed tokens: 51589939200 | elapsed time per iteration (s): 0.43 | learning rate: 2.947E-05 | global batch size: 256 | lm loss: 2.226432E+00 | grad norm: 
0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.418 | TFLOPs: 31.40 | 7: iteration 98410/ 115203 | consumed samples: 25192960 | consumed tokens: 51595182080 | elapsed time per iteration (s): 0.93 | learning rate: 2.946E-05 | global batch size: 256 | lm loss: 2.202384E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 274.983 | TFLOPs: 14.43 | 7: iteration 98420/ 115203 | consumed samples: 25195520 | consumed tokens: 51600424960 | elapsed time per iteration (s): 0.43 | learning rate: 2.945E-05 | global batch size: 256 | lm loss: 2.252065E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.590 | TFLOPs: 31.09 | 7: iteration 98430/ 115203 | consumed samples: 25198080 | consumed tokens: 51605667840 | elapsed time per iteration (s): 0.44 | learning rate: 2.944E-05 | global batch size: 256 | lm loss: 2.197590E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.352 | TFLOPs: 30.45 | 7: iteration 98440/ 115203 | consumed samples: 25200640 | consumed tokens: 51610910720 | elapsed time per iteration (s): 0.42 | learning rate: 2.943E-05 | global batch size: 256 | lm loss: 2.244348E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.790 | TFLOPs: 31.84 | 7: iteration 98450/ 115203 | consumed samples: 25203200 | consumed tokens: 51616153600 | elapsed time per iteration (s): 0.44 | learning rate: 2.941E-05 | global batch size: 256 | lm loss: 2.232446E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.900 | TFLOPs: 30.74 | 7: iteration 98460/ 115203 | consumed samples: 25205760 | consumed tokens: 51621396480 | elapsed time per 
iteration (s): 0.43 | learning rate: 2.940E-05 | global batch size: 256 | lm loss: 2.199615E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.320 | TFLOPs: 31.29 | 7: iteration 98470/ 115203 | consumed samples: 25208320 | consumed tokens: 51626639360 | elapsed time per iteration (s): 0.44 | learning rate: 2.939E-05 | global batch size: 256 | lm loss: 2.232928E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.400 | TFLOPs: 30.61 | 7: iteration 98480/ 115203 | consumed samples: 25210880 | consumed tokens: 51631882240 | elapsed time per iteration (s): 0.45 | learning rate: 2.938E-05 | global batch size: 256 | lm loss: 2.253607E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.065 | TFLOPs: 30.17 | 7: iteration 98490/ 115203 | consumed samples: 25213440 | consumed tokens: 51637125120 | elapsed time per iteration (s): 0.44 | learning rate: 2.937E-05 | global batch size: 256 | lm loss: 2.238712E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.879 | TFLOPs: 30.53 | 7: iteration 98500/ 115203 | consumed samples: 25216000 | consumed tokens: 51642368000 | elapsed time per iteration (s): 0.43 | learning rate: 2.936E-05 | global batch size: 256 | lm loss: 2.236832E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.443 | TFLOPs: 31.08 | 7: iteration 98510/ 115203 | consumed samples: 25218560 | consumed tokens: 51647610880 | elapsed time per iteration (s): 0.44 | learning rate: 2.935E-05 | global batch size: 256 | lm loss: 2.214224E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.844 | TFLOPs: 30.69 | 7: 
iteration 98520/ 115203 | consumed samples: 25221120 | consumed tokens: 51652853760 | elapsed time per iteration (s): 0.44 | learning rate: 2.934E-05 | global batch size: 256 | lm loss: 2.209281E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.278 | TFLOPs: 30.39 | 7: iteration 98530/ 115203 | consumed samples: 25223680 | consumed tokens: 51658096640 | elapsed time per iteration (s): 0.45 | learning rate: 2.933E-05 | global batch size: 256 | lm loss: 2.226863E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.900 | TFLOPs: 29.69 | 7: iteration 98540/ 115203 | consumed samples: 25226240 | consumed tokens: 51663339520 | elapsed time per iteration (s): 0.44 | learning rate: 2.932E-05 | global batch size: 256 | lm loss: 2.204803E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.829 | TFLOPs: 30.69 | 7: iteration 98550/ 115203 | consumed samples: 25228800 | consumed tokens: 51668582400 | elapsed time per iteration (s): 0.44 | learning rate: 2.930E-05 | global batch size: 256 | lm loss: 2.227758E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.465 | TFLOPs: 30.30 | 7: iteration 98560/ 115203 | consumed samples: 25231360 | consumed tokens: 51673825280 | elapsed time per iteration (s): 0.44 | learning rate: 2.929E-05 | global batch size: 256 | lm loss: 2.217221E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.349 | TFLOPs: 30.40 | 7: iteration 98570/ 115203 | consumed samples: 25233920 | consumed tokens: 51679068160 | elapsed time per iteration (s): 0.44 | learning rate: 2.928E-05 | global batch size: 256 | lm loss: 2.262464E+00 | grad norm: 0.238 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.848 | TFLOPs: 30.69 | 7: iteration 98580/ 115203 | consumed samples: 25236480 | consumed tokens: 51684311040 | elapsed time per iteration (s): 0.43 | learning rate: 2.927E-05 | global batch size: 256 | lm loss: 2.207105E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.762 | TFLOPs: 31.26 | 7: iteration 98590/ 115203 | consumed samples: 25239040 | consumed tokens: 51689553920 | elapsed time per iteration (s): 0.44 | learning rate: 2.926E-05 | global batch size: 256 | lm loss: 2.189976E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.242 | TFLOPs: 30.86 | 7: iteration 98600/ 115203 | consumed samples: 25241600 | consumed tokens: 51694796800 | elapsed time per iteration (s): 0.42 | learning rate: 2.925E-05 | global batch size: 256 | lm loss: 2.238940E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.419 | TFLOPs: 31.77 | 7: iteration 98610/ 115203 | consumed samples: 25244160 | consumed tokens: 51700039680 | elapsed time per iteration (s): 0.43 | learning rate: 2.924E-05 | global batch size: 256 | lm loss: 2.208040E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.356 | TFLOPs: 31.45 | 7: iteration 98620/ 115203 | consumed samples: 25246720 | consumed tokens: 51705282560 | elapsed time per iteration (s): 0.43 | learning rate: 2.923E-05 | global batch size: 256 | lm loss: 2.224987E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.178 | TFLOPs: 31.28 | 7: iteration 98630/ 115203 | consumed samples: 25249280 | consumed tokens: 51710525440 | elapsed time per iteration (s): 0.43 | learning rate: 
2.922E-05 | global batch size: 256 | lm loss: 2.225832E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.776 | TFLOPs: 31.47 | 7: iteration 98640/ 115203 | consumed samples: 25251840 | consumed tokens: 51715768320 | elapsed time per iteration (s): 0.44 | learning rate: 2.921E-05 | global batch size: 256 | lm loss: 2.189518E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.922 | TFLOPs: 30.53 | 7: iteration 98650/ 115203 | consumed samples: 25254400 | consumed tokens: 51721011200 | elapsed time per iteration (s): 0.43 | learning rate: 2.920E-05 | global batch size: 256 | lm loss: 2.204214E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.737 | TFLOPs: 31.52 | 7: iteration 98660/ 115203 | consumed samples: 25256960 | consumed tokens: 51726254080 | elapsed time per iteration (s): 0.42 | learning rate: 2.918E-05 | global batch size: 256 | lm loss: 2.200585E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.752 | TFLOPs: 31.73 | 7: iteration 98670/ 115203 | consumed samples: 25259520 | consumed tokens: 51731496960 | elapsed time per iteration (s): 0.45 | learning rate: 2.917E-05 | global batch size: 256 | lm loss: 2.252357E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.830 | TFLOPs: 29.90 | 7: iteration 98680/ 115203 | consumed samples: 25262080 | consumed tokens: 51736739840 | elapsed time per iteration (s): 0.45 | learning rate: 2.916E-05 | global batch size: 256 | lm loss: 2.215782E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.195 | TFLOPs: 30.13 | 7: iteration 98690/ 115203 | consumed 
samples: 25264640 | consumed tokens: 51741982720 | elapsed time per iteration (s): 0.43 | learning rate: 2.915E-05 | global batch size: 256 | lm loss: 2.213478E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.170 | TFLOPs: 31.49 | 7: iteration 98700/ 115203 | consumed samples: 25267200 | consumed tokens: 51747225600 | elapsed time per iteration (s): 0.45 | learning rate: 2.914E-05 | global batch size: 256 | lm loss: 2.210876E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.309 | TFLOPs: 29.56 | 7: iteration 98710/ 115203 | consumed samples: 25269760 | consumed tokens: 51752468480 | elapsed time per iteration (s): 0.45 | learning rate: 2.913E-05 | global batch size: 256 | lm loss: 2.232433E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.624 | TFLOPs: 29.68 | 7: iteration 98720/ 115203 | consumed samples: 25272320 | consumed tokens: 51757711360 | elapsed time per iteration (s): 0.43 | learning rate: 2.912E-05 | global batch size: 256 | lm loss: 2.209387E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.682 | TFLOPs: 31.25 | 7: iteration 98730/ 115203 | consumed samples: 25274880 | consumed tokens: 51762954240 | elapsed time per iteration (s): 0.43 | learning rate: 2.911E-05 | global batch size: 256 | lm loss: 2.192152E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.960 | TFLOPs: 31.16 | 7: iteration 98740/ 115203 | consumed samples: 25277440 | consumed tokens: 51768197120 | elapsed time per iteration (s): 0.43 | learning rate: 2.910E-05 | global batch size: 256 | lm loss: 2.196885E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 602.021 | TFLOPs: 31.59 | 7: iteration 98750/ 115203 | consumed samples: 25280000 | consumed tokens: 51773440000 | elapsed time per iteration (s): 0.43 | learning rate: 2.909E-05 | global batch size: 256 | lm loss: 2.233443E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.671 | TFLOPs: 30.89 | 7: iteration 98760/ 115203 | consumed samples: 25282560 | consumed tokens: 51778682880 | elapsed time per iteration (s): 0.44 | learning rate: 2.908E-05 | global batch size: 256 | lm loss: 2.194149E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.232 | TFLOPs: 30.65 | 7: iteration 98770/ 115203 | consumed samples: 25285120 | consumed tokens: 51783925760 | elapsed time per iteration (s): 0.47 | learning rate: 2.906E-05 | global batch size: 256 | lm loss: 2.235606E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.023 | TFLOPs: 28.44 | 7: iteration 98780/ 115203 | consumed samples: 25287680 | consumed tokens: 51789168640 | elapsed time per iteration (s): 0.47 | learning rate: 2.905E-05 | global batch size: 256 | lm loss: 2.207304E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 546.889 | TFLOPs: 28.69 | 7: iteration 98790/ 115203 | consumed samples: 25290240 | consumed tokens: 51794411520 | elapsed time per iteration (s): 0.45 | learning rate: 2.904E-05 | global batch size: 256 | lm loss: 2.225989E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.795 | TFLOPs: 30.11 | 7: iteration 98800/ 115203 | consumed samples: 25292800 | consumed tokens: 51799654400 | elapsed time per iteration (s): 0.43 | learning rate: 2.903E-05 | global batch size: 256 | lm 
loss: 2.222682E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.647 | TFLOPs: 31.31 | 7: iteration 98810/ 115203 | consumed samples: 25295360 | consumed tokens: 51804897280 | elapsed time per iteration (s): 0.43 | learning rate: 2.902E-05 | global batch size: 256 | lm loss: 2.235547E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.657 | TFLOPs: 31.36 | 7: iteration 98820/ 115203 | consumed samples: 25297920 | consumed tokens: 51810140160 | elapsed time per iteration (s): 0.43 | learning rate: 2.901E-05 | global batch size: 256 | lm loss: 2.257070E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.242 | TFLOPs: 31.39 | 7: iteration 98830/ 115203 | consumed samples: 25300480 | consumed tokens: 51815383040 | elapsed time per iteration (s): 0.43 | learning rate: 2.900E-05 | global batch size: 256 | lm loss: 2.202739E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.800 | TFLOPs: 30.95 | 7: iteration 98840/ 115203 | consumed samples: 25303040 | consumed tokens: 51820625920 | elapsed time per iteration (s): 0.44 | learning rate: 2.899E-05 | global batch size: 256 | lm loss: 2.247826E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.583 | TFLOPs: 30.46 | 7: iteration 98850/ 115203 | consumed samples: 25305600 | consumed tokens: 51825868800 | elapsed time per iteration (s): 0.43 | learning rate: 2.898E-05 | global batch size: 256 | lm loss: 2.266020E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.366 | TFLOPs: 31.55 | 7: iteration 98860/ 115203 | consumed samples: 25308160 | consumed tokens: 
51831111680 | elapsed time per iteration (s): 0.44 | learning rate: 2.897E-05 | global batch size: 256 | lm loss: 2.239850E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.970 | TFLOPs: 30.80 | 7: iteration 98870/ 115203 | consumed samples: 25310720 | consumed tokens: 51836354560 | elapsed time per iteration (s): 0.44 | learning rate: 2.896E-05 | global batch size: 256 | lm loss: 2.191268E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.264 | TFLOPs: 30.71 | 7: iteration 98880/ 115203 | consumed samples: 25313280 | consumed tokens: 51841597440 | elapsed time per iteration (s): 0.42 | learning rate: 2.895E-05 | global batch size: 256 | lm loss: 2.233314E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.208 | TFLOPs: 31.86 | 7: iteration 98890/ 115203 | consumed samples: 25315840 | consumed tokens: 51846840320 | elapsed time per iteration (s): 0.43 | learning rate: 2.894E-05 | global batch size: 256 | lm loss: 2.251011E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.465 | TFLOPs: 31.14 | 7: iteration 98900/ 115203 | consumed samples: 25318400 | consumed tokens: 51852083200 | elapsed time per iteration (s): 0.43 | learning rate: 2.892E-05 | global batch size: 256 | lm loss: 2.224659E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.922 | TFLOPs: 31.27 | 7: iteration 98910/ 115203 | consumed samples: 25320960 | consumed tokens: 51857326080 | elapsed time per iteration (s): 0.45 | learning rate: 2.891E-05 | global batch size: 256 | lm loss: 2.238969E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
566.506 | TFLOPs: 29.72 | 7: iteration 98920/ 115203 | consumed samples: 25323520 | consumed tokens: 51862568960 | elapsed time per iteration (s): 0.44 | learning rate: 2.890E-05 | global batch size: 256 | lm loss: 2.288104E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.830 | TFLOPs: 30.84 | 7: iteration 98930/ 115203 | consumed samples: 25326080 | consumed tokens: 51867811840 | elapsed time per iteration (s): 0.43 | learning rate: 2.889E-05 | global batch size: 256 | lm loss: 2.210519E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.621 | TFLOPs: 30.99 | 7: iteration 98940/ 115203 | consumed samples: 25328640 | consumed tokens: 51873054720 | elapsed time per iteration (s): 0.43 | learning rate: 2.888E-05 | global batch size: 256 | lm loss: 2.233747E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.328 | TFLOPs: 31.29 | 7: iteration 98950/ 115203 | consumed samples: 25331200 | consumed tokens: 51878297600 | elapsed time per iteration (s): 0.44 | learning rate: 2.887E-05 | global batch size: 256 | lm loss: 2.215505E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.189 | TFLOPs: 30.81 | 7: iteration 98960/ 115203 | consumed samples: 25333760 | consumed tokens: 51883540480 | elapsed time per iteration (s): 0.43 | learning rate: 2.886E-05 | global batch size: 256 | lm loss: 2.196261E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.346 | TFLOPs: 30.97 | 7: iteration 98970/ 115203 | consumed samples: 25336320 | consumed tokens: 51888783360 | elapsed time per iteration (s): 0.42 | learning rate: 2.885E-05 | global batch size: 256 | lm loss: 2.220996E+00 | grad norm: 0.250 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.481 | TFLOPs: 31.77 | 7: iteration 98980/ 115203 | consumed samples: 25338880 | consumed tokens: 51894026240 | elapsed time per iteration (s): 0.43 | learning rate: 2.884E-05 | global batch size: 256 | lm loss: 2.233470E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.379 | TFLOPs: 31.50 | 7: iteration 98990/ 115203 | consumed samples: 25341440 | consumed tokens: 51899269120 | elapsed time per iteration (s): 0.43 | learning rate: 2.883E-05 | global batch size: 256 | lm loss: 2.226364E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.649 | TFLOPs: 31.04 | 7: iteration 99000/ 115203 | consumed samples: 25344000 | consumed tokens: 51904512000 | elapsed time per iteration (s): 0.43 | learning rate: 2.882E-05 | global batch size: 256 | lm loss: 2.225387E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.565 | TFLOPs: 31.09 | 7: ------------------------------------------------------------------------------------------- 7: valid loss at iteration 99000 | lm loss value: 2.276255E+00 | lm loss PPL: 9.740136E+00 | 7: ------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 99000 to checkpoints_221m 0: [2022-11-29 00:53:47,699] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step99000 is begin to save! 0: [2022-11-29 00:53:47,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_01-model_00-model_states.pt... 0: [2022-11-29 00:53:47,836] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_01-model_00-model_states.pt. 
0: [2022-11-29 00:53:47,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_03-model_00-model_states.pt... 0: [2022-11-29 00:53:47,859] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_03-model_00-model_states.pt. 0: [2022-11-29 00:53:47,859] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_04-model_00-model_states.pt... 0: [2022-11-29 00:53:47,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_04-model_00-model_states.pt. 0: [2022-11-29 00:53:47,884] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_05-model_00-model_states.pt... 0: [2022-11-29 00:53:47,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_05-model_00-model_states.pt. 0: [2022-11-29 00:53:47,908] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_06-model_00-model_states.pt... 0: [2022-11-29 00:53:47,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_06-model_00-model_states.pt. 0: [2022-11-29 00:53:47,932] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_07-model_00-model_states.pt... 0: [2022-11-29 00:53:47,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_07-model_00-model_states.pt. 0: [2022-11-29 00:53:47,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_08-model_00-model_states.pt... 0: [2022-11-29 00:53:47,978] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_08-model_00-model_states.pt. 
0: [2022-11-29 00:53:47,979] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_09-model_00-model_states.pt... 0: [2022-11-29 00:53:48,002] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_09-model_00-model_states.pt. 0: [2022-11-29 00:53:48,002] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_10-model_00-model_states.pt... 0: [2022-11-29 00:53:48,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_10-model_00-model_states.pt. 0: [2022-11-29 00:53:48,025] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_11-model_00-model_states.pt... 0: [2022-11-29 00:53:48,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_11-model_00-model_states.pt. 0: [2022-11-29 00:53:48,050] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_12-model_00-model_states.pt... 0: [2022-11-29 00:53:48,073] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_12-model_00-model_states.pt. 0: [2022-11-29 00:53:48,073] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_13-model_00-model_states.pt... 0: [2022-11-29 00:53:48,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_13-model_00-model_states.pt. 0: [2022-11-29 00:53:48,096] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_14-model_00-model_states.pt... 0: [2022-11-29 00:53:48,120] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_14-model_00-model_states.pt. 
0: [2022-11-29 00:53:48,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_15-model_00-model_states.pt... 0: [2022-11-29 00:53:48,143] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_15-model_00-model_states.pt. 0: [2022-11-29 00:53:48,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_16-model_00-model_states.pt... 0: [2022-11-29 00:53:48,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_16-model_00-model_states.pt. 0: [2022-11-29 00:53:48,167] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_17-model_00-model_states.pt... 0: [2022-11-29 00:53:48,189] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_17-model_00-model_states.pt. 0: [2022-11-29 00:53:48,189] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_18-model_00-model_states.pt... 0: [2022-11-29 00:53:48,212] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_18-model_00-model_states.pt. 0: [2022-11-29 00:53:48,213] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_19-model_00-model_states.pt... 0: [2022-11-29 00:53:48,237] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_19-model_00-model_states.pt. 0: [2022-11-29 00:53:48,237] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_20-model_00-model_states.pt... 0: [2022-11-29 00:53:48,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_20-model_00-model_states.pt. 
0: [2022-11-29 00:53:48,259] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/layer_22-model_00-model_states.pt... 0: [2022-11-29 00:53:48,263] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/layer_22-model_00-model_states.pt. 0: [2022-11-29 00:53:48,264] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step99000/mp_rank_00_model_states.pt 0: [2022-11-29 00:53:48,264] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/mp_rank_00_model_states.pt... 0: [2022-11-29 00:53:48,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/mp_rank_00_model_states.pt. 0: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
6: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:53:48,285] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 7: [2022-11-29 00:53:48,285] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 1: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 
5: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 0: [2022-11-29 00:53:48,284] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step99000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 6: [2022-11-29 00:53:48,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:53:48,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 00:53:48,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2022-11-29 00:53:48,338] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:53:48,338] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 00:53:48,338] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2022-11-29 00:53:48,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:53:48,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:53:48,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-29 00:53:48,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 00:53:48,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 00:53:48,343] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 00:53:48,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2022-11-29 00:53:48,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2022-11-29 00:53:48,343] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2022-11-29 00:53:48,343] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:53:48,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 00:53:48,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2022-11-29 00:53:48,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:53:48,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 00:53:48,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
3: [2022-11-29 00:53:48,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:53:48,344] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:53:48,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 00:53:48,344] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 00:53:48,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2022-11-29 00:53:48,344] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2022-11-29 00:53:48,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:53:48,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:53:48,345] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 00:53:48,345] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2022-11-29 00:53:48,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 
0: [2022-11-29 00:53:48,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 00:53:48,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2022-11-29 00:53:48,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:53:48,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:53:48,345] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:53:48,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 00:53:48,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:53:48,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 00:53:48,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 00:53:48,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2022-11-29 00:53:48,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2022-11-29 00:53:48,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
1: [2022-11-29 00:53:48,346] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 00:53:48,346] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2022-11-29 00:53:48,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:53:48,348] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 00:53:48,348] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 1: [2022-11-29 00:53:48,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:53:48,348] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 1: [2022-11-29 00:53:48,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 6: [2022-11-29 00:53:48,349] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 1: [2022-11-29 00:53:48,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2022-11-29 00:53:48,349] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2022-11-29 00:53:48,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2022-11-29 00:53:48,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 00:53:48,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2022-11-29 00:53:48,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:53:48,350] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 00:53:48,350] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2022-11-29 00:53:48,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:53:48,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 00:53:48,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 3: [2022-11-29 00:53:48,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 00:53:48,351] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 00:53:48,351] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2022-11-29 00:53:48,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-29 00:53:48,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:53:48,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:53:48,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:53:48,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 00:53:48,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 00:53:48,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 00:53:48,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 00:53:48,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 00:53:48,359] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 00:53:48,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2022-11-29 00:53:48,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2022-11-29 00:53:48,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
0: [2022-11-29 00:53:48,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2022-11-29 00:53:48,359] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2022-11-29 00:53:48,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:53:48,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:53:48,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:53:48,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 00:53:48,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 00:53:48,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 00:53:48,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 00:53:48,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2022-11-29 00:53:48,363] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 00:53:48,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
6: [2022-11-29 00:53:48,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 6: [2022-11-29 00:53:48,363] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2022-11-29 00:53:48,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:53:48,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:53:48,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:53:48,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:53:48,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:53:48,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:53:48,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 00:53:48,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-29 00:53:48,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 00:53:48,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 00:53:48,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 00:53:48,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 00:53:48,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 00:53:48,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 00:53:48,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 00:53:48,396] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 00:53:48,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2022-11-29 00:53:48,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2022-11-29 00:53:48,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2022-11-29 00:53:48,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
4: [2022-11-29 00:53:48,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2022-11-29 00:53:48,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2022-11-29 00:53:48,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 4: [2022-11-29 00:53:48,396] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-29 00:53:48,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 00:53:48,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 00:53:48,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 00:53:48,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 00:53:48,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 00:53:48,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2022-11-29 00:53:48,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 00:53:48,390] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 5: [2022-11-29 00:53:48,390] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 0: [2022-11-29 00:53:48,400] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 00:53:48,400] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 00:53:48,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 00:53:48,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 00:53:48,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2022-11-29 00:53:48,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
2: [2022-11-29 00:53:48,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 00:53:48,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 00:53:48,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 00:53:48,401] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 2: [2022-11-29 00:53:48,401] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 
7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 00:53:48,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 00:53:48,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 00:53:48,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 00:53:48,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 00:53:48,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 00:53:48,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 00:53:48,520] [INFO] 
[engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 00:53:48,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step99000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 7: [2022-11-29 00:53:48,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step99000 is ready now! 
0: successfully saved checkpoint at iteration 99000 to checkpoints_221m 7: time (ms) | save-checkpoint: 845.62 7: iteration 99010/ 115203 | consumed samples: 25346560 | consumed tokens: 51909754880 | elapsed time per iteration (s): 0.54 | learning rate: 2.881E-05 | global batch size: 256 | lm loss: 2.212772E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 470.972 | TFLOPs: 24.71 | 7: iteration 99020/ 115203 | consumed samples: 25349120 | consumed tokens: 51914997760 | elapsed time per iteration (s): 0.43 | learning rate: 2.880E-05 | global batch size: 256 | lm loss: 2.240151E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.562 | TFLOPs: 30.99 | 7: iteration 99030/ 115203 | consumed samples: 25351680 | consumed tokens: 51920240640 | elapsed time per iteration (s): 0.43 | learning rate: 2.878E-05 | global batch size: 256 | lm loss: 2.214543E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.484 | TFLOPs: 30.98 | 7: iteration 99040/ 115203 | consumed samples: 25354240 | consumed tokens: 51925483520 | elapsed time per iteration (s): 0.43 | learning rate: 2.877E-05 | global batch size: 256 | lm loss: 2.237731E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.375 | TFLOPs: 31.45 | 7: iteration 99050/ 115203 | consumed samples: 25356800 | consumed tokens: 51930726400 | elapsed time per iteration (s): 0.43 | learning rate: 2.876E-05 | global batch size: 256 | lm loss: 2.232280E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.720 | TFLOPs: 31.57 | 7: iteration 99060/ 115203 | consumed samples: 25359360 | consumed tokens: 51935969280 | elapsed time per iteration (s): 0.43 | learning 
rate: 2.875E-05 | global batch size: 256 | lm loss: 2.205891E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.031 | TFLOPs: 31.17 | 7: iteration 99070/ 115203 | consumed samples: 25361920 | consumed tokens: 51941212160 | elapsed time per iteration (s): 0.43 | learning rate: 2.874E-05 | global batch size: 256 | lm loss: 2.209653E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.908 | TFLOPs: 31.11 | 7: iteration 99080/ 115203 | consumed samples: 25364480 | consumed tokens: 51946455040 | elapsed time per iteration (s): 0.44 | learning rate: 2.873E-05 | global batch size: 256 | lm loss: 2.226068E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.538 | TFLOPs: 30.72 | 7: iteration 99090/ 115203 | consumed samples: 25367040 | consumed tokens: 51951697920 | elapsed time per iteration (s): 0.45 | learning rate: 2.872E-05 | global batch size: 256 | lm loss: 2.230005E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.987 | TFLOPs: 30.17 | 7: iteration 99100/ 115203 | consumed samples: 25369600 | consumed tokens: 51956940800 | elapsed time per iteration (s): 0.45 | learning rate: 2.871E-05 | global batch size: 256 | lm loss: 2.213613E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.190 | TFLOPs: 30.18 | 7: iteration 99110/ 115203 | consumed samples: 25372160 | consumed tokens: 51962183680 | elapsed time per iteration (s): 0.43 | learning rate: 2.870E-05 | global batch size: 256 | lm loss: 2.211706E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.431 | TFLOPs: 31.35 | 7: iteration 99120/ 115203 | 
consumed samples: 25374720 | consumed tokens: 51967426560 | elapsed time per iteration (s): 0.42 | learning rate: 2.869E-05 | global batch size: 256 | lm loss: 2.192511E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.596 | TFLOPs: 31.72 | 7: iteration 99130/ 115203 | consumed samples: 25377280 | consumed tokens: 51972669440 | elapsed time per iteration (s): 0.43 | learning rate: 2.868E-05 | global batch size: 256 | lm loss: 2.212143E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.683 | TFLOPs: 31.15 | 7: iteration 99140/ 115203 | consumed samples: 25379840 | consumed tokens: 51977912320 | elapsed time per iteration (s): 0.42 | learning rate: 2.867E-05 | global batch size: 256 | lm loss: 2.236710E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.077 | TFLOPs: 31.90 | 7: iteration 99150/ 115203 | consumed samples: 25382400 | consumed tokens: 51983155200 | elapsed time per iteration (s): 0.44 | learning rate: 2.866E-05 | global batch size: 256 | lm loss: 2.227788E+00 | grad norm: 0.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.917 | TFLOPs: 30.64 | 7: iteration 99160/ 115203 | consumed samples: 25384960 | consumed tokens: 51988398080 | elapsed time per iteration (s): 0.44 | learning rate: 2.865E-05 | global batch size: 256 | lm loss: 2.234709E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.434 | TFLOPs: 30.66 | 7: iteration 99170/ 115203 | consumed samples: 25387520 | consumed tokens: 51993640960 | elapsed time per iteration (s): 0.44 | learning rate: 2.864E-05 | global batch size: 256 | lm loss: 2.222105E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 586.607 | TFLOPs: 30.78 | 7: iteration 99180/ 115203 | consumed samples: 25390080 | consumed tokens: 51998883840 | elapsed time per iteration (s): 0.44 | learning rate: 2.863E-05 | global batch size: 256 | lm loss: 2.210068E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.830 | TFLOPs: 30.42 | 7: iteration 99190/ 115203 | consumed samples: 25392640 | consumed tokens: 52004126720 | elapsed time per iteration (s): 0.43 | learning rate: 2.861E-05 | global batch size: 256 | lm loss: 2.223976E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.786 | TFLOPs: 31.36 | 7: iteration 99200/ 115203 | consumed samples: 25395200 | consumed tokens: 52009369600 | elapsed time per iteration (s): 0.43 | learning rate: 2.860E-05 | global batch size: 256 | lm loss: 2.200180E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.012 | TFLOPs: 31.48 | 7: iteration 99210/ 115203 | consumed samples: 25397760 | consumed tokens: 52014612480 | elapsed time per iteration (s): 0.43 | learning rate: 2.859E-05 | global batch size: 256 | lm loss: 2.221704E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.904 | TFLOPs: 31.11 | 7: iteration 99220/ 115203 | consumed samples: 25400320 | consumed tokens: 52019855360 | elapsed time per iteration (s): 0.45 | learning rate: 2.858E-05 | global batch size: 256 | lm loss: 2.227974E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.098 | TFLOPs: 30.17 | 7: iteration 99230/ 115203 | consumed samples: 25402880 | consumed tokens: 52025098240 | elapsed time per iteration (s): 0.46 | learning rate: 2.857E-05 | global batch size: 
256 | lm loss: 2.245325E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.683 | TFLOPs: 29.47 | 7: iteration 99240/ 115203 | consumed samples: 25405440 | consumed tokens: 52030341120 | elapsed time per iteration (s): 0.44 | learning rate: 2.856E-05 | global batch size: 256 | lm loss: 2.234762E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.697 | TFLOPs: 30.21 | 7: iteration 99250/ 115203 | consumed samples: 25408000 | consumed tokens: 52035584000 | elapsed time per iteration (s): 0.43 | learning rate: 2.855E-05 | global batch size: 256 | lm loss: 2.212703E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.140 | TFLOPs: 31.49 | 7: iteration 99260/ 115203 | consumed samples: 25410560 | consumed tokens: 52040826880 | elapsed time per iteration (s): 0.43 | learning rate: 2.854E-05 | global batch size: 256 | lm loss: 2.207441E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.427 | TFLOPs: 31.14 | 7: iteration 99270/ 115203 | consumed samples: 25413120 | consumed tokens: 52046069760 | elapsed time per iteration (s): 0.43 | learning rate: 2.853E-05 | global batch size: 256 | lm loss: 2.235435E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.972 | TFLOPs: 30.90 | 7: iteration 99280/ 115203 | consumed samples: 25415680 | consumed tokens: 52051312640 | elapsed time per iteration (s): 0.43 | learning rate: 2.852E-05 | global batch size: 256 | lm loss: 2.252274E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.524 | TFLOPs: 31.51 | 7: iteration 99290/ 115203 | consumed samples: 25418240 | consumed 
tokens: 52056555520 | elapsed time per iteration (s): 0.43 | learning rate: 2.851E-05 | global batch size: 256 | lm loss: 2.202864E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.042 | TFLOPs: 31.12 | 7: iteration 99300/ 115203 | consumed samples: 25420800 | consumed tokens: 52061798400 | elapsed time per iteration (s): 0.43 | learning rate: 2.850E-05 | global batch size: 256 | lm loss: 2.263009E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.794 | TFLOPs: 31.05 | 7: iteration 99310/ 115203 | consumed samples: 25423360 | consumed tokens: 52067041280 | elapsed time per iteration (s): 0.42 | learning rate: 2.849E-05 | global batch size: 256 | lm loss: 2.218629E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.805 | TFLOPs: 31.68 | 7: iteration 99320/ 115203 | consumed samples: 25425920 | consumed tokens: 52072284160 | elapsed time per iteration (s): 0.45 | learning rate: 2.848E-05 | global batch size: 256 | lm loss: 2.229609E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.233 | TFLOPs: 30.02 | 7: iteration 99330/ 115203 | consumed samples: 25428480 | consumed tokens: 52077527040 | elapsed time per iteration (s): 0.44 | learning rate: 2.847E-05 | global batch size: 256 | lm loss: 2.233066E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.150 | TFLOPs: 30.70 | 7: iteration 99340/ 115203 | consumed samples: 25431040 | consumed tokens: 52082769920 | elapsed time per iteration (s): 0.43 | learning rate: 2.846E-05 | global batch size: 256 | lm loss: 2.225716E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 590.424 | TFLOPs: 30.98 | 7: iteration 99350/ 115203 | consumed samples: 25433600 | consumed tokens: 52088012800 | elapsed time per iteration (s): 0.43 | learning rate: 2.845E-05 | global batch size: 256 | lm loss: 2.203037E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.463 | TFLOPs: 31.45 | 7: iteration 99360/ 115203 | consumed samples: 25436160 | consumed tokens: 52093255680 | elapsed time per iteration (s): 0.44 | learning rate: 2.844E-05 | global batch size: 256 | lm loss: 2.235312E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.438 | TFLOPs: 30.51 | 7: iteration 99370/ 115203 | consumed samples: 25438720 | consumed tokens: 52098498560 | elapsed time per iteration (s): 0.44 | learning rate: 2.843E-05 | global batch size: 256 | lm loss: 2.211204E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.733 | TFLOPs: 30.37 | 7: iteration 99380/ 115203 | consumed samples: 25441280 | consumed tokens: 52103741440 | elapsed time per iteration (s): 0.43 | learning rate: 2.841E-05 | global batch size: 256 | lm loss: 2.230970E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.987 | TFLOPs: 31.01 | 7: iteration 99390/ 115203 | consumed samples: 25443840 | consumed tokens: 52108984320 | elapsed time per iteration (s): 0.43 | learning rate: 2.840E-05 | global batch size: 256 | lm loss: 2.246874E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.039 | TFLOPs: 31.22 | 7: iteration 99400/ 115203 | consumed samples: 25446400 | consumed tokens: 52114227200 | elapsed time per iteration (s): 0.44 | learning rate: 2.839E-05 | global batch size: 256 | lm loss: 2.242672E+00 | grad norm: 
0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.075 | TFLOPs: 30.70 | 7: iteration 99410/ 115203 | consumed samples: 25448960 | consumed tokens: 52119470080 | elapsed time per iteration (s): 0.43 | learning rate: 2.838E-05 | global batch size: 256 | lm loss: 2.233073E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.048 | TFLOPs: 30.96 | 7: iteration 99420/ 115203 | consumed samples: 25451520 | consumed tokens: 52124712960 | elapsed time per iteration (s): 0.43 | learning rate: 2.837E-05 | global batch size: 256 | lm loss: 2.224667E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.517 | TFLOPs: 31.25 | 7: iteration 99430/ 115203 | consumed samples: 25454080 | consumed tokens: 52129955840 | elapsed time per iteration (s): 0.43 | learning rate: 2.836E-05 | global batch size: 256 | lm loss: 2.222337E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.046 | TFLOPs: 31.01 | 7: iteration 99440/ 115203 | consumed samples: 25456640 | consumed tokens: 52135198720 | elapsed time per iteration (s): 0.42 | learning rate: 2.835E-05 | global batch size: 256 | lm loss: 2.239424E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.376 | TFLOPs: 32.08 | 7: iteration 99450/ 115203 | consumed samples: 25459200 | consumed tokens: 52140441600 | elapsed time per iteration (s): 0.44 | learning rate: 2.834E-05 | global batch size: 256 | lm loss: 2.233351E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.026 | TFLOPs: 30.28 | 7: iteration 99460/ 115203 | consumed samples: 25461760 | consumed tokens: 52145684480 | elapsed time per 
iteration (s): 0.44 | learning rate: 2.833E-05 | global batch size: 256 | lm loss: 2.189427E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.364 | TFLOPs: 30.82 | 7: iteration 99470/ 115203 | consumed samples: 25464320 | consumed tokens: 52150927360 | elapsed time per iteration (s): 0.44 | learning rate: 2.832E-05 | global batch size: 256 | lm loss: 2.239287E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.688 | TFLOPs: 30.68 | 7: iteration 99480/ 115203 | consumed samples: 25466880 | consumed tokens: 52156170240 | elapsed time per iteration (s): 0.45 | learning rate: 2.831E-05 | global batch size: 256 | lm loss: 2.229909E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.719 | TFLOPs: 29.79 | 7: iteration 99490/ 115203 | consumed samples: 25469440 | consumed tokens: 52161413120 | elapsed time per iteration (s): 0.44 | learning rate: 2.830E-05 | global batch size: 256 | lm loss: 2.214809E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.848 | TFLOPs: 30.79 | 7: iteration 99500/ 115203 | consumed samples: 25472000 | consumed tokens: 52166656000 | elapsed time per iteration (s): 0.44 | learning rate: 2.829E-05 | global batch size: 256 | lm loss: 2.247962E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.650 | TFLOPs: 30.78 | 7: iteration 99510/ 115203 | consumed samples: 25474560 | consumed tokens: 52171898880 | elapsed time per iteration (s): 0.43 | learning rate: 2.828E-05 | global batch size: 256 | lm loss: 2.243312E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.069 | TFLOPs: 30.91 | 7: 
iteration 99520/ 115203 | consumed samples: 25477120 | consumed tokens: 52177141760 | elapsed time per iteration (s): 0.43 | learning rate: 2.827E-05 | global batch size: 256 | lm loss: 2.231294E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.485 | TFLOPs: 31.24 | 7: iteration 99530/ 115203 | consumed samples: 25479680 | consumed tokens: 52182384640 | elapsed time per iteration (s): 0.46 | learning rate: 2.826E-05 | global batch size: 256 | lm loss: 2.211293E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 562.201 | TFLOPs: 29.50 | 7: iteration 99540/ 115203 | consumed samples: 25482240 | consumed tokens: 52187627520 | elapsed time per iteration (s): 0.43 | learning rate: 2.825E-05 | global batch size: 256 | lm loss: 2.221747E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.291 | TFLOPs: 31.08 | 7: iteration 99550/ 115203 | consumed samples: 25484800 | consumed tokens: 52192870400 | elapsed time per iteration (s): 0.44 | learning rate: 2.824E-05 | global batch size: 256 | lm loss: 2.216168E+00 | grad norm: 0.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.993 | TFLOPs: 30.80 | 7: iteration 99560/ 115203 | consumed samples: 25487360 | consumed tokens: 52198113280 | elapsed time per iteration (s): 0.43 | learning rate: 2.823E-05 | global batch size: 256 | lm loss: 2.200208E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.343 | TFLOPs: 31.45 | 7: iteration 99570/ 115203 | consumed samples: 25489920 | consumed tokens: 52203356160 | elapsed time per iteration (s): 0.42 | learning rate: 2.822E-05 | global batch size: 256 | lm loss: 2.248212E+00 | grad norm: 0.255 | num zeros: 0.0 | number of 
skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.855 | TFLOPs: 31.63 | 7: iteration 99580/ 115203 | consumed samples: 25492480 | consumed tokens: 52208599040 | elapsed time per iteration (s): 0.44 | learning rate: 2.821E-05 | global batch size: 256 | lm loss: 2.188608E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.642 | TFLOPs: 30.26 | 7: iteration 99590/ 115203 | consumed samples: 25495040 | consumed tokens: 52213841920 | elapsed time per iteration (s): 0.43 | learning rate: 2.820E-05 | global batch size: 256 | lm loss: 2.199845E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.412 | TFLOPs: 31.24 | 7: iteration 99600/ 115203 | consumed samples: 25497600 | consumed tokens: 52219084800 | elapsed time per iteration (s): 0.43 | learning rate: 2.819E-05 | global batch size: 256 | lm loss: 2.212770E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.746 | TFLOPs: 31.00 | 7: iteration 99610/ 115203 | consumed samples: 25500160 | consumed tokens: 52224327680 | elapsed time per iteration (s): 0.42 | learning rate: 2.818E-05 | global batch size: 256 | lm loss: 2.216835E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.711 | TFLOPs: 31.62 | 7: iteration 99620/ 115203 | consumed samples: 25502720 | consumed tokens: 52229570560 | elapsed time per iteration (s): 0.42 | learning rate: 2.817E-05 | global batch size: 256 | lm loss: 2.201797E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.368 | TFLOPs: 31.76 | 7: iteration 99630/ 115203 | consumed samples: 25505280 | consumed tokens: 52234813440 | elapsed time per iteration (s): 0.44 | learning rate: 
2.816E-05 | global batch size: 256 | lm loss: 2.233309E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.426 | TFLOPs: 30.24 | 7: iteration 99640/ 115203 | consumed samples: 25507840 | consumed tokens: 52240056320 | elapsed time per iteration (s): 0.45 | learning rate: 2.814E-05 | global batch size: 256 | lm loss: 2.246569E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.208 | TFLOPs: 30.13 | 7: iteration 99650/ 115203 | consumed samples: 25510400 | consumed tokens: 52245299200 | elapsed time per iteration (s): 0.43 | learning rate: 2.813E-05 | global batch size: 256 | lm loss: 2.227122E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.830 | TFLOPs: 30.95 | 7: iteration 99660/ 115203 | consumed samples: 25512960 | consumed tokens: 52250542080 | elapsed time per iteration (s): 0.44 | learning rate: 2.812E-05 | global batch size: 256 | lm loss: 2.225208E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.371 | TFLOPs: 30.35 | 7: iteration 99670/ 115203 | consumed samples: 25515520 | consumed tokens: 52255784960 | elapsed time per iteration (s): 0.45 | learning rate: 2.811E-05 | global batch size: 256 | lm loss: 2.211351E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.984 | TFLOPs: 30.06 | 7: iteration 99680/ 115203 | consumed samples: 25518080 | consumed tokens: 52261027840 | elapsed time per iteration (s): 0.43 | learning rate: 2.810E-05 | global batch size: 256 | lm loss: 2.248255E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.463 | TFLOPs: 31.56 | 7: iteration 99690/ 115203 | consumed 
samples: 25520640 | consumed tokens: 52266270720 | elapsed time per iteration (s): 0.44 | learning rate: 2.809E-05 | global batch size: 256 | lm loss: 2.248414E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.338 | TFLOPs: 30.87 | 7: iteration 99700/ 115203 | consumed samples: 25523200 | consumed tokens: 52271513600 | elapsed time per iteration (s): 0.46 | learning rate: 2.808E-05 | global batch size: 256 | lm loss: 2.228635E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 555.424 | TFLOPs: 29.14 | 7: iteration 99710/ 115203 | consumed samples: 25525760 | consumed tokens: 52276756480 | elapsed time per iteration (s): 0.43 | learning rate: 2.807E-05 | global batch size: 256 | lm loss: 2.233982E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.086 | TFLOPs: 31.01 | 7: iteration 99720/ 115203 | consumed samples: 25528320 | consumed tokens: 52281999360 | elapsed time per iteration (s): 0.44 | learning rate: 2.806E-05 | global batch size: 256 | lm loss: 2.251967E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.031 | TFLOPs: 30.59 | 7: iteration 99730/ 115203 | consumed samples: 25530880 | consumed tokens: 52287242240 | elapsed time per iteration (s): 0.43 | learning rate: 2.805E-05 | global batch size: 256 | lm loss: 2.225902E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.620 | TFLOPs: 30.88 | 7: iteration 99740/ 115203 | consumed samples: 25533440 | consumed tokens: 52292485120 | elapsed time per iteration (s): 0.45 | learning rate: 2.804E-05 | global batch size: 256 | lm loss: 2.222928E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan 
iterations: 0 | samples per second: 574.703 | TFLOPs: 30.15 | 7: iteration 99750/ 115203 | consumed samples: 25536000 | consumed tokens: 52297728000 | elapsed time per iteration (s): 0.46 | learning rate: 2.803E-05 | global batch size: 256 | lm loss: 2.230800E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.831 | TFLOPs: 29.43 | 7: iteration 99760/ 115203 | consumed samples: 25538560 | consumed tokens: 52302970880 | elapsed time per iteration (s): 0.44 | learning rate: 2.802E-05 | global batch size: 256 | lm loss: 2.204694E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.553 | TFLOPs: 30.46 | 7: iteration 99770/ 115203 | consumed samples: 25541120 | consumed tokens: 52308213760 | elapsed time per iteration (s): 0.46 | learning rate: 2.801E-05 | global batch size: 256 | lm loss: 2.234402E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.197 | TFLOPs: 29.24 | 7: iteration 99780/ 115203 | consumed samples: 25543680 | consumed tokens: 52313456640 | elapsed time per iteration (s): 0.45 | learning rate: 2.800E-05 | global batch size: 256 | lm loss: 2.217061E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.095 | TFLOPs: 30.12 | 7: iteration 99790/ 115203 | consumed samples: 25546240 | consumed tokens: 52318699520 | elapsed time per iteration (s): 0.45 | learning rate: 2.799E-05 | global batch size: 256 | lm loss: 2.216561E+00 | grad norm: 0.236 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.312 | TFLOPs: 30.08 | 7: iteration 99800/ 115203 | consumed samples: 25548800 | consumed tokens: 52323942400 | elapsed time per iteration (s): 0.45 | learning rate: 2.798E-05 | global batch size: 256 | lm 
loss: 2.225713E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.116 | TFLOPs: 30.02 | 7: iteration 99810/ 115203 | consumed samples: 25551360 | consumed tokens: 52329185280 | elapsed time per iteration (s): 0.44 | learning rate: 2.797E-05 | global batch size: 256 | lm loss: 2.250447E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.891 | TFLOPs: 30.74 | 7: iteration 99820/ 115203 | consumed samples: 25553920 | consumed tokens: 52334428160 | elapsed time per iteration (s): 0.44 | learning rate: 2.796E-05 | global batch size: 256 | lm loss: 2.213869E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.793 | TFLOPs: 30.32 | 7: iteration 99830/ 115203 | consumed samples: 25556480 | consumed tokens: 52339671040 | elapsed time per iteration (s): 0.44 | learning rate: 2.795E-05 | global batch size: 256 | lm loss: 2.227088E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.815 | TFLOPs: 30.37 | 7: iteration 99840/ 115203 | consumed samples: 25559040 | consumed tokens: 52344913920 | elapsed time per iteration (s): 0.44 | learning rate: 2.794E-05 | global batch size: 256 | lm loss: 2.206201E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.097 | TFLOPs: 30.33 | 7: iteration 99850/ 115203 | consumed samples: 25561600 | consumed tokens: 52350156800 | elapsed time per iteration (s): 0.43 | learning rate: 2.793E-05 | global batch size: 256 | lm loss: 2.229358E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.423 | TFLOPs: 31.03 | 7: iteration 99860/ 115203 | consumed samples: 25564160 | consumed tokens: 
52355399680 | elapsed time per iteration (s): 0.44 | learning rate: 2.792E-05 | global batch size: 256 | lm loss: 2.236297E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.993 | TFLOPs: 30.80 | 7: iteration 99870/ 115203 | consumed samples: 25566720 | consumed tokens: 52360642560 | elapsed time per iteration (s): 0.44 | learning rate: 2.791E-05 | global batch size: 256 | lm loss: 2.238197E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.430 | TFLOPs: 30.56 | 7: iteration 99880/ 115203 | consumed samples: 25569280 | consumed tokens: 52365885440 | elapsed time per iteration (s): 0.42 | learning rate: 2.790E-05 | global batch size: 256 | lm loss: 2.251291E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.531 | TFLOPs: 31.72 | 7: iteration 99890/ 115203 | consumed samples: 25571840 | consumed tokens: 52371128320 | elapsed time per iteration (s): 0.43 | learning rate: 2.789E-05 | global batch size: 256 | lm loss: 2.177281E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.824 | TFLOPs: 31.52 | 7: iteration 99900/ 115203 | consumed samples: 25574400 | consumed tokens: 52376371200 | elapsed time per iteration (s): 0.44 | learning rate: 2.788E-05 | global batch size: 256 | lm loss: 2.242241E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.814 | TFLOPs: 30.84 | 7: iteration 99910/ 115203 | consumed samples: 25576960 | consumed tokens: 52381614080 | elapsed time per iteration (s): 0.43 | learning rate: 2.787E-05 | global batch size: 256 | lm loss: 2.216372E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
589.415 | TFLOPs: 30.93 | 7: iteration 99920/ 115203 | consumed samples: 25579520 | consumed tokens: 52386856960 | elapsed time per iteration (s): 0.44 | learning rate: 2.786E-05 | global batch size: 256 | lm loss: 2.225448E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.121 | TFLOPs: 30.23 | 7: iteration 99930/ 115203 | consumed samples: 25582080 | consumed tokens: 52392099840 | elapsed time per iteration (s): 0.43 | learning rate: 2.785E-05 | global batch size: 256 | lm loss: 2.223406E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.395 | TFLOPs: 30.98 | 7: iteration 99940/ 115203 | consumed samples: 25584640 | consumed tokens: 52397342720 | elapsed time per iteration (s): 0.43 | learning rate: 2.784E-05 | global batch size: 256 | lm loss: 2.234403E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.335 | TFLOPs: 31.60 | 7: iteration 99950/ 115203 | consumed samples: 25587200 | consumed tokens: 52402585600 | elapsed time per iteration (s): 0.43 | learning rate: 2.783E-05 | global batch size: 256 | lm loss: 2.219049E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.406 | TFLOPs: 31.14 | 7: iteration 99960/ 115203 | consumed samples: 25589760 | consumed tokens: 52407828480 | elapsed time per iteration (s): 0.43 | learning rate: 2.782E-05 | global batch size: 256 | lm loss: 2.234344E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.942 | TFLOPs: 30.90 | 7: iteration 99970/ 115203 | consumed samples: 25592320 | consumed tokens: 52413071360 | elapsed time per iteration (s): 0.42 | learning rate: 2.781E-05 | global batch size: 256 | lm loss: 2.223385E+00 | grad norm: 0.257 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.018 | TFLOPs: 31.80 | 7: iteration 99980/ 115203 | consumed samples: 25594880 | consumed tokens: 52418314240 | elapsed time per iteration (s): 0.44 | learning rate: 2.780E-05 | global batch size: 256 | lm loss: 2.197982E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.834 | TFLOPs: 30.58 | 7: iteration 99990/ 115203 | consumed samples: 25597440 | consumed tokens: 52423557120 | elapsed time per iteration (s): 0.44 | learning rate: 2.779E-05 | global batch size: 256 | lm loss: 2.205242E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.775 | TFLOPs: 30.84 | 0: [2022-11-29 01:01:04,684] [INFO] [logging.py:68:log_dist] [Rank 0] step=100000, skipped=0, lr=[2.777783369036059e-05, 2.777783369036059e-05, 2.777783369036059e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 100000/ 115203 | consumed samples: 25600000 | consumed tokens: 52428800000 | elapsed time per iteration (s): 0.44 | learning rate: 2.778E-05 | global batch size: 256 | lm loss: 2.209361E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.956 | TFLOPs: 30.43 | 0: steps: 100000 loss: 2.2780 iter time (s): 0.440 samples/sec: 581.901 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 100000 | lm loss value: 2.202363E+00 | lm loss PPL: 9.046363E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 100000 to checkpoints_221m 0: [2022-11-29 01:01:04,852] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step100000 is begin to save! 
0: [2022-11-29 01:01:04,858] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_01-model_00-model_states.pt... 0: [2022-11-29 01:01:04,971] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_01-model_00-model_states.pt. 0: [2022-11-29 01:01:04,971] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_03-model_00-model_states.pt... 0: [2022-11-29 01:01:04,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_03-model_00-model_states.pt. 0: [2022-11-29 01:01:04,993] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_04-model_00-model_states.pt... 0: [2022-11-29 01:01:05,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_04-model_00-model_states.pt. 0: [2022-11-29 01:01:05,016] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_05-model_00-model_states.pt... 0: [2022-11-29 01:01:05,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_05-model_00-model_states.pt. 0: [2022-11-29 01:01:05,048] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_06-model_00-model_states.pt... 0: [2022-11-29 01:01:05,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_06-model_00-model_states.pt. 0: [2022-11-29 01:01:05,079] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_07-model_00-model_states.pt... 0: [2022-11-29 01:01:05,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_07-model_00-model_states.pt. 
0: [2022-11-29 01:01:05,106] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_08-model_00-model_states.pt... 0: [2022-11-29 01:01:05,129] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_08-model_00-model_states.pt. 0: [2022-11-29 01:01:05,129] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_09-model_00-model_states.pt... 0: [2022-11-29 01:01:05,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_09-model_00-model_states.pt. 0: [2022-11-29 01:01:05,155] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_10-model_00-model_states.pt... 0: [2022-11-29 01:01:05,178] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_10-model_00-model_states.pt. 0: [2022-11-29 01:01:05,178] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_11-model_00-model_states.pt... 0: [2022-11-29 01:01:05,202] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_11-model_00-model_states.pt. 0: [2022-11-29 01:01:05,203] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_12-model_00-model_states.pt... 0: [2022-11-29 01:01:05,228] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_12-model_00-model_states.pt. 0: [2022-11-29 01:01:05,228] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_13-model_00-model_states.pt... 0: [2022-11-29 01:01:05,252] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_13-model_00-model_states.pt. 
0: [2022-11-29 01:01:05,252] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_14-model_00-model_states.pt... 0: [2022-11-29 01:01:05,276] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_14-model_00-model_states.pt. 0: [2022-11-29 01:01:05,277] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_15-model_00-model_states.pt... 0: [2022-11-29 01:01:05,303] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_15-model_00-model_states.pt. 0: [2022-11-29 01:01:05,303] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_16-model_00-model_states.pt... 0: [2022-11-29 01:01:05,328] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_16-model_00-model_states.pt. 0: [2022-11-29 01:01:05,328] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_17-model_00-model_states.pt... 0: [2022-11-29 01:01:05,351] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_17-model_00-model_states.pt. 0: [2022-11-29 01:01:05,351] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_18-model_00-model_states.pt... 0: [2022-11-29 01:01:05,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_18-model_00-model_states.pt. 0: [2022-11-29 01:01:05,375] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_19-model_00-model_states.pt... 0: [2022-11-29 01:01:05,404] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_19-model_00-model_states.pt. 
0: [2022-11-29 01:01:05,404] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_20-model_00-model_states.pt... 0: [2022-11-29 01:01:05,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_20-model_00-model_states.pt. 0: [2022-11-29 01:01:05,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/layer_22-model_00-model_states.pt... 0: [2022-11-29 01:01:05,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/layer_22-model_00-model_states.pt. 0: [2022-11-29 01:01:05,434] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step100000/mp_rank_00_model_states.pt 0: [2022-11-29 01:01:05,434] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/mp_rank_00_model_states.pt... 0: [2022-11-29 01:01:05,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/mp_rank_00_model_states.pt. 0: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
7: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 
4: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 
2: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:01:05,455] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:01:05,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:01:05,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 01:01:05,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2022-11-29 01:01:05,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:01:05,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 01:01:05,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2022-11-29 01:01:05,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:01:05,507] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:01:05,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 01:01:05,507] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 01:01:05,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2022-11-29 01:01:05,507] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2022-11-29 01:01:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:01:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 01:01:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2022-11-29 01:01:05,510] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:01:05,510] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 01:01:05,510] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2022-11-29 01:01:05,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:01:05,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:01:05,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:01:05,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 01:01:05,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 01:01:05,511] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 01:01:05,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2022-11-29 01:01:05,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2022-11-29 01:01:05,511] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2022-11-29 01:01:05,511] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:01:05,512] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 01:01:05,512] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2022-11-29 01:01:05,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-29 01:01:05,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 01:01:05,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2022-11-29 01:01:05,513] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:01:05,513] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 01:01:05,513] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2022-11-29 01:01:05,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:01:05,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 01:01:05,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2022-11-29 01:01:05,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:01:05,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:01:05,514] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-29 01:01:05,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 01:01:05,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 01:01:05,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2022-11-29 01:01:05,514] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 01:01:05,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2022-11-29 01:01:05,514] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2022-11-29 01:01:05,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:01:05,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:01:05,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:01:05,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:01:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:01:05,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:01:05,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:01:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:01:05,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 01:01:05,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 01:01:05,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 01:01:05,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 5: [2022-11-29 01:01:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 2: [2022-11-29 01:01:05,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2022-11-29 01:01:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2022-11-29 01:01:05,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2022-11-29 01:01:05,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
1: [2022-11-29 01:01:05,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2022-11-29 01:01:05,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2022-11-29 01:01:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 4: [2022-11-29 01:01:05,515] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 2: [2022-11-29 01:01:05,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2022-11-29 01:01:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:01:05,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:01:05,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:01:05,515] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2022-11-29 01:01:05,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
1: [2022-11-29 01:01:05,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 5: [2022-11-29 01:01:05,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 6: [2022-11-29 01:01:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 1: [2022-11-29 01:01:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2022-11-29 01:01:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2022-11-29 01:01:05,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2022-11-29 01:01:05,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:01:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:01:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
5: [2022-11-29 01:01:05,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 2: [2022-11-29 01:01:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 6: [2022-11-29 01:01:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 5: [2022-11-29 01:01:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2022-11-29 01:01:05,517] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:01:05,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2022-11-29 01:01:05,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2022-11-29 01:01:05,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 2: [2022-11-29 01:01:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:01:05,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:01:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
2: [2022-11-29 01:01:05,516] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 6: [2022-11-29 01:01:05,517] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 2: [2022-11-29 01:01:05,516] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2022-11-29 01:01:05,517] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 2: [2022-11-29 01:01:05,518] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:01:05,518] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 01:01:05,518] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2022-11-29 01:01:05,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:01:05,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:01:05,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:01:05,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:01:05,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-29 01:01:05,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:01:05,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 01:01:05,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 01:01:05,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 01:01:05,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 01:01:05,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 01:01:05,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:01:05,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 01:01:05,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2022-11-29 01:01:05,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2022-11-29 01:01:05,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2022-11-29 01:01:05,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
7: [2022-11-29 01:01:05,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2022-11-29 01:01:05,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2022-11-29 01:01:05,520] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 01:01:05,520] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2022-11-29 01:01:05,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:01:05,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 01:01:05,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2022-11-29 01:01:05,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:01:05,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:01:05,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 5: [2022-11-29 01:01:05,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:01:05,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:01:05,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2022-11-29 01:01:05,522] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:01:05,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 01:01:05,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 01:01:05,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 01:01:05,522] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 01:01:05,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2022-11-29 01:01:05,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2022-11-29 01:01:05,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 5: [2022-11-29 01:01:05,522] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 1: [2022-11-29 01:01:05,523] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:01:05,523] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 01:01:05,523] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2022-11-29 01:01:05,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:01:05,524] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 01:01:05,524] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2022-11-29 01:01:05,524] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:01:05,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:01:05,525] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 01:01:05,525] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2022-11-29 01:01:05,533] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:01:05,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 01:01:05,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
6: [2022-11-29 01:01:05,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:01:05,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 01:01:05,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2022-11-29 01:01:05,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:01:05,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 01:01:05,534] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 6: [2022-11-29 01:01:05,534] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:01:05,534] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 01:01:05,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 7: [2022-11-29 01:01:05,550] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:01:05,550] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 01:01:05,551] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2022-11-29 01:01:05,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 01:01:05,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 01:01:05,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 01:01:05,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 01:01:05,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 01:01:05,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 01:01:05,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2022-11-29 01:01:05,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 
3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 3: [2022-11-29 01:01:05,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: [2022-11-29 01:01:05,616] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step100000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 01:01:05,616] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step100000 is ready now! 0: successfully saved checkpoint at iteration 100000 to checkpoints_221m 7: time (ms) | save-checkpoint: 927.14 7: iteration 100010/ 115203 | consumed samples: 25602560 | consumed tokens: 52434042880 | elapsed time per iteration (s): 0.68 | learning rate: 2.777E-05 | global batch size: 256 | lm loss: 2.267044E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 375.111 | TFLOPs: 19.68 | 7: iteration 100020/ 115203 | consumed samples: 25605120 | consumed tokens: 52439285760 | elapsed time per iteration (s): 0.42 | learning rate: 2.776E-05 | global batch size: 256 | lm loss: 2.256006E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.646 | TFLOPs: 31.67 | 7: iteration 100030/ 115203 | consumed samples: 25607680 | consumed tokens: 52444528640 | elapsed time per iteration (s): 0.44 | learning rate: 2.775E-05 | global batch size: 256 | lm loss: 2.237874E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.775 | TFLOPs: 30.52 | 7: iteration 100040/ 
115203 | consumed samples: 25610240 | consumed tokens: 52449771520 | elapsed time per iteration (s): 0.44 | learning rate: 2.774E-05 | global batch size: 256 | lm loss: 2.260791E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.158 | TFLOPs: 30.39 | 7: iteration 100050/ 115203 | consumed samples: 25612800 | consumed tokens: 52455014400 | elapsed time per iteration (s): 0.43 | learning rate: 2.773E-05 | global batch size: 256 | lm loss: 2.221607E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.249 | TFLOPs: 31.13 | 7: iteration 100060/ 115203 | consumed samples: 25615360 | consumed tokens: 52460257280 | elapsed time per iteration (s): 0.43 | learning rate: 2.772E-05 | global batch size: 256 | lm loss: 2.204686E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.516 | TFLOPs: 31.14 | 7: iteration 100070/ 115203 | consumed samples: 25617920 | consumed tokens: 52465500160 | elapsed time per iteration (s): 0.43 | learning rate: 2.771E-05 | global batch size: 256 | lm loss: 2.222404E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.440 | TFLOPs: 30.98 | 7: iteration 100080/ 115203 | consumed samples: 25620480 | consumed tokens: 52470743040 | elapsed time per iteration (s): 0.44 | learning rate: 2.770E-05 | global batch size: 256 | lm loss: 2.246264E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.313 | TFLOPs: 30.29 | 7: iteration 100090/ 115203 | consumed samples: 25623040 | consumed tokens: 52475985920 | elapsed time per iteration (s): 0.44 | learning rate: 2.769E-05 | global batch size: 256 | lm loss: 2.227713E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 585.615 | TFLOPs: 30.73 | 7: iteration 100100/ 115203 | consumed samples: 25625600 | consumed tokens: 52481228800 | elapsed time per iteration (s): 0.43 | learning rate: 2.768E-05 | global batch size: 256 | lm loss: 2.226917E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.744 | TFLOPs: 31.47 | 7: iteration 100110/ 115203 | consumed samples: 25628160 | consumed tokens: 52486471680 | elapsed time per iteration (s): 0.43 | learning rate: 2.767E-05 | global batch size: 256 | lm loss: 2.238746E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.664 | TFLOPs: 31.46 | 7: iteration 100120/ 115203 | consumed samples: 25630720 | consumed tokens: 52491714560 | elapsed time per iteration (s): 0.44 | learning rate: 2.766E-05 | global batch size: 256 | lm loss: 2.264250E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.782 | TFLOPs: 30.42 | 7: iteration 100130/ 115203 | consumed samples: 25633280 | consumed tokens: 52496957440 | elapsed time per iteration (s): 0.44 | learning rate: 2.765E-05 | global batch size: 256 | lm loss: 2.214454E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.826 | TFLOPs: 30.27 | 7: iteration 100140/ 115203 | consumed samples: 25635840 | consumed tokens: 52502200320 | elapsed time per iteration (s): 0.43 | learning rate: 2.764E-05 | global batch size: 256 | lm loss: 2.203056E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.682 | TFLOPs: 30.94 | 7: iteration 100150/ 115203 | consumed samples: 25638400 | consumed tokens: 52507443200 | elapsed time per iteration (s): 0.43 | learning rate: 
2.763E-05 | global batch size: 256 | lm loss: 2.237661E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.951 | TFLOPs: 30.95 | 7: iteration 100160/ 115203 | consumed samples: 25640960 | consumed tokens: 52512686080 | elapsed time per iteration (s): 0.43 | learning rate: 2.762E-05 | global batch size: 256 | lm loss: 2.235811E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.348 | TFLOPs: 31.24 | 7: iteration 100170/ 115203 | consumed samples: 25643520 | consumed tokens: 52517928960 | elapsed time per iteration (s): 0.44 | learning rate: 2.761E-05 | global batch size: 256 | lm loss: 2.225533E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.488 | TFLOPs: 30.46 | 7: iteration 100180/ 115203 | consumed samples: 25646080 | consumed tokens: 52523171840 | elapsed time per iteration (s): 0.43 | learning rate: 2.760E-05 | global batch size: 256 | lm loss: 2.225660E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.358 | TFLOPs: 31.08 | 7: iteration 100190/ 115203 | consumed samples: 25648640 | consumed tokens: 52528414720 | elapsed time per iteration (s): 0.43 | learning rate: 2.759E-05 | global batch size: 256 | lm loss: 2.216336E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.489 | TFLOPs: 31.24 | 7: iteration 100200/ 115203 | consumed samples: 25651200 | consumed tokens: 52533657600 | elapsed time per iteration (s): 0.44 | learning rate: 2.758E-05 | global batch size: 256 | lm loss: 2.254051E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.591 | TFLOPs: 30.67 | 7: iteration 100210/ 115203 | 
consumed samples: 25653760 | consumed tokens: 52538900480 | elapsed time per iteration (s): 0.43 | learning rate: 2.757E-05 | global batch size: 256 | lm loss: 2.221931E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.874 | TFLOPs: 31.00 | 7: iteration 100220/ 115203 | consumed samples: 25656320 | consumed tokens: 52544143360 | elapsed time per iteration (s): 0.42 | learning rate: 2.756E-05 | global batch size: 256 | lm loss: 2.205110E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.983 | TFLOPs: 31.95 | 7: iteration 100230/ 115203 | consumed samples: 25658880 | consumed tokens: 52549386240 | elapsed time per iteration (s): 0.44 | learning rate: 2.755E-05 | global batch size: 256 | lm loss: 2.197948E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.920 | TFLOPs: 30.58 | 7: iteration 100240/ 115203 | consumed samples: 25661440 | consumed tokens: 52554629120 | elapsed time per iteration (s): 0.44 | learning rate: 2.754E-05 | global batch size: 256 | lm loss: 2.205227E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.900 | TFLOPs: 30.32 | 7: iteration 100250/ 115203 | consumed samples: 25664000 | consumed tokens: 52559872000 | elapsed time per iteration (s): 0.43 | learning rate: 2.753E-05 | global batch size: 256 | lm loss: 2.232066E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.119 | TFLOPs: 31.22 | 7: iteration 100260/ 115203 | consumed samples: 25666560 | consumed tokens: 52565114880 | elapsed time per iteration (s): 0.44 | learning rate: 2.752E-05 | global batch size: 256 | lm loss: 2.212379E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 583.320 | TFLOPs: 30.61 | 7: iteration 100270/ 115203 | consumed samples: 25669120 | consumed tokens: 52570357760 | elapsed time per iteration (s): 0.43 | learning rate: 2.751E-05 | global batch size: 256 | lm loss: 2.234033E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.353 | TFLOPs: 30.97 | 7: iteration 100280/ 115203 | consumed samples: 25671680 | consumed tokens: 52575600640 | elapsed time per iteration (s): 0.43 | learning rate: 2.750E-05 | global batch size: 256 | lm loss: 2.208582E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.518 | TFLOPs: 31.51 | 7: iteration 100290/ 115203 | consumed samples: 25674240 | consumed tokens: 52580843520 | elapsed time per iteration (s): 0.43 | learning rate: 2.749E-05 | global batch size: 256 | lm loss: 2.221444E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.433 | TFLOPs: 30.98 | 7: iteration 100300/ 115203 | consumed samples: 25676800 | consumed tokens: 52586086400 | elapsed time per iteration (s): 0.44 | learning rate: 2.748E-05 | global batch size: 256 | lm loss: 2.250127E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.378 | TFLOPs: 30.61 | 7: iteration 100310/ 115203 | consumed samples: 25679360 | consumed tokens: 52591329280 | elapsed time per iteration (s): 0.43 | learning rate: 2.747E-05 | global batch size: 256 | lm loss: 2.210802E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.645 | TFLOPs: 30.89 | 7: iteration 100320/ 115203 | consumed samples: 25681920 | consumed tokens: 52596572160 | elapsed time per iteration (s): 0.43 | learning rate: 2.746E-05 | global batch 
size: 256 | lm loss: 2.218874E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.345 | TFLOPs: 31.08 | 7: iteration 100330/ 115203 | consumed samples: 25684480 | consumed tokens: 52601815040 | elapsed time per iteration (s): 0.43 | learning rate: 2.745E-05 | global batch size: 256 | lm loss: 2.231192E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.671 | TFLOPs: 30.94 | 7: iteration 100340/ 115203 | consumed samples: 25687040 | consumed tokens: 52607057920 | elapsed time per iteration (s): 0.43 | learning rate: 2.744E-05 | global batch size: 256 | lm loss: 2.256789E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.445 | TFLOPs: 31.14 | 7: iteration 100350/ 115203 | consumed samples: 25689600 | consumed tokens: 52612300800 | elapsed time per iteration (s): 0.44 | learning rate: 2.743E-05 | global batch size: 256 | lm loss: 2.219527E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.801 | TFLOPs: 30.79 | 7: iteration 100360/ 115203 | consumed samples: 25692160 | consumed tokens: 52617543680 | elapsed time per iteration (s): 0.43 | learning rate: 2.742E-05 | global batch size: 256 | lm loss: 2.241161E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.244 | TFLOPs: 31.13 | 7: iteration 100370/ 115203 | consumed samples: 25694720 | consumed tokens: 52622786560 | elapsed time per iteration (s): 0.43 | learning rate: 2.741E-05 | global batch size: 256 | lm loss: 2.228553E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.485 | TFLOPs: 31.35 | 7: iteration 100380/ 115203 | consumed samples: 25697280 | 
consumed tokens: 52628029440 | elapsed time per iteration (s): 0.43 | learning rate: 2.740E-05 | global batch size: 256 | lm loss: 2.246155E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.265 | TFLOPs: 31.13 | 7: iteration 100390/ 115203 | consumed samples: 25699840 | consumed tokens: 52633272320 | elapsed time per iteration (s): 0.43 | learning rate: 2.739E-05 | global batch size: 256 | lm loss: 2.239582E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.428 | TFLOPs: 30.98 | 7: iteration 100400/ 115203 | consumed samples: 25702400 | consumed tokens: 52638515200 | elapsed time per iteration (s): 0.43 | learning rate: 2.738E-05 | global batch size: 256 | lm loss: 2.230865E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.534 | TFLOPs: 31.04 | 7: iteration 100410/ 115203 | consumed samples: 25704960 | consumed tokens: 52643758080 | elapsed time per iteration (s): 0.43 | learning rate: 2.737E-05 | global batch size: 256 | lm loss: 2.244594E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.689 | TFLOPs: 31.04 | 7: iteration 100420/ 115203 | consumed samples: 25707520 | consumed tokens: 52649000960 | elapsed time per iteration (s): 0.43 | learning rate: 2.736E-05 | global batch size: 256 | lm loss: 2.233109E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.969 | TFLOPs: 31.37 | 7: iteration 100430/ 115203 | consumed samples: 25710080 | consumed tokens: 52654243840 | elapsed time per iteration (s): 0.43 | learning rate: 2.735E-05 | global batch size: 256 | lm loss: 2.242810E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 594.445 | TFLOPs: 31.19 | 7: iteration 100440/ 115203 | consumed samples: 25712640 | consumed tokens: 52659486720 | elapsed time per iteration (s): 0.43 | learning rate: 2.734E-05 | global batch size: 256 | lm loss: 2.213034E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.904 | TFLOPs: 30.95 | 7: iteration 100450/ 115203 | consumed samples: 25715200 | consumed tokens: 52664729600 | elapsed time per iteration (s): 0.43 | learning rate: 2.733E-05 | global batch size: 256 | lm loss: 2.226372E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.717 | TFLOPs: 31.15 | 7: iteration 100460/ 115203 | consumed samples: 25717760 | consumed tokens: 52669972480 | elapsed time per iteration (s): 0.43 | learning rate: 2.732E-05 | global batch size: 256 | lm loss: 2.231166E+00 | grad norm: 0.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.459 | TFLOPs: 31.03 | 7: iteration 100470/ 115203 | consumed samples: 25720320 | consumed tokens: 52675215360 | elapsed time per iteration (s): 0.44 | learning rate: 2.731E-05 | global batch size: 256 | lm loss: 2.207728E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.944 | TFLOPs: 30.22 | 7: iteration 100480/ 115203 | consumed samples: 25722880 | consumed tokens: 52680458240 | elapsed time per iteration (s): 0.43 | learning rate: 2.730E-05 | global batch size: 256 | lm loss: 2.217428E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.547 | TFLOPs: 31.30 | 7: iteration 100490/ 115203 | consumed samples: 25725440 | consumed tokens: 52685701120 | elapsed time per iteration (s): 0.43 | learning rate: 2.729E-05 | global batch size: 256 | lm loss: 
2.243768E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.936 | TFLOPs: 30.90 | 7: iteration 100500/ 115203 | consumed samples: 25728000 | consumed tokens: 52690944000 | elapsed time per iteration (s): 0.44 | learning rate: 2.728E-05 | global batch size: 256 | lm loss: 2.205072E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.963 | TFLOPs: 30.59 | 7: iteration 100510/ 115203 | consumed samples: 25730560 | consumed tokens: 52696186880 | elapsed time per iteration (s): 0.42 | learning rate: 2.727E-05 | global batch size: 256 | lm loss: 2.241818E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.904 | TFLOPs: 31.63 | 7: iteration 100520/ 115203 | consumed samples: 25733120 | consumed tokens: 52701429760 | elapsed time per iteration (s): 0.42 | learning rate: 2.726E-05 | global batch size: 256 | lm loss: 2.250358E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.458 | TFLOPs: 31.61 | 7: iteration 100530/ 115203 | consumed samples: 25735680 | consumed tokens: 52706672640 | elapsed time per iteration (s): 0.42 | learning rate: 2.725E-05 | global batch size: 256 | lm loss: 2.226380E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.576 | TFLOPs: 31.62 | 7: iteration 100540/ 115203 | consumed samples: 25738240 | consumed tokens: 52711915520 | elapsed time per iteration (s): 0.42 | learning rate: 2.724E-05 | global batch size: 256 | lm loss: 2.232624E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.137 | TFLOPs: 31.80 | 7: iteration 100550/ 115203 | consumed samples: 25740800 | consumed tokens: 
52717158400 | elapsed time per iteration (s): 0.43 | learning rate: 2.723E-05 | global batch size: 256 | lm loss: 2.212539E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.267 | TFLOPs: 31.08 | 7: iteration 100560/ 115203 | consumed samples: 25743360 | consumed tokens: 52722401280 | elapsed time per iteration (s): 0.43 | learning rate: 2.722E-05 | global batch size: 256 | lm loss: 2.220585E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.098 | TFLOPs: 31.22 | 7: iteration 100570/ 115203 | consumed samples: 25745920 | consumed tokens: 52727644160 | elapsed time per iteration (s): 0.43 | learning rate: 2.721E-05 | global batch size: 256 | lm loss: 2.218020E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.515 | TFLOPs: 31.25 | 7: iteration 100580/ 115203 | consumed samples: 25748480 | consumed tokens: 52732887040 | elapsed time per iteration (s): 0.43 | learning rate: 2.720E-05 | global batch size: 256 | lm loss: 2.225401E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.775 | TFLOPs: 31.26 | 7: iteration 100590/ 115203 | consumed samples: 25751040 | consumed tokens: 52738129920 | elapsed time per iteration (s): 0.42 | learning rate: 2.719E-05 | global batch size: 256 | lm loss: 2.219044E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.427 | TFLOPs: 31.71 | 7: iteration 100600/ 115203 | consumed samples: 25753600 | consumed tokens: 52743372800 | elapsed time per iteration (s): 0.45 | learning rate: 2.718E-05 | global batch size: 256 | lm loss: 2.219484E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 574.680 | TFLOPs: 30.15 | 7: iteration 100610/ 115203 | consumed samples: 25756160 | consumed tokens: 52748615680 | elapsed time per iteration (s): 0.43 | learning rate: 2.717E-05 | global batch size: 256 | lm loss: 2.245014E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.833 | TFLOPs: 30.95 | 7: iteration 100620/ 115203 | consumed samples: 25758720 | consumed tokens: 52753858560 | elapsed time per iteration (s): 0.45 | learning rate: 2.716E-05 | global batch size: 256 | lm loss: 2.213690E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.681 | TFLOPs: 29.89 | 7: iteration 100630/ 115203 | consumed samples: 25761280 | consumed tokens: 52759101440 | elapsed time per iteration (s): 0.43 | learning rate: 2.716E-05 | global batch size: 256 | lm loss: 2.248261E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.971 | TFLOPs: 31.06 | 7: iteration 100640/ 115203 | consumed samples: 25763840 | consumed tokens: 52764344320 | elapsed time per iteration (s): 0.43 | learning rate: 2.715E-05 | global batch size: 256 | lm loss: 2.233678E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.187 | TFLOPs: 31.12 | 7: iteration 100650/ 115203 | consumed samples: 25766400 | consumed tokens: 52769587200 | elapsed time per iteration (s): 0.43 | learning rate: 2.714E-05 | global batch size: 256 | lm loss: 2.200210E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.216 | TFLOPs: 31.23 | 7: iteration 100660/ 115203 | consumed samples: 25768960 | consumed tokens: 52774830080 | elapsed time per iteration (s): 0.43 | learning rate: 2.713E-05 | global batch size: 256 | lm loss: 2.228940E+00 | grad 
norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.060 | TFLOPs: 30.96 | 7: iteration 100670/ 115203 | consumed samples: 25771520 | consumed tokens: 52780072960 | elapsed time per iteration (s): 0.42 | learning rate: 2.712E-05 | global batch size: 256 | lm loss: 2.219400E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.501 | TFLOPs: 31.66 | 7: iteration 100680/ 115203 | consumed samples: 25774080 | consumed tokens: 52785315840 | elapsed time per iteration (s): 0.43 | learning rate: 2.711E-05 | global batch size: 256 | lm loss: 2.241341E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.153 | TFLOPs: 31.44 | 7: iteration 100690/ 115203 | consumed samples: 25776640 | consumed tokens: 52790558720 | elapsed time per iteration (s): 0.43 | learning rate: 2.710E-05 | global batch size: 256 | lm loss: 2.252487E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.827 | TFLOPs: 31.37 | 7: iteration 100700/ 115203 | consumed samples: 25779200 | consumed tokens: 52795801600 | elapsed time per iteration (s): 0.43 | learning rate: 2.709E-05 | global batch size: 256 | lm loss: 2.204420E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.460 | TFLOPs: 31.35 | 7: iteration 100710/ 115203 | consumed samples: 25781760 | consumed tokens: 52801044480 | elapsed time per iteration (s): 0.43 | learning rate: 2.708E-05 | global batch size: 256 | lm loss: 2.200678E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.393 | TFLOPs: 31.55 | 7: iteration 100720/ 115203 | consumed samples: 25784320 | consumed tokens: 52806287360 | elapsed time 
per iteration (s): 0.43 | learning rate: 2.707E-05 | global batch size: 256 | lm loss: 2.205463E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.864 | TFLOPs: 31.05 | 7: iteration 100730/ 115203 | consumed samples: 25786880 | consumed tokens: 52811530240 | elapsed time per iteration (s): 0.44 | learning rate: 2.706E-05 | global batch size: 256 | lm loss: 2.224921E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.305 | TFLOPs: 30.29 | 7: iteration 100740/ 115203 | consumed samples: 25789440 | consumed tokens: 52816773120 | elapsed time per iteration (s): 0.42 | learning rate: 2.705E-05 | global batch size: 256 | lm loss: 2.209808E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.982 | TFLOPs: 31.69 | 7: iteration 100750/ 115203 | consumed samples: 25792000 | consumed tokens: 52822016000 | elapsed time per iteration (s): 0.44 | learning rate: 2.704E-05 | global batch size: 256 | lm loss: 2.236167E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.595 | TFLOPs: 30.57 | 7: iteration 100760/ 115203 | consumed samples: 25794560 | consumed tokens: 52827258880 | elapsed time per iteration (s): 0.44 | learning rate: 2.703E-05 | global batch size: 256 | lm loss: 2.220669E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.369 | TFLOPs: 30.61 | 7: iteration 100770/ 115203 | consumed samples: 25797120 | consumed tokens: 52832501760 | elapsed time per iteration (s): 0.43 | learning rate: 2.702E-05 | global batch size: 256 | lm loss: 2.228171E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.762 | TFLOPs: 
31.15 | 7: iteration 100780/ 115203 | consumed samples: 25799680 | consumed tokens: 52837744640 | elapsed time per iteration (s): 0.44 | learning rate: 2.701E-05 | global batch size: 256 | lm loss: 2.187791E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.027 | TFLOPs: 30.38 | 7: iteration 100790/ 115203 | consumed samples: 25802240 | consumed tokens: 52842987520 | elapsed time per iteration (s): 0.43 | learning rate: 2.700E-05 | global batch size: 256 | lm loss: 2.202438E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.148 | TFLOPs: 31.12 | 7: iteration 100800/ 115203 | consumed samples: 25804800 | consumed tokens: 52848230400 | elapsed time per iteration (s): 0.44 | learning rate: 2.699E-05 | global batch size: 256 | lm loss: 2.208427E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.551 | TFLOPs: 30.20 | 7: iteration 100810/ 115203 | consumed samples: 25807360 | consumed tokens: 52853473280 | elapsed time per iteration (s): 0.43 | learning rate: 2.698E-05 | global batch size: 256 | lm loss: 2.231954E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.854 | TFLOPs: 31.11 | 7: iteration 100820/ 115203 | consumed samples: 25809920 | consumed tokens: 52858716160 | elapsed time per iteration (s): 0.46 | learning rate: 2.697E-05 | global batch size: 256 | lm loss: 2.229847E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.180 | TFLOPs: 29.34 | 7: iteration 100830/ 115203 | consumed samples: 25812480 | consumed tokens: 52863959040 | elapsed time per iteration (s): 0.43 | learning rate: 2.696E-05 | global batch size: 256 | lm loss: 2.214316E+00 | grad norm: 0.259 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.965 | TFLOPs: 31.06 | 7: iteration 100840/ 115203 | consumed samples: 25815040 | consumed tokens: 52869201920 | elapsed time per iteration (s): 0.45 | learning rate: 2.695E-05 | global batch size: 256 | lm loss: 2.194826E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.638 | TFLOPs: 30.15 | 7: iteration 100850/ 115203 | consumed samples: 25817600 | consumed tokens: 52874444800 | elapsed time per iteration (s): 0.44 | learning rate: 2.694E-05 | global batch size: 256 | lm loss: 2.241254E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.271 | TFLOPs: 30.24 | 7: iteration 100860/ 115203 | consumed samples: 25820160 | consumed tokens: 52879687680 | elapsed time per iteration (s): 0.43 | learning rate: 2.693E-05 | global batch size: 256 | lm loss: 2.239116E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.142 | TFLOPs: 31.07 | 7: iteration 100870/ 115203 | consumed samples: 25822720 | consumed tokens: 52884930560 | elapsed time per iteration (s): 0.44 | learning rate: 2.692E-05 | global batch size: 256 | lm loss: 2.230305E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.758 | TFLOPs: 30.84 | 7: iteration 100880/ 115203 | consumed samples: 25825280 | consumed tokens: 52890173440 | elapsed time per iteration (s): 0.44 | learning rate: 2.691E-05 | global batch size: 256 | lm loss: 2.234527E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.919 | TFLOPs: 30.22 | 7: iteration 100890/ 115203 | consumed samples: 25827840 | consumed tokens: 52895416320 | elapsed time per iteration (s): 0.45 | 
learning rate: 2.691E-05 | global batch size: 256 | lm loss: 2.209403E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.745 | TFLOPs: 29.95 | 7: iteration 100900/ 115203 | consumed samples: 25830400 | consumed tokens: 52900659200 | elapsed time per iteration (s): 0.46 | learning rate: 2.690E-05 | global batch size: 256 | lm loss: 2.219061E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.227 | TFLOPs: 29.08 | 7: iteration 100910/ 115203 | consumed samples: 25832960 | consumed tokens: 52905902080 | elapsed time per iteration (s): 0.43 | learning rate: 2.689E-05 | global batch size: 256 | lm loss: 2.226496E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.428 | TFLOPs: 31.08 | 7: iteration 100920/ 115203 | consumed samples: 25835520 | consumed tokens: 52911144960 | elapsed time per iteration (s): 0.59 | learning rate: 2.688E-05 | global batch size: 256 | lm loss: 2.222986E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 434.275 | TFLOPs: 22.79 | 7: iteration 100930/ 115203 | consumed samples: 25838080 | consumed tokens: 52916387840 | elapsed time per iteration (s): 0.43 | learning rate: 2.687E-05 | global batch size: 256 | lm loss: 2.218053E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.208 | TFLOPs: 31.33 | 7: iteration 100940/ 115203 | consumed samples: 25840640 | consumed tokens: 52921630720 | elapsed time per iteration (s): 0.46 | learning rate: 2.686E-05 | global batch size: 256 | lm loss: 2.229640E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.910 | TFLOPs: 29.48 | 7: iteration 100950/ 
115203 | consumed samples: 25843200 | consumed tokens: 52926873600 | elapsed time per iteration (s): 0.43 | learning rate: 2.685E-05 | global batch size: 256 | lm loss: 2.213236E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.340 | TFLOPs: 31.03 | 7: iteration 100960/ 115203 | consumed samples: 25845760 | consumed tokens: 52932116480 | elapsed time per iteration (s): 0.44 | learning rate: 2.684E-05 | global batch size: 256 | lm loss: 2.228604E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.316 | TFLOPs: 30.82 | 7: iteration 100970/ 115203 | consumed samples: 25848320 | consumed tokens: 52937359360 | elapsed time per iteration (s): 0.44 | learning rate: 2.683E-05 | global batch size: 256 | lm loss: 2.219349E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.979 | TFLOPs: 30.69 | 7: iteration 100980/ 115203 | consumed samples: 25850880 | consumed tokens: 52942602240 | elapsed time per iteration (s): 0.44 | learning rate: 2.682E-05 | global batch size: 256 | lm loss: 2.217998E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.276 | TFLOPs: 30.39 | 7: iteration 100990/ 115203 | consumed samples: 25853440 | consumed tokens: 52947845120 | elapsed time per iteration (s): 0.49 | learning rate: 2.681E-05 | global batch size: 256 | lm loss: 2.201970E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 523.516 | TFLOPs: 27.47 | 7: iteration 101000/ 115203 | consumed samples: 25856000 | consumed tokens: 52953088000 | elapsed time per iteration (s): 0.44 | learning rate: 2.680E-05 | global batch size: 256 | lm loss: 2.191322E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 583.480 | TFLOPs: 30.61 | 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 101000 | lm loss value: 2.274921E+00 | lm loss PPL: 9.727155E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 101000 to checkpoints_221m 0: [2022-11-29 01:08:24,133] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step101000 is begin to save! 0: [2022-11-29 01:08:24,186] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_01-model_00-model_states.pt... 0: [2022-11-29 01:08:24,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_01-model_00-model_states.pt. 0: [2022-11-29 01:08:24,374] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_03-model_00-model_states.pt... 0: [2022-11-29 01:08:24,395] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_03-model_00-model_states.pt. 0: [2022-11-29 01:08:24,396] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_04-model_00-model_states.pt... 0: [2022-11-29 01:08:24,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_04-model_00-model_states.pt. 0: [2022-11-29 01:08:24,420] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_05-model_00-model_states.pt... 0: [2022-11-29 01:08:24,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_05-model_00-model_states.pt. 0: [2022-11-29 01:08:24,445] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_06-model_00-model_states.pt... 
0: [2022-11-29 01:08:24,468] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_06-model_00-model_states.pt. 0: [2022-11-29 01:08:24,468] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_07-model_00-model_states.pt... 0: [2022-11-29 01:08:24,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_07-model_00-model_states.pt. 0: [2022-11-29 01:08:24,492] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_08-model_00-model_states.pt... 0: [2022-11-29 01:08:24,516] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_08-model_00-model_states.pt. 0: [2022-11-29 01:08:24,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_09-model_00-model_states.pt... 0: [2022-11-29 01:08:24,539] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_09-model_00-model_states.pt. 0: [2022-11-29 01:08:24,539] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_10-model_00-model_states.pt... 0: [2022-11-29 01:08:24,562] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_10-model_00-model_states.pt. 0: [2022-11-29 01:08:24,563] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_11-model_00-model_states.pt... 0: [2022-11-29 01:08:24,587] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_11-model_00-model_states.pt. 0: [2022-11-29 01:08:24,588] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_12-model_00-model_states.pt... 
0: [2022-11-29 01:08:24,612] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_12-model_00-model_states.pt. 0: [2022-11-29 01:08:24,612] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_13-model_00-model_states.pt... 0: [2022-11-29 01:08:24,635] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_13-model_00-model_states.pt. 0: [2022-11-29 01:08:24,636] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_14-model_00-model_states.pt... 0: [2022-11-29 01:08:24,659] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_14-model_00-model_states.pt. 0: [2022-11-29 01:08:24,659] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_15-model_00-model_states.pt... 0: [2022-11-29 01:08:24,683] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_15-model_00-model_states.pt. 0: [2022-11-29 01:08:24,683] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_16-model_00-model_states.pt... 0: [2022-11-29 01:08:24,706] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_16-model_00-model_states.pt. 0: [2022-11-29 01:08:24,707] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_17-model_00-model_states.pt... 0: [2022-11-29 01:08:24,730] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_17-model_00-model_states.pt. 0: [2022-11-29 01:08:24,730] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_18-model_00-model_states.pt... 
0: [2022-11-29 01:08:24,752] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_18-model_00-model_states.pt. 0: [2022-11-29 01:08:24,753] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_19-model_00-model_states.pt... 0: [2022-11-29 01:08:24,775] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_19-model_00-model_states.pt. 0: [2022-11-29 01:08:24,775] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_20-model_00-model_states.pt... 0: [2022-11-29 01:08:24,798] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_20-model_00-model_states.pt. 0: [2022-11-29 01:08:24,798] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/layer_22-model_00-model_states.pt... 0: [2022-11-29 01:08:24,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/layer_22-model_00-model_states.pt. 0: [2022-11-29 01:08:24,804] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step101000/mp_rank_00_model_states.pt 0: [2022-11-29 01:08:24,804] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/mp_rank_00_model_states.pt... 0: [2022-11-29 01:08:24,806] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/mp_rank_00_model_states.pt. 0: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 
1: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
5: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:08:24,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step101000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:08:24,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:08:24,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-29 01:08:24,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 01:08:24,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 01:08:24,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2022-11-29 01:08:24,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2022-11-29 01:08:24,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:08:24,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 01:08:24,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2022-11-29 01:08:24,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:08:24,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
7: [2022-11-29 01:08:24,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:08:24,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:08:24,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 01:08:24,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 01:08:24,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 
1: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:08:24,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:08:24,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 1: [2022-11-29 01:08:24,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:08:24,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 01:08:24,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:08:24,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 1: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2022-11-29 01:08:24,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 1: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:08:24,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 01:08:24,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2022-11-29 01:08:24,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:08:24,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 01:08:24,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:08:24,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:08:24,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2022-11-29 01:08:24,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 01:08:24,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 01:08:24,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2022-11-29 01:08:24,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2022-11-29 01:08:24,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-29 01:08:24,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:08:24,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 01:08:24,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 01:08:24,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2022-11-29 01:08:24,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2022-11-29 01:08:24,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:08:24,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 01:08:24,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2022-11-29 01:08:24,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:08:24,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 01:08:24,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2022-11-29 01:08:24,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:08:24,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 01:08:24,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 2: [2022-11-29 01:08:24,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:08:24,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 01:08:24,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2022-11-29 01:08:24,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:08:24,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 01:08:24,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2022-11-29 01:08:24,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:08:24,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2022-11-29 01:08:24,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:08:24,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:08:24,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:08:24,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 7: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:08:24,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 1: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:08:24,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:08:24,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2022-11-29 01:08:24,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:08:24,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 01:08:24,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 6: [2022-11-29 01:08:24,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:08:24,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 01:08:24,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2022-11-29 01:08:24,876] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:08:24,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:08:24,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 3: [2022-11-29 01:08:24,876] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 01:08:24,876] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:08:24,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:08:24,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 01:08:24,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 
3: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:08:24,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 01:08:24,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 01:08:24,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2022-11-29 01:08:24,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:08:24,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 01:08:24,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 3: [2022-11-29 01:08:24,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2022-11-29 01:08:24,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 01:08:24,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2022-11-29 01:08:24,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: [2022-11-29 01:08:24,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:08:24,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 01:08:24,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: [2022-11-29 01:08:24,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:08:24,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:08:24,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 01:08:24,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:08:24,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 
0: [2022-11-29 01:08:24,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 01:08:24,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2022-11-29 01:08:24,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:08:24,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:08:24,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:08:24,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 01:08:24,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 01:08:24,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 01:08:24,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2022-11-29 01:08:24,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 4: [2022-11-29 01:08:24,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: [2022-11-29 01:08:24,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-29 01:08:24,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 01:08:24,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 5: [2022-11-29 01:08:24,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:08:24,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:08:24,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:08:24,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:08:24,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:08:24,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:08:24,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-29 01:08:24,921] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 01:08:24,921] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 01:08:24,921] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 01:08:24,921] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 01:08:24,921] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 01:08:24,921] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 01:08:24,921] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 01:08:24,921] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 5: [2022-11-29 01:08:24,921] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 5: [2022-11-29 01:08:24,921] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 5: [2022-11-29 01:08:24,921] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 5: [2022-11-29 01:08:24,921] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 
5: [2022-11-29 01:08:24,921] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 5: [2022-11-29 01:08:24,921] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 5: [2022-11-29 01:08:24,921] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:08:24,922] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 01:08:24,922] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: [2022-11-29 01:08:24,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:08:24,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:08:24,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:08:24,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 01:08:24,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 01:08:24,932] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 01:08:24,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 
0: [2022-11-29 01:08:24,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: [2022-11-29 01:08:24,932] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: [2022-11-29 01:08:24,941] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step101000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 01:08:24,941] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step101000 is ready now! 0: successfully saved checkpoint at iteration 101000 to checkpoints_221m 7: time (ms) | save-checkpoint: 843.08 7: iteration 101010/ 115203 | consumed samples: 25858560 | consumed tokens: 52958330880 | elapsed time per iteration (s): 0.55 | learning rate: 2.679E-05 | global batch size: 256 | lm loss: 2.229296E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 464.354 | TFLOPs: 24.36 | 7: iteration 101020/ 115203 | consumed samples: 25861120 | consumed tokens: 52963573760 | elapsed time per iteration (s): 0.44 | learning rate: 2.678E-05 | global batch size: 256 | lm loss: 2.209302E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.362 | TFLOPs: 30.35 | 7: iteration 101030/ 115203 | consumed samples: 25863680 | consumed tokens: 52968816640 | elapsed time per iteration (s): 0.43 | learning rate: 2.677E-05 | global batch size: 256 | lm loss: 2.238842E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.587 | TFLOPs: 30.88 | 7: iteration 101040/ 115203 | consumed samples: 25866240 | consumed tokens: 52974059520 | elapsed time per iteration (s): 0.44 | learning rate: 2.676E-05 | global batch size: 256 | lm loss: 2.231123E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 
| number of nan iterations: 0 | samples per second: 584.155 | TFLOPs: 30.65 | 7: iteration 101050/ 115203 | consumed samples: 25868800 | consumed tokens: 52979302400 | elapsed time per iteration (s): 0.43 | learning rate: 2.675E-05 | global batch size: 256 | lm loss: 2.226455E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.805 | TFLOPs: 31.16 | 7: iteration 101060/ 115203 | consumed samples: 25871360 | consumed tokens: 52984545280 | elapsed time per iteration (s): 0.44 | learning rate: 2.674E-05 | global batch size: 256 | lm loss: 2.180376E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.963 | TFLOPs: 30.80 | 7: iteration 101070/ 115203 | consumed samples: 25873920 | consumed tokens: 52989788160 | elapsed time per iteration (s): 0.46 | learning rate: 2.673E-05 | global batch size: 256 | lm loss: 2.211929E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.335 | TFLOPs: 29.19 | 7: iteration 101080/ 115203 | consumed samples: 25876480 | consumed tokens: 52995031040 | elapsed time per iteration (s): 0.44 | learning rate: 2.673E-05 | global batch size: 256 | lm loss: 2.249286E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.683 | TFLOPs: 30.83 | 7: iteration 101090/ 115203 | consumed samples: 25879040 | consumed tokens: 53000273920 | elapsed time per iteration (s): 0.43 | learning rate: 2.672E-05 | global batch size: 256 | lm loss: 2.208781E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.681 | TFLOPs: 31.36 | 7: iteration 101100/ 115203 | consumed samples: 25881600 | consumed tokens: 53005516800 | elapsed time per iteration (s): 0.43 | learning rate: 2.671E-05 | global 
batch size: 256 | lm loss: 2.203909E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.757 | TFLOPs: 31.10 | 7: iteration 101110/ 115203 | consumed samples: 25884160 | consumed tokens: 53010759680 | elapsed time per iteration (s): 0.44 | learning rate: 2.670E-05 | global batch size: 256 | lm loss: 2.232403E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.099 | TFLOPs: 30.86 | 7: iteration 101120/ 115203 | consumed samples: 25886720 | consumed tokens: 53016002560 | elapsed time per iteration (s): 0.43 | learning rate: 2.669E-05 | global batch size: 256 | lm loss: 2.187178E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.530 | TFLOPs: 31.14 | 7: iteration 101130/ 115203 | consumed samples: 25889280 | consumed tokens: 53021245440 | elapsed time per iteration (s): 0.43 | learning rate: 2.668E-05 | global batch size: 256 | lm loss: 2.216452E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.083 | TFLOPs: 31.17 | 7: iteration 101140/ 115203 | consumed samples: 25891840 | consumed tokens: 53026488320 | elapsed time per iteration (s): 0.43 | learning rate: 2.667E-05 | global batch size: 256 | lm loss: 2.218021E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.968 | TFLOPs: 31.06 | 7: iteration 101150/ 115203 | consumed samples: 25894400 | consumed tokens: 53031731200 | elapsed time per iteration (s): 0.44 | learning rate: 2.666E-05 | global batch size: 256 | lm loss: 2.236155E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.385 | TFLOPs: 30.66 | 7: iteration 101160/ 115203 | consumed samples: 25896960 
| consumed tokens: 53036974080 | elapsed time per iteration (s): 0.43 | learning rate: 2.665E-05 | global batch size: 256 | lm loss: 2.193216E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.139 | TFLOPs: 31.59 | 7: iteration 101170/ 115203 | consumed samples: 25899520 | consumed tokens: 53042216960 | elapsed time per iteration (s): 0.43 | learning rate: 2.664E-05 | global batch size: 256 | lm loss: 2.223259E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.683 | TFLOPs: 31.46 | 7: iteration 101180/ 115203 | consumed samples: 25902080 | consumed tokens: 53047459840 | elapsed time per iteration (s): 0.43 | learning rate: 2.663E-05 | global batch size: 256 | lm loss: 2.226663E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.510 | TFLOPs: 31.14 | 7: iteration 101190/ 115203 | consumed samples: 25904640 | consumed tokens: 53052702720 | elapsed time per iteration (s): 0.43 | learning rate: 2.662E-05 | global batch size: 256 | lm loss: 2.205902E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.846 | TFLOPs: 31.00 | 7: iteration 101200/ 115203 | consumed samples: 25907200 | consumed tokens: 53057945600 | elapsed time per iteration (s): 0.43 | learning rate: 2.661E-05 | global batch size: 256 | lm loss: 2.229385E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.573 | TFLOPs: 31.09 | 7: iteration 101210/ 115203 | consumed samples: 25909760 | consumed tokens: 53063188480 | elapsed time per iteration (s): 0.44 | learning rate: 2.660E-05 | global batch size: 256 | lm loss: 2.240284E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 584.328 | TFLOPs: 30.66 | 7: iteration 101220/ 115203 | consumed samples: 25912320 | consumed tokens: 53068431360 | elapsed time per iteration (s): 0.43 | learning rate: 2.659E-05 | global batch size: 256 | lm loss: 2.194912E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.399 | TFLOPs: 31.13 | 7: iteration 101230/ 115203 | consumed samples: 25914880 | consumed tokens: 53073674240 | elapsed time per iteration (s): 0.44 | learning rate: 2.659E-05 | global batch size: 256 | lm loss: 2.210976E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.757 | TFLOPs: 30.73 | 7: iteration 101240/ 115203 | consumed samples: 25917440 | consumed tokens: 53078917120 | elapsed time per iteration (s): 0.42 | learning rate: 2.658E-05 | global batch size: 256 | lm loss: 2.234029E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.415 | TFLOPs: 31.66 | 7: iteration 101250/ 115203 | consumed samples: 25920000 | consumed tokens: 53084160000 | elapsed time per iteration (s): 0.43 | learning rate: 2.657E-05 | global batch size: 256 | lm loss: 2.208222E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.109 | TFLOPs: 31.54 | 7: iteration 101260/ 115203 | consumed samples: 25922560 | consumed tokens: 53089402880 | elapsed time per iteration (s): 0.43 | learning rate: 2.656E-05 | global batch size: 256 | lm loss: 2.216820E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.311 | TFLOPs: 31.18 | 7: iteration 101270/ 115203 | consumed samples: 25925120 | consumed tokens: 53094645760 | elapsed time per iteration (s): 0.44 | learning rate: 2.655E-05 | global batch size: 256 | lm loss: 
2.196549E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.549 | TFLOPs: 30.46 | 7: iteration 101280/ 115203 | consumed samples: 25927680 | consumed tokens: 53099888640 | elapsed time per iteration (s): 0.45 | learning rate: 2.654E-05 | global batch size: 256 | lm loss: 2.213638E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.954 | TFLOPs: 29.64 | 7: iteration 101290/ 115203 | consumed samples: 25930240 | consumed tokens: 53105131520 | elapsed time per iteration (s): 0.43 | learning rate: 2.653E-05 | global batch size: 256 | lm loss: 2.253120E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.422 | TFLOPs: 31.03 | 7: iteration 101300/ 115203 | consumed samples: 25932800 | consumed tokens: 53110374400 | elapsed time per iteration (s): 0.44 | learning rate: 2.652E-05 | global batch size: 256 | lm loss: 2.216268E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.787 | TFLOPs: 30.26 | 7: iteration 101310/ 115203 | consumed samples: 25935360 | consumed tokens: 53115617280 | elapsed time per iteration (s): 0.43 | learning rate: 2.651E-05 | global batch size: 256 | lm loss: 2.203572E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.609 | TFLOPs: 31.51 | 7: iteration 101320/ 115203 | consumed samples: 25937920 | consumed tokens: 53120860160 | elapsed time per iteration (s): 0.43 | learning rate: 2.650E-05 | global batch size: 256 | lm loss: 2.214767E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.096 | TFLOPs: 30.96 | 7: iteration 101330/ 115203 | consumed samples: 25940480 | consumed tokens: 
53126103040 | elapsed time per iteration (s): 0.43 | learning rate: 2.649E-05 | global batch size: 256 | lm loss: 2.253325E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.272 | TFLOPs: 31.08 | 7: iteration 101340/ 115203 | consumed samples: 25943040 | consumed tokens: 53131345920 | elapsed time per iteration (s): 0.43 | learning rate: 2.648E-05 | global batch size: 256 | lm loss: 2.254527E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.726 | TFLOPs: 31.41 | 7: iteration 101350/ 115203 | consumed samples: 25945600 | consumed tokens: 53136588800 | elapsed time per iteration (s): 0.44 | learning rate: 2.647E-05 | global batch size: 256 | lm loss: 2.225448E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.688 | TFLOPs: 30.68 | 7: iteration 101360/ 115203 | consumed samples: 25948160 | consumed tokens: 53141831680 | elapsed time per iteration (s): 0.43 | learning rate: 2.646E-05 | global batch size: 256 | lm loss: 2.223704E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.535 | TFLOPs: 31.35 | 7: iteration 101370/ 115203 | consumed samples: 25950720 | consumed tokens: 53147074560 | elapsed time per iteration (s): 0.43 | learning rate: 2.646E-05 | global batch size: 256 | lm loss: 2.224867E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.908 | TFLOPs: 31.16 | 7: iteration 101380/ 115203 | consumed samples: 25953280 | consumed tokens: 53152317440 | elapsed time per iteration (s): 0.44 | learning rate: 2.645E-05 | global batch size: 256 | lm loss: 2.223010E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 583.509 | TFLOPs: 30.62 | 7: iteration 101390/ 115203 | consumed samples: 25955840 | consumed tokens: 53157560320 | elapsed time per iteration (s): 0.43 | learning rate: 2.644E-05 | global batch size: 256 | lm loss: 2.218793E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.555 | TFLOPs: 31.35 | 7: iteration 101400/ 115203 | consumed samples: 25958400 | consumed tokens: 53162803200 | elapsed time per iteration (s): 0.43 | learning rate: 2.643E-05 | global batch size: 256 | lm loss: 2.224084E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.945 | TFLOPs: 31.22 | 7: iteration 101410/ 115203 | consumed samples: 25960960 | consumed tokens: 53168046080 | elapsed time per iteration (s): 0.44 | learning rate: 2.642E-05 | global batch size: 256 | lm loss: 2.221418E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.949 | TFLOPs: 30.64 | 7: iteration 101420/ 115203 | consumed samples: 25963520 | consumed tokens: 53173288960 | elapsed time per iteration (s): 0.43 | learning rate: 2.641E-05 | global batch size: 256 | lm loss: 2.250553E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.271 | TFLOPs: 31.29 | 7: iteration 101430/ 115203 | consumed samples: 25966080 | consumed tokens: 53178531840 | elapsed time per iteration (s): 0.43 | learning rate: 2.640E-05 | global batch size: 256 | lm loss: 2.231288E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.585 | TFLOPs: 30.93 | 7: iteration 101440/ 115203 | consumed samples: 25968640 | consumed tokens: 53183774720 | elapsed time per iteration (s): 0.44 | learning rate: 2.639E-05 | global batch size: 256 | lm loss: 2.225198E+00 | grad 
norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.551 | TFLOPs: 30.78 | 7: iteration 101450/ 115203 | consumed samples: 25971200 | consumed tokens: 53189017600 | elapsed time per iteration (s): 0.44 | learning rate: 2.638E-05 | global batch size: 256 | lm loss: 2.199416E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.398 | TFLOPs: 30.66 | 7: iteration 101460/ 115203 | consumed samples: 25973760 | consumed tokens: 53194260480 | elapsed time per iteration (s): 0.45 | learning rate: 2.637E-05 | global batch size: 256 | lm loss: 2.221142E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.897 | TFLOPs: 30.16 | 7: iteration 101470/ 115203 | consumed samples: 25976320 | consumed tokens: 53199503360 | elapsed time per iteration (s): 0.44 | learning rate: 2.636E-05 | global batch size: 256 | lm loss: 2.234409E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.491 | TFLOPs: 30.88 | 7: iteration 101480/ 115203 | consumed samples: 25978880 | consumed tokens: 53204746240 | elapsed time per iteration (s): 0.44 | learning rate: 2.635E-05 | global batch size: 256 | lm loss: 2.201014E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.071 | TFLOPs: 30.65 | 7: iteration 101490/ 115203 | consumed samples: 25981440 | consumed tokens: 53209989120 | elapsed time per iteration (s): 0.43 | learning rate: 2.635E-05 | global batch size: 256 | lm loss: 2.225845E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.057 | TFLOPs: 30.96 | 7: iteration 101500/ 115203 | consumed samples: 25984000 | consumed tokens: 53215232000 | elapsed time 
per iteration (s): 0.44 | learning rate: 2.634E-05 | global batch size: 256 | lm loss: 2.217584E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.600 | TFLOPs: 30.73 | 7: iteration 101510/ 115203 | consumed samples: 25986560 | consumed tokens: 53220474880 | elapsed time per iteration (s): 0.44 | learning rate: 2.633E-05 | global batch size: 256 | lm loss: 2.231772E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.273 | TFLOPs: 30.71 | 7: iteration 101520/ 115203 | consumed samples: 25989120 | consumed tokens: 53225717760 | elapsed time per iteration (s): 0.45 | learning rate: 2.632E-05 | global batch size: 256 | lm loss: 2.206045E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.441 | TFLOPs: 29.88 | 7: iteration 101530/ 115203 | consumed samples: 25991680 | consumed tokens: 53230960640 | elapsed time per iteration (s): 0.43 | learning rate: 2.631E-05 | global batch size: 256 | lm loss: 2.202183E+00 | grad norm: 0.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.090 | TFLOPs: 31.22 | 7: iteration 101540/ 115203 | consumed samples: 25994240 | consumed tokens: 53236203520 | elapsed time per iteration (s): 0.43 | learning rate: 2.630E-05 | global batch size: 256 | lm loss: 2.226045E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.279 | TFLOPs: 31.55 | 7: iteration 101550/ 115203 | consumed samples: 25996800 | consumed tokens: 53241446400 | elapsed time per iteration (s): 0.43 | learning rate: 2.629E-05 | global batch size: 256 | lm loss: 2.217766E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.992 | TFLOPs: 
31.06 | 7: iteration 101560/ 115203 | consumed samples: 25999360 | consumed tokens: 53246689280 | elapsed time per iteration (s): 0.43 | learning rate: 2.628E-05 | global batch size: 256 | lm loss: 2.225595E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.692 | TFLOPs: 31.31 | 7: iteration 101570/ 115203 | consumed samples: 26001920 | consumed tokens: 53251932160 | elapsed time per iteration (s): 0.43 | learning rate: 2.627E-05 | global batch size: 256 | lm loss: 2.220637E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.007 | TFLOPs: 31.22 | 7: iteration 101580/ 115203 | consumed samples: 26004480 | consumed tokens: 53257175040 | elapsed time per iteration (s): 0.43 | learning rate: 2.626E-05 | global batch size: 256 | lm loss: 2.215025E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.125 | TFLOPs: 31.28 | 7: iteration 101590/ 115203 | consumed samples: 26007040 | consumed tokens: 53262417920 | elapsed time per iteration (s): 0.42 | learning rate: 2.625E-05 | global batch size: 256 | lm loss: 2.210094E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.820 | TFLOPs: 31.73 | 7: iteration 101600/ 115203 | consumed samples: 26009600 | consumed tokens: 53267660800 | elapsed time per iteration (s): 0.43 | learning rate: 2.625E-05 | global batch size: 256 | lm loss: 2.205213E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.474 | TFLOPs: 31.40 | 7: iteration 101610/ 115203 | consumed samples: 26012160 | consumed tokens: 53272903680 | elapsed time per iteration (s): 0.43 | learning rate: 2.624E-05 | global batch size: 256 | lm loss: 2.232976E+00 | grad norm: 0.247 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.208 | TFLOPs: 31.18 | 7: iteration 101620/ 115203 | consumed samples: 26014720 | consumed tokens: 53278146560 | elapsed time per iteration (s): 0.44 | learning rate: 2.623E-05 | global batch size: 256 | lm loss: 2.178019E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.926 | TFLOPs: 30.85 | 7: iteration 101630/ 115203 | consumed samples: 26017280 | consumed tokens: 53283389440 | elapsed time per iteration (s): 0.46 | learning rate: 2.622E-05 | global batch size: 256 | lm loss: 2.218888E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.564 | TFLOPs: 29.25 | 7: iteration 101640/ 115203 | consumed samples: 26019840 | consumed tokens: 53288632320 | elapsed time per iteration (s): 0.45 | learning rate: 2.621E-05 | global batch size: 256 | lm loss: 2.243970E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.235 | TFLOPs: 29.87 | 7: iteration 101650/ 115203 | consumed samples: 26022400 | consumed tokens: 53293875200 | elapsed time per iteration (s): 0.47 | learning rate: 2.620E-05 | global batch size: 256 | lm loss: 2.238668E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 550.256 | TFLOPs: 28.87 | 7: iteration 101660/ 115203 | consumed samples: 26024960 | consumed tokens: 53299118080 | elapsed time per iteration (s): 0.43 | learning rate: 2.619E-05 | global batch size: 256 | lm loss: 2.221822E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.687 | TFLOPs: 31.46 | 7: iteration 101670/ 115203 | consumed samples: 26027520 | consumed tokens: 53304360960 | elapsed time per iteration (s): 0.45 | 
learning rate: 2.618E-05 | global batch size: 256 | lm loss: 2.218773E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.927 | TFLOPs: 30.06 | 7: iteration 101680/ 115203 | consumed samples: 26030080 | consumed tokens: 53309603840 | elapsed time per iteration (s): 0.44 | learning rate: 2.617E-05 | global batch size: 256 | lm loss: 2.216832E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.993 | TFLOPs: 30.75 | 7: iteration 101690/ 115203 | consumed samples: 26032640 | consumed tokens: 53314846720 | elapsed time per iteration (s): 0.44 | learning rate: 2.616E-05 | global batch size: 256 | lm loss: 2.233180E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.736 | TFLOPs: 30.68 | 7: iteration 101700/ 115203 | consumed samples: 26035200 | consumed tokens: 53320089600 | elapsed time per iteration (s): 0.43 | learning rate: 2.615E-05 | global batch size: 256 | lm loss: 2.239878E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.705 | TFLOPs: 31.15 | 7: iteration 101710/ 115203 | consumed samples: 26037760 | consumed tokens: 53325332480 | elapsed time per iteration (s): 0.43 | learning rate: 2.615E-05 | global batch size: 256 | lm loss: 2.188638E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.027 | TFLOPs: 31.59 | 7: iteration 101720/ 115203 | consumed samples: 26040320 | consumed tokens: 53330575360 | elapsed time per iteration (s): 0.44 | learning rate: 2.614E-05 | global batch size: 256 | lm loss: 2.218005E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.514 | TFLOPs: 30.83 | 7: iteration 101730/ 
115203 | consumed samples: 26042880 | consumed tokens: 53335818240 | elapsed time per iteration (s): 0.43 | learning rate: 2.613E-05 | global batch size: 256 | lm loss: 2.218963E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.841 | TFLOPs: 30.90 | 7: iteration 101740/ 115203 | consumed samples: 26045440 | consumed tokens: 53341061120 | elapsed time per iteration (s): 0.42 | learning rate: 2.612E-05 | global batch size: 256 | lm loss: 2.241779E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.871 | TFLOPs: 31.84 | 7: iteration 101750/ 115203 | consumed samples: 26048000 | consumed tokens: 53346304000 | elapsed time per iteration (s): 0.43 | learning rate: 2.611E-05 | global batch size: 256 | lm loss: 2.197334E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.693 | TFLOPs: 31.31 | 7: iteration 101760/ 115203 | consumed samples: 26050560 | consumed tokens: 53351546880 | elapsed time per iteration (s): 0.43 | learning rate: 2.610E-05 | global batch size: 256 | lm loss: 2.238987E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.150 | TFLOPs: 31.59 | 7: iteration 101770/ 115203 | consumed samples: 26053120 | consumed tokens: 53356789760 | elapsed time per iteration (s): 0.44 | learning rate: 2.609E-05 | global batch size: 256 | lm loss: 2.219043E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.789 | TFLOPs: 30.79 | 7: iteration 101780/ 115203 | consumed samples: 26055680 | consumed tokens: 53362032640 | elapsed time per iteration (s): 0.43 | learning rate: 2.608E-05 | global batch size: 256 | lm loss: 2.209558E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 598.827 | TFLOPs: 31.42 | 7: iteration 101790/ 115203 | consumed samples: 26058240 | consumed tokens: 53367275520 | elapsed time per iteration (s): 0.43 | learning rate: 2.607E-05 | global batch size: 256 | lm loss: 2.211839E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.782 | TFLOPs: 31.31 | 7: iteration 101800/ 115203 | consumed samples: 26060800 | consumed tokens: 53372518400 | elapsed time per iteration (s): 0.44 | learning rate: 2.606E-05 | global batch size: 256 | lm loss: 2.177241E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.955 | TFLOPs: 30.27 | 7: iteration 101810/ 115203 | consumed samples: 26063360 | consumed tokens: 53377761280 | elapsed time per iteration (s): 0.43 | learning rate: 2.606E-05 | global batch size: 256 | lm loss: 2.238504E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.743 | TFLOPs: 31.21 | 7: iteration 101820/ 115203 | consumed samples: 26065920 | consumed tokens: 53383004160 | elapsed time per iteration (s): 0.44 | learning rate: 2.605E-05 | global batch size: 256 | lm loss: 2.223386E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.674 | TFLOPs: 30.73 | 7: iteration 101830/ 115203 | consumed samples: 26068480 | consumed tokens: 53388247040 | elapsed time per iteration (s): 0.43 | learning rate: 2.604E-05 | global batch size: 256 | lm loss: 2.192641E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.114 | TFLOPs: 31.54 | 7: iteration 101840/ 115203 | consumed samples: 26071040 | consumed tokens: 53393489920 | elapsed time per iteration (s): 0.43 | learning rate: 
2.603E-05 | global batch size: 256 | lm loss: 2.245211E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.821 | TFLOPs: 31.05 | 7: iteration 101850/ 115203 | consumed samples: 26073600 | consumed tokens: 53398732800 | elapsed time per iteration (s): 0.43 | learning rate: 2.602E-05 | global batch size: 256 | lm loss: 2.222225E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.573 | TFLOPs: 31.20 | 7: iteration 101860/ 115203 | consumed samples: 26076160 | consumed tokens: 53403975680 | elapsed time per iteration (s): 0.43 | learning rate: 2.601E-05 | global batch size: 256 | lm loss: 2.226911E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.459 | TFLOPs: 31.19 | 7: iteration 101870/ 115203 | consumed samples: 26078720 | consumed tokens: 53409218560 | elapsed time per iteration (s): 0.43 | learning rate: 2.600E-05 | global batch size: 256 | lm loss: 2.218772E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.076 | TFLOPs: 31.07 | 7: iteration 101880/ 115203 | consumed samples: 26081280 | consumed tokens: 53414461440 | elapsed time per iteration (s): 0.44 | learning rate: 2.599E-05 | global batch size: 256 | lm loss: 2.225116E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.115 | TFLOPs: 30.86 | 7: iteration 101890/ 115203 | consumed samples: 26083840 | consumed tokens: 53419704320 | elapsed time per iteration (s): 0.43 | learning rate: 2.598E-05 | global batch size: 256 | lm loss: 2.214071E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.996 | TFLOPs: 31.43 | 7: iteration 101900/ 115203 | 
consumed samples: 26086400 | consumed tokens: 53424947200 | elapsed time per iteration (s): 0.43 | learning rate: 2.598E-05 | global batch size: 256 | lm loss: 2.235143E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.889 | TFLOPs: 31.27 | 7: iteration 101910/ 115203 | consumed samples: 26088960 | consumed tokens: 53430190080 | elapsed time per iteration (s): 0.44 | learning rate: 2.597E-05 | global batch size: 256 | lm loss: 2.196680E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.925 | TFLOPs: 30.80 | 7: iteration 101920/ 115203 | consumed samples: 26091520 | consumed tokens: 53435432960 | elapsed time per iteration (s): 0.44 | learning rate: 2.596E-05 | global batch size: 256 | lm loss: 2.212753E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.190 | TFLOPs: 30.60 | 7: iteration 101930/ 115203 | consumed samples: 26094080 | consumed tokens: 53440675840 | elapsed time per iteration (s): 0.43 | learning rate: 2.595E-05 | global batch size: 256 | lm loss: 2.205694E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.967 | TFLOPs: 30.90 | 7: iteration 101940/ 115203 | consumed samples: 26096640 | consumed tokens: 53445918720 | elapsed time per iteration (s): 0.43 | learning rate: 2.594E-05 | global batch size: 256 | lm loss: 2.212367E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.749 | TFLOPs: 31.05 | 7: iteration 101950/ 115203 | consumed samples: 26099200 | consumed tokens: 53451161600 | elapsed time per iteration (s): 0.43 | learning rate: 2.593E-05 | global batch size: 256 | lm loss: 2.218414E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 592.240 | TFLOPs: 31.07 | 7: iteration 101960/ 115203 | consumed samples: 26101760 | consumed tokens: 53456404480 | elapsed time per iteration (s): 0.43 | learning rate: 2.592E-05 | global batch size: 256 | lm loss: 2.205908E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.428 | TFLOPs: 31.56 | 7: iteration 101970/ 115203 | consumed samples: 26104320 | consumed tokens: 53461647360 | elapsed time per iteration (s): 0.44 | learning rate: 2.591E-05 | global batch size: 256 | lm loss: 2.231179E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.181 | TFLOPs: 30.23 | 7: iteration 101980/ 115203 | consumed samples: 26106880 | consumed tokens: 53466890240 | elapsed time per iteration (s): 0.43 | learning rate: 2.590E-05 | global batch size: 256 | lm loss: 2.248187E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.132 | TFLOPs: 31.17 | 7: iteration 101990/ 115203 | consumed samples: 26109440 | consumed tokens: 53472133120 | elapsed time per iteration (s): 0.44 | learning rate: 2.590E-05 | global batch size: 256 | lm loss: 2.257084E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.762 | TFLOPs: 30.68 | 0: [2022-11-29 01:15:39,348] [INFO] [logging.py:68:log_dist] [Rank 0] step=102000, skipped=0, lr=[2.5887309996453706e-05, 2.5887309996453706e-05, 2.5887309996453706e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 102000/ 115203 | consumed samples: 26112000 | consumed tokens: 53477376000 | elapsed time per iteration (s): 0.43 | learning rate: 2.589E-05 | global batch size: 256 | lm loss: 2.208775E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 
0 | samples per second: 596.390 | TFLOPs: 31.29 | 0: steps: 102000 loss: 2.2041 iter time (s): 0.435 samples/sec: 589.149 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 102000 | lm loss value: 2.074200E+00 | lm loss PPL: 7.958179E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 102000 to checkpoints_221m 0: [2022-11-29 01:15:39,509] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step102000 is begin to save! 0: [2022-11-29 01:15:39,512] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_01-model_00-model_states.pt... 0: [2022-11-29 01:15:39,626] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_01-model_00-model_states.pt. 0: [2022-11-29 01:15:39,626] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_03-model_00-model_states.pt... 0: [2022-11-29 01:15:39,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_03-model_00-model_states.pt. 0: [2022-11-29 01:15:39,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_04-model_00-model_states.pt... 0: [2022-11-29 01:15:39,674] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_04-model_00-model_states.pt. 0: [2022-11-29 01:15:39,674] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_05-model_00-model_states.pt... 0: [2022-11-29 01:15:39,698] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_05-model_00-model_states.pt. 
0: [2022-11-29 01:15:39,698] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_06-model_00-model_states.pt... 0: [2022-11-29 01:15:39,724] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_06-model_00-model_states.pt. 0: [2022-11-29 01:15:39,724] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_07-model_00-model_states.pt... 0: [2022-11-29 01:15:39,749] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_07-model_00-model_states.pt. 0: [2022-11-29 01:15:39,749] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_08-model_00-model_states.pt... 0: [2022-11-29 01:15:39,773] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_08-model_00-model_states.pt. 0: [2022-11-29 01:15:39,773] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_09-model_00-model_states.pt... 0: [2022-11-29 01:15:39,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_09-model_00-model_states.pt. 0: [2022-11-29 01:15:39,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_10-model_00-model_states.pt... 0: [2022-11-29 01:15:39,822] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_10-model_00-model_states.pt. 0: [2022-11-29 01:15:39,822] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_11-model_00-model_states.pt... 0: [2022-11-29 01:15:39,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_11-model_00-model_states.pt. 
0: [2022-11-29 01:15:39,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_12-model_00-model_states.pt... 0: [2022-11-29 01:15:39,871] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_12-model_00-model_states.pt. 0: [2022-11-29 01:15:39,872] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_13-model_00-model_states.pt... 0: [2022-11-29 01:15:39,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_13-model_00-model_states.pt. 0: [2022-11-29 01:15:39,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_14-model_00-model_states.pt... 0: [2022-11-29 01:15:39,920] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_14-model_00-model_states.pt. 0: [2022-11-29 01:15:39,921] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_15-model_00-model_states.pt... 0: [2022-11-29 01:15:39,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_15-model_00-model_states.pt. 0: [2022-11-29 01:15:39,945] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_16-model_00-model_states.pt... 0: [2022-11-29 01:15:39,969] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_16-model_00-model_states.pt. 0: [2022-11-29 01:15:39,970] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_17-model_00-model_states.pt... 0: [2022-11-29 01:15:39,993] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_17-model_00-model_states.pt. 
0: [2022-11-29 01:15:39,994] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_18-model_00-model_states.pt... 0: [2022-11-29 01:15:40,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_18-model_00-model_states.pt. 0: [2022-11-29 01:15:40,018] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_19-model_00-model_states.pt... 0: [2022-11-29 01:15:40,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_19-model_00-model_states.pt. 0: [2022-11-29 01:15:40,044] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_20-model_00-model_states.pt... 0: [2022-11-29 01:15:40,068] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_20-model_00-model_states.pt. 0: [2022-11-29 01:15:40,068] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/layer_22-model_00-model_states.pt... 0: [2022-11-29 01:15:40,072] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/layer_22-model_00-model_states.pt. 0: [2022-11-29 01:15:40,073] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step102000/mp_rank_00_model_states.pt 0: [2022-11-29 01:15:40,073] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/mp_rank_00_model_states.pt... 0: [2022-11-29 01:15:40,075] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/mp_rank_00_model_states.pt. 0: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
0: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 
5: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:15:40,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step102000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:15:40,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:15:40,140] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:15:40,140] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 01:15:40,140] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2022-11-29 01:15:40,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:15:40,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 01:15:40,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 7: [2022-11-29 01:15:40,141] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:15:40,141] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 01:15:40,141] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2022-11-29 01:15:40,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:15:40,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 01:15:40,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 1: [2022-11-29 01:15:40,142] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:15:40,142] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 01:15:40,142] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2022-11-29 01:15:40,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:15:40,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 01:15:40,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 7: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:15:40,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 01:15:40,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
7: [2022-11-29 01:15:40,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 7: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 7: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:15:40,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 01:15:40,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 7: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 7: [2022-11-29 01:15:40,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:15:40,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 01:15:40,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
7: [2022-11-29 01:15:40,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:15:40,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 01:15:40,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2022-11-29 01:15:40,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:15:40,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 01:15:40,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 1: [2022-11-29 01:15:40,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:15:40,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:15:40,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2022-11-29 01:15:40,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 1: [2022-11-29 01:15:40,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2022-11-29 01:15:40,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
6: [2022-11-29 01:15:40,150] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:15:40,150] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 01:15:40,150] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 1: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:15:40,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:15:40,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:15:40,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 01:15:40,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2022-11-29 01:15:40,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:15:40,152] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 01:15:40,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2022-11-29 01:15:40,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:15:40,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 01:15:40,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2022-11-29 01:15:40,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-29 01:15:40,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 1: [2022-11-29 01:15:40,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:15:40,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 6: [2022-11-29 01:15:40,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 1: [2022-11-29 01:15:40,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2022-11-29 01:15:40,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:15:40,153] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 01:15:40,153] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 2: [2022-11-29 01:15:40,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:15:40,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:15:40,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 1: [2022-11-29 01:15:40,153] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:15:40,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 01:15:40,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2022-11-29 01:15:40,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 1: [2022-11-29 01:15:40,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 1: [2022-11-29 01:15:40,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2022-11-29 01:15:40,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:15:40,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 01:15:40,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 1: [2022-11-29 01:15:40,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:15:40,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 0: [2022-11-29 01:15:40,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:15:40,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
0: [2022-11-29 01:15:40,154] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 01:15:40,154] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2022-11-29 01:15:40,154] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:15:40,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 01:15:40,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2022-11-29 01:15:40,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:15:40,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 01:15:40,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2022-11-29 01:15:40,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:15:40,162] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 01:15:40,162] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2022-11-29 01:15:40,162] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:15:40,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 01:15:40,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2022-11-29 01:15:40,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:15:40,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 01:15:40,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2022-11-29 01:15:40,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:15:40,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 01:15:40,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2022-11-29 01:15:40,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:15:40,166] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-29 01:15:40,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 01:15:40,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 01:15:40,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 6: [2022-11-29 01:15:40,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2022-11-29 01:15:40,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:15:40,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 01:15:40,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2022-11-29 01:15:40,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:15:40,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 01:15:40,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 4: [2022-11-29 01:15:40,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:15:40,175] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 01:15:40,175] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2022-11-29 01:15:40,139] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:15:40,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:15:40,139] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-29 01:15:40,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-29 01:15:40,139] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 3: [2022-11-29 01:15:40,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:15:40,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:15:40,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 01:15:40,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
3: [2022-11-29 01:15:40,146] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:15:40,146] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 01:15:40,146] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2022-11-29 01:15:40,147] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-29 01:15:40,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 3: [2022-11-29 01:15:40,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-29 01:15:40,147] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:15:40,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2022-11-29 01:15:40,148] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:15:40,148] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
3: [2022-11-29 01:15:40,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-29 01:15:40,148] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2022-11-29 01:15:40,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:15:40,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 3: [2022-11-29 01:15:40,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-29 01:15:40,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2022-11-29 01:15:40,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:15:40,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 
3: [2022-11-29 01:15:40,151] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-29 01:15:40,149] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:15:40,151] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2022-11-29 01:15:40,149] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-29 01:15:40,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:15:40,149] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 3: [2022-11-29 01:15:40,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-29 01:15:40,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:15:40,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2022-11-29 01:15:40,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 01:15:40,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 5: [2022-11-29 01:15:40,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2022-11-29 01:15:40,160] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 01:15:40,160] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2022-11-29 01:15:40,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:15:40,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:15:40,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:15:40,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 01:15:40,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 01:15:40,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 01:15:40,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2022-11-29 01:15:40,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2022-11-29 01:15:40,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2022-11-29 01:15:40,182] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-29 01:15:40,182] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 01:15:40,182] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: [2022-11-29 01:15:40,208] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step102000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 01:15:40,208] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step102000 is ready now! 0: successfully saved checkpoint at iteration 102000 to checkpoints_221m 7: time (ms) | save-checkpoint: 709.48 7: iteration 102010/ 115203 | consumed samples: 26114560 | consumed tokens: 53482618880 | elapsed time per iteration (s): 0.52 | learning rate: 2.588E-05 | global batch size: 256 | lm loss: 2.225317E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 489.046 | TFLOPs: 25.66 | 7: iteration 102020/ 115203 | consumed samples: 26117120 | consumed tokens: 53487861760 | elapsed time per iteration (s): 0.43 | learning rate: 2.587E-05 | global batch size: 256 | lm loss: 2.207828E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.875 | TFLOPs: 31.32 | 7: iteration 102030/ 115203 | consumed samples: 26119680 | consumed tokens: 53493104640 | elapsed time per iteration (s): 0.43 | learning rate: 2.586E-05 | global batch size: 256 | lm loss: 2.202259E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.412 | TFLOPs: 30.98 | 7: iteration 102040/ 115203 | consumed samples: 26122240 | consumed tokens: 53498347520 | elapsed time per iteration (s): 0.43 | learning rate: 2.585E-05 | global batch size: 256 | lm loss: 2.186283E+00 | grad 
norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.936 | TFLOPs: 31.32 | 7: iteration 102050/ 115203 | consumed samples: 26124800 | consumed tokens: 53503590400 | elapsed time per iteration (s): 0.42 | learning rate: 2.584E-05 | global batch size: 256 | lm loss: 2.212614E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.452 | TFLOPs: 31.66 | 7: iteration 102060/ 115203 | consumed samples: 26127360 | consumed tokens: 53508833280 | elapsed time per iteration (s): 0.44 | learning rate: 2.583E-05 | global batch size: 256 | lm loss: 2.246114E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.530 | TFLOPs: 30.83 | 7: iteration 102070/ 115203 | consumed samples: 26129920 | consumed tokens: 53514076160 | elapsed time per iteration (s): 0.44 | learning rate: 2.583E-05 | global batch size: 256 | lm loss: 2.211121E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.941 | TFLOPs: 30.53 | 7: iteration 102080/ 115203 | consumed samples: 26132480 | consumed tokens: 53519319040 | elapsed time per iteration (s): 0.44 | learning rate: 2.582E-05 | global batch size: 256 | lm loss: 2.215886E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.358 | TFLOPs: 30.19 | 7: iteration 102090/ 115203 | consumed samples: 26135040 | consumed tokens: 53524561920 | elapsed time per iteration (s): 0.43 | learning rate: 2.581E-05 | global batch size: 256 | lm loss: 2.265123E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.664 | TFLOPs: 31.36 | 7: iteration 102100/ 115203 | consumed samples: 26137600 | consumed tokens: 53529804800 | elapsed time 
per iteration (s): 0.44 | learning rate: 2.580E-05 | global batch size: 256 | lm loss: 2.229800E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.097 | TFLOPs: 30.80 | 7: iteration 102110/ 115203 | consumed samples: 26140160 | consumed tokens: 53535047680 | elapsed time per iteration (s): 0.43 | learning rate: 2.579E-05 | global batch size: 256 | lm loss: 2.220026E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.994 | TFLOPs: 30.96 | 7: iteration 102120/ 115203 | consumed samples: 26142720 | consumed tokens: 53540290560 | elapsed time per iteration (s): 0.44 | learning rate: 2.578E-05 | global batch size: 256 | lm loss: 2.184691E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.280 | TFLOPs: 30.50 | 7: iteration 102130/ 115203 | consumed samples: 26145280 | consumed tokens: 53545533440 | elapsed time per iteration (s): 0.43 | learning rate: 2.577E-05 | global batch size: 256 | lm loss: 2.219685E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.717 | TFLOPs: 31.26 | 7: iteration 102140/ 115203 | consumed samples: 26147840 | consumed tokens: 53550776320 | elapsed time per iteration (s): 0.43 | learning rate: 2.576E-05 | global batch size: 256 | lm loss: 2.224642E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.040 | TFLOPs: 31.06 | 7: iteration 102150/ 115203 | consumed samples: 26150400 | consumed tokens: 53556019200 | elapsed time per iteration (s): 0.45 | learning rate: 2.576E-05 | global batch size: 256 | lm loss: 2.210136E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.836 | TFLOPs: 
29.69 | 7: iteration 102160/ 115203 | consumed samples: 26152960 | consumed tokens: 53561262080 | elapsed time per iteration (s): 0.43 | learning rate: 2.575E-05 | global batch size: 256 | lm loss: 2.202858E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.653 | TFLOPs: 31.41 | 7: iteration 102170/ 115203 | consumed samples: 26155520 | consumed tokens: 53566504960 | elapsed time per iteration (s): 0.45 | learning rate: 2.574E-05 | global batch size: 256 | lm loss: 2.213839E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.609 | TFLOPs: 30.10 | 7: iteration 102180/ 115203 | consumed samples: 26158080 | consumed tokens: 53571747840 | elapsed time per iteration (s): 0.43 | learning rate: 2.573E-05 | global batch size: 256 | lm loss: 2.187406E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.780 | TFLOPs: 31.42 | 7: iteration 102190/ 115203 | consumed samples: 26160640 | consumed tokens: 53576990720 | elapsed time per iteration (s): 0.44 | learning rate: 2.572E-05 | global batch size: 256 | lm loss: 2.189767E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.194 | TFLOPs: 30.76 | 7: iteration 102200/ 115203 | consumed samples: 26163200 | consumed tokens: 53582233600 | elapsed time per iteration (s): 0.44 | learning rate: 2.571E-05 | global batch size: 256 | lm loss: 2.219891E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.457 | TFLOPs: 30.67 | 7: iteration 102210/ 115203 | consumed samples: 26165760 | consumed tokens: 53587476480 | elapsed time per iteration (s): 0.43 | learning rate: 2.570E-05 | global batch size: 256 | lm loss: 2.211024E+00 | grad norm: 0.243 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.209 | TFLOPs: 31.23 | 7: iteration 102220/ 115203 | consumed samples: 26168320 | consumed tokens: 53592719360 | elapsed time per iteration (s): 0.43 | learning rate: 2.569E-05 | global batch size: 256 | lm loss: 2.202343E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.057 | TFLOPs: 31.43 | 7: iteration 102230/ 115203 | consumed samples: 26170880 | consumed tokens: 53597962240 | elapsed time per iteration (s): 0.44 | learning rate: 2.569E-05 | global batch size: 256 | lm loss: 2.199265E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.234 | TFLOPs: 30.81 | 7: iteration 102240/ 115203 | consumed samples: 26173440 | consumed tokens: 53603205120 | elapsed time per iteration (s): 0.43 | learning rate: 2.568E-05 | global batch size: 256 | lm loss: 2.225081E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.805 | TFLOPs: 31.52 | 7: iteration 102250/ 115203 | consumed samples: 26176000 | consumed tokens: 53608448000 | elapsed time per iteration (s): 0.42 | learning rate: 2.567E-05 | global batch size: 256 | lm loss: 2.217893E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.102 | TFLOPs: 31.70 | 7: iteration 102260/ 115203 | consumed samples: 26178560 | consumed tokens: 53613690880 | elapsed time per iteration (s): 0.42 | learning rate: 2.566E-05 | global batch size: 256 | lm loss: 2.229613E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.866 | TFLOPs: 31.68 | 7: iteration 102270/ 115203 | consumed samples: 26181120 | consumed tokens: 53618933760 | elapsed time per iteration (s): 0.44 | 
learning rate: 2.565E-05 | global batch size: 256 | lm loss: 2.216448E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.734 | TFLOPs: 30.47 | 7: iteration 102280/ 115203 | consumed samples: 26183680 | consumed tokens: 53624176640 | elapsed time per iteration (s): 0.45 | learning rate: 2.564E-05 | global batch size: 256 | lm loss: 2.237630E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.705 | TFLOPs: 30.10 | 7: iteration 102290/ 115203 | consumed samples: 26186240 | consumed tokens: 53629419520 | elapsed time per iteration (s): 0.44 | learning rate: 2.563E-05 | global batch size: 256 | lm loss: 2.210475E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.375 | TFLOPs: 30.24 | 7: iteration 102300/ 115203 | consumed samples: 26188800 | consumed tokens: 53634662400 | elapsed time per iteration (s): 0.42 | learning rate: 2.563E-05 | global batch size: 256 | lm loss: 2.226715E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.226 | TFLOPs: 31.70 | 7: iteration 102310/ 115203 | consumed samples: 26191360 | consumed tokens: 53639905280 | elapsed time per iteration (s): 0.43 | learning rate: 2.562E-05 | global batch size: 256 | lm loss: 2.204303E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.066 | TFLOPs: 31.01 | 7: iteration 102320/ 115203 | consumed samples: 26193920 | consumed tokens: 53645148160 | elapsed time per iteration (s): 0.43 | learning rate: 2.561E-05 | global batch size: 256 | lm loss: 2.196224E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.593 | TFLOPs: 31.46 | 7: iteration 102330/ 
115203 | consumed samples: 26196480 | consumed tokens: 53650391040 | elapsed time per iteration (s): 0.44 | learning rate: 2.560E-05 | global batch size: 256 | lm loss: 2.217180E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.483 | TFLOPs: 30.30 | 7: iteration 102340/ 115203 | consumed samples: 26199040 | consumed tokens: 53655633920 | elapsed time per iteration (s): 0.45 | learning rate: 2.559E-05 | global batch size: 256 | lm loss: 2.262529E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.763 | TFLOPs: 30.16 | 7: iteration 102350/ 115203 | consumed samples: 26201600 | consumed tokens: 53660876800 | elapsed time per iteration (s): 0.43 | learning rate: 2.558E-05 | global batch size: 256 | lm loss: 2.226149E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.057 | TFLOPs: 31.12 | 7: iteration 102360/ 115203 | consumed samples: 26204160 | consumed tokens: 53666119680 | elapsed time per iteration (s): 0.42 | learning rate: 2.557E-05 | global batch size: 256 | lm loss: 2.233847E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.014 | TFLOPs: 32.01 | 7: iteration 102370/ 115203 | consumed samples: 26206720 | consumed tokens: 53671362560 | elapsed time per iteration (s): 0.44 | learning rate: 2.557E-05 | global batch size: 256 | lm loss: 2.220407E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.461 | TFLOPs: 30.51 | 7: iteration 102380/ 115203 | consumed samples: 26209280 | consumed tokens: 53676605440 | elapsed time per iteration (s): 0.43 | learning rate: 2.556E-05 | global batch size: 256 | lm loss: 2.219777E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 596.447 | TFLOPs: 31.29 | 7: iteration 102390/ 115203 | consumed samples: 26211840 | consumed tokens: 53681848320 | elapsed time per iteration (s): 0.43 | learning rate: 2.555E-05 | global batch size: 256 | lm loss: 2.233701E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.806 | TFLOPs: 31.16 | 7: iteration 102400/ 115203 | consumed samples: 26214400 | consumed tokens: 53687091200 | elapsed time per iteration (s): 0.44 | learning rate: 2.554E-05 | global batch size: 256 | lm loss: 2.200447E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.142 | TFLOPs: 30.65 | 7: iteration 102410/ 115203 | consumed samples: 26216960 | consumed tokens: 53692334080 | elapsed time per iteration (s): 0.46 | learning rate: 2.553E-05 | global batch size: 256 | lm loss: 2.207720E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.494 | TFLOPs: 29.20 | 7: iteration 102420/ 115203 | consumed samples: 26219520 | consumed tokens: 53697576960 | elapsed time per iteration (s): 0.45 | learning rate: 2.552E-05 | global batch size: 256 | lm loss: 2.242937E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.221 | TFLOPs: 30.02 | 7: iteration 102430/ 115203 | consumed samples: 26222080 | consumed tokens: 53702819840 | elapsed time per iteration (s): 0.44 | learning rate: 2.551E-05 | global batch size: 256 | lm loss: 2.188692E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.595 | TFLOPs: 30.36 | 7: iteration 102440/ 115203 | consumed samples: 26224640 | consumed tokens: 53708062720 | elapsed time per iteration (s): 0.43 | learning rate: 
2.551E-05 | global batch size: 256 | lm loss: 2.235433E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.415 | TFLOPs: 31.08 | 7: iteration 102450/ 115203 | consumed samples: 26227200 | consumed tokens: 53713305600 | elapsed time per iteration (s): 0.43 | learning rate: 2.550E-05 | global batch size: 256 | lm loss: 2.218521E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.174 | TFLOPs: 31.23 | 7: iteration 102460/ 115203 | consumed samples: 26229760 | consumed tokens: 53718548480 | elapsed time per iteration (s): 0.43 | learning rate: 2.549E-05 | global batch size: 256 | lm loss: 2.211736E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.936 | TFLOPs: 31.22 | 7: iteration 102470/ 115203 | consumed samples: 26232320 | consumed tokens: 53723791360 | elapsed time per iteration (s): 0.43 | learning rate: 2.548E-05 | global batch size: 256 | lm loss: 2.214333E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.638 | TFLOPs: 31.51 | 7: iteration 102480/ 115203 | consumed samples: 26234880 | consumed tokens: 53729034240 | elapsed time per iteration (s): 0.45 | learning rate: 2.547E-05 | global batch size: 256 | lm loss: 2.266576E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.889 | TFLOPs: 30.11 | 7: iteration 102490/ 115203 | consumed samples: 26237440 | consumed tokens: 53734277120 | elapsed time per iteration (s): 0.45 | learning rate: 2.546E-05 | global batch size: 256 | lm loss: 2.217091E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.800 | TFLOPs: 29.69 | 7: iteration 102500/ 115203 | 
consumed samples: 26240000 | consumed tokens: 53739520000 | elapsed time per iteration (s): 0.44 | learning rate: 2.545E-05 | global batch size: 256 | lm loss: 2.225850E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.470 | TFLOPs: 30.56 | 7: iteration 102510/ 115203 | consumed samples: 26242560 | consumed tokens: 53744762880 | elapsed time per iteration (s): 0.43 | learning rate: 2.545E-05 | global batch size: 256 | lm loss: 2.227542E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.217 | TFLOPs: 31.49 | 7: iteration 102520/ 115203 | consumed samples: 26245120 | consumed tokens: 53750005760 | elapsed time per iteration (s): 0.43 | learning rate: 2.544E-05 | global batch size: 256 | lm loss: 2.235853E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.769 | TFLOPs: 30.94 | 7: iteration 102530/ 115203 | consumed samples: 26247680 | consumed tokens: 53755248640 | elapsed time per iteration (s): 0.43 | learning rate: 2.543E-05 | global batch size: 256 | lm loss: 2.213895E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.264 | TFLOPs: 31.23 | 7: iteration 102540/ 115203 | consumed samples: 26250240 | consumed tokens: 53760491520 | elapsed time per iteration (s): 0.43 | learning rate: 2.542E-05 | global batch size: 256 | lm loss: 2.252085E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.457 | TFLOPs: 31.30 | 7: iteration 102550/ 115203 | consumed samples: 26252800 | consumed tokens: 53765734400 | elapsed time per iteration (s): 0.43 | learning rate: 2.541E-05 | global batch size: 256 | lm loss: 2.220520E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 591.497 | TFLOPs: 31.03 | 7: iteration 102560/ 115203 | consumed samples: 26255360 | consumed tokens: 53770977280 | elapsed time per iteration (s): 0.42 | learning rate: 2.540E-05 | global batch size: 256 | lm loss: 2.229739E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.315 | TFLOPs: 31.71 | 7: iteration 102570/ 115203 | consumed samples: 26257920 | consumed tokens: 53776220160 | elapsed time per iteration (s): 0.44 | learning rate: 2.540E-05 | global batch size: 256 | lm loss: 2.241173E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.153 | TFLOPs: 30.60 | 7: iteration 102580/ 115203 | consumed samples: 26260480 | consumed tokens: 53781463040 | elapsed time per iteration (s): 0.43 | learning rate: 2.539E-05 | global batch size: 256 | lm loss: 2.202188E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.378 | TFLOPs: 31.50 | 7: iteration 102590/ 115203 | consumed samples: 26263040 | consumed tokens: 53786705920 | elapsed time per iteration (s): 0.44 | learning rate: 2.538E-05 | global batch size: 256 | lm loss: 2.218407E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.280 | TFLOPs: 30.71 | 7: iteration 102600/ 115203 | consumed samples: 26265600 | consumed tokens: 53791948800 | elapsed time per iteration (s): 0.43 | learning rate: 2.537E-05 | global batch size: 256 | lm loss: 2.209198E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.629 | TFLOPs: 30.99 | 7: iteration 102610/ 115203 | consumed samples: 26268160 | consumed tokens: 53797191680 | elapsed time per iteration (s): 0.44 | learning rate: 2.536E-05 | global batch 
size: 256 | lm loss: 2.232547E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.549 | TFLOPs: 30.78 | 7: iteration 102620/ 115203 | consumed samples: 26270720 | consumed tokens: 53802434560 | elapsed time per iteration (s): 0.43 | learning rate: 2.535E-05 | global batch size: 256 | lm loss: 2.225487E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.250 | TFLOPs: 30.97 | 7: iteration 102630/ 115203 | consumed samples: 26273280 | consumed tokens: 53807677440 | elapsed time per iteration (s): 0.45 | learning rate: 2.534E-05 | global batch size: 256 | lm loss: 2.214260E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.957 | TFLOPs: 29.80 | 7: iteration 102640/ 115203 | consumed samples: 26275840 | consumed tokens: 53812920320 | elapsed time per iteration (s): 0.43 | learning rate: 2.534E-05 | global batch size: 256 | lm loss: 2.232826E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.571 | TFLOPs: 31.09 | 7: iteration 102650/ 115203 | consumed samples: 26278400 | consumed tokens: 53818163200 | elapsed time per iteration (s): 0.43 | learning rate: 2.533E-05 | global batch size: 256 | lm loss: 2.210437E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.167 | TFLOPs: 30.91 | 7: iteration 102660/ 115203 | consumed samples: 26280960 | consumed tokens: 53823406080 | elapsed time per iteration (s): 0.45 | learning rate: 2.532E-05 | global batch size: 256 | lm loss: 2.219423E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.559 | TFLOPs: 29.88 | 7: iteration 102670/ 115203 | consumed samples: 26283520 | 
consumed tokens: 53828648960 | elapsed time per iteration (s): 0.43 | learning rate: 2.531E-05 | global batch size: 256 | lm loss: 2.216312E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.399 | TFLOPs: 31.08 | 7: iteration 102680/ 115203 | consumed samples: 26286080 | consumed tokens: 53833891840 | elapsed time per iteration (s): 0.44 | learning rate: 2.530E-05 | global batch size: 256 | lm loss: 2.227842E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.867 | TFLOPs: 30.32 | 7: iteration 102690/ 115203 | consumed samples: 26288640 | consumed tokens: 53839134720 | elapsed time per iteration (s): 0.44 | learning rate: 2.529E-05 | global batch size: 256 | lm loss: 2.252200E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.385 | TFLOPs: 30.87 | 7: iteration 102700/ 115203 | consumed samples: 26291200 | consumed tokens: 53844377600 | elapsed time per iteration (s): 0.43 | learning rate: 2.529E-05 | global batch size: 256 | lm loss: 2.218608E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.153 | TFLOPs: 31.28 | 7: iteration 102710/ 115203 | consumed samples: 26293760 | consumed tokens: 53849620480 | elapsed time per iteration (s): 0.43 | learning rate: 2.528E-05 | global batch size: 256 | lm loss: 2.225038E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.493 | TFLOPs: 31.03 | 7: iteration 102720/ 115203 | consumed samples: 26296320 | consumed tokens: 53854863360 | elapsed time per iteration (s): 0.44 | learning rate: 2.527E-05 | global batch size: 256 | lm loss: 2.237705E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 587.466 | TFLOPs: 30.82 | 7: iteration 102730/ 115203 | consumed samples: 26298880 | consumed tokens: 53860106240 | elapsed time per iteration (s): 0.43 | learning rate: 2.526E-05 | global batch size: 256 | lm loss: 2.216459E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.816 | TFLOPs: 31.05 | 7: iteration 102740/ 115203 | consumed samples: 26301440 | consumed tokens: 53865349120 | elapsed time per iteration (s): 0.44 | learning rate: 2.525E-05 | global batch size: 256 | lm loss: 2.188828E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.882 | TFLOPs: 30.32 | 7: iteration 102750/ 115203 | consumed samples: 26304000 | consumed tokens: 53870592000 | elapsed time per iteration (s): 0.42 | learning rate: 2.524E-05 | global batch size: 256 | lm loss: 2.244206E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.092 | TFLOPs: 31.91 | 7: iteration 102760/ 115203 | consumed samples: 26306560 | consumed tokens: 53875834880 | elapsed time per iteration (s): 0.43 | learning rate: 2.524E-05 | global batch size: 256 | lm loss: 2.241430E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.960 | TFLOPs: 31.16 | 7: iteration 102770/ 115203 | consumed samples: 26309120 | consumed tokens: 53881077760 | elapsed time per iteration (s): 0.43 | learning rate: 2.523E-05 | global batch size: 256 | lm loss: 2.228651E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.460 | TFLOPs: 30.98 | 7: iteration 102780/ 115203 | consumed samples: 26311680 | consumed tokens: 53886320640 | elapsed time per iteration (s): 0.43 | learning rate: 2.522E-05 | global batch size: 256 | lm loss: 
2.199644E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.899 | TFLOPs: 31.42 | 7: iteration 102790/ 115203 | consumed samples: 26314240 | consumed tokens: 53891563520 | elapsed time per iteration (s): 0.43 | learning rate: 2.521E-05 | global batch size: 256 | lm loss: 2.249150E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.042 | TFLOPs: 31.38 | 7: iteration 102800/ 115203 | consumed samples: 26316800 | consumed tokens: 53896806400 | elapsed time per iteration (s): 0.46 | learning rate: 2.520E-05 | global batch size: 256 | lm loss: 2.250535E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.851 | TFLOPs: 29.48 | 7: iteration 102810/ 115203 | consumed samples: 26319360 | consumed tokens: 53902049280 | elapsed time per iteration (s): 0.44 | learning rate: 2.519E-05 | global batch size: 256 | lm loss: 2.248709E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.540 | TFLOPs: 30.77 | 7: iteration 102820/ 115203 | consumed samples: 26321920 | consumed tokens: 53907292160 | elapsed time per iteration (s): 0.43 | learning rate: 2.519E-05 | global batch size: 256 | lm loss: 2.228461E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.794 | TFLOPs: 31.05 | 7: iteration 102830/ 115203 | consumed samples: 26324480 | consumed tokens: 53912535040 | elapsed time per iteration (s): 0.43 | learning rate: 2.518E-05 | global batch size: 256 | lm loss: 2.211824E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.707 | TFLOPs: 31.41 | 7: iteration 102840/ 115203 | consumed samples: 26327040 | consumed tokens: 
53917777920 | elapsed time per iteration (s): 0.43 | learning rate: 2.517E-05 | global batch size: 256 | lm loss: 2.264476E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.356 | TFLOPs: 30.92 | 7: iteration 102850/ 115203 | consumed samples: 26329600 | consumed tokens: 53923020800 | elapsed time per iteration (s): 0.46 | learning rate: 2.516E-05 | global batch size: 256 | lm loss: 2.219946E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 553.989 | TFLOPs: 29.07 | 7: iteration 102860/ 115203 | consumed samples: 26332160 | consumed tokens: 53928263680 | elapsed time per iteration (s): 0.43 | learning rate: 2.515E-05 | global batch size: 256 | lm loss: 2.218637E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.089 | TFLOPs: 31.38 | 7: iteration 102870/ 115203 | consumed samples: 26334720 | consumed tokens: 53933506560 | elapsed time per iteration (s): 0.43 | learning rate: 2.514E-05 | global batch size: 256 | lm loss: 2.224267E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.395 | TFLOPs: 31.45 | 7: iteration 102880/ 115203 | consumed samples: 26337280 | consumed tokens: 53938749440 | elapsed time per iteration (s): 0.44 | learning rate: 2.514E-05 | global batch size: 256 | lm loss: 2.210003E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.358 | TFLOPs: 30.66 | 7: iteration 102890/ 115203 | consumed samples: 26339840 | consumed tokens: 53943992320 | elapsed time per iteration (s): 0.45 | learning rate: 2.513E-05 | global batch size: 256 | lm loss: 2.237546E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 571.375 | TFLOPs: 29.98 | 7: iteration 102900/ 115203 | consumed samples: 26342400 | consumed tokens: 53949235200 | elapsed time per iteration (s): 0.43 | learning rate: 2.512E-05 | global batch size: 256 | lm loss: 2.217761E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.823 | TFLOPs: 31.05 | 7: iteration 102910/ 115203 | consumed samples: 26344960 | consumed tokens: 53954478080 | elapsed time per iteration (s): 0.45 | learning rate: 2.511E-05 | global batch size: 256 | lm loss: 2.215828E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.712 | TFLOPs: 30.10 | 7: iteration 102920/ 115203 | consumed samples: 26347520 | consumed tokens: 53959720960 | elapsed time per iteration (s): 0.43 | learning rate: 2.510E-05 | global batch size: 256 | lm loss: 2.251167E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.149 | TFLOPs: 31.07 | 7: iteration 102930/ 115203 | consumed samples: 26350080 | consumed tokens: 53964963840 | elapsed time per iteration (s): 0.43 | learning rate: 2.509E-05 | global batch size: 256 | lm loss: 2.229059E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.664 | TFLOPs: 31.04 | 7: iteration 102940/ 115203 | consumed samples: 26352640 | consumed tokens: 53970206720 | elapsed time per iteration (s): 0.44 | learning rate: 2.509E-05 | global batch size: 256 | lm loss: 2.233409E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.013 | TFLOPs: 30.75 | 7: iteration 102950/ 115203 | consumed samples: 26355200 | consumed tokens: 53975449600 | elapsed time per iteration (s): 0.61 | learning rate: 2.508E-05 | global batch size: 256 | lm loss: 2.214703E+00 | grad 
norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 418.247 | TFLOPs: 21.94 | 7: iteration 102960/ 115203 | consumed samples: 26357760 | consumed tokens: 53980692480 | elapsed time per iteration (s): 0.47 | learning rate: 2.507E-05 | global batch size: 256 | lm loss: 2.226229E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 540.654 | TFLOPs: 28.37 | 7: iteration 102970/ 115203 | consumed samples: 26360320 | consumed tokens: 53985935360 | elapsed time per iteration (s): 0.43 | learning rate: 2.506E-05 | global batch size: 256 | lm loss: 2.199430E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.597 | TFLOPs: 31.25 | 7: iteration 102980/ 115203 | consumed samples: 26362880 | consumed tokens: 53991178240 | elapsed time per iteration (s): 0.43 | learning rate: 2.505E-05 | global batch size: 256 | lm loss: 2.224549E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.323 | TFLOPs: 31.24 | 7: iteration 102990/ 115203 | consumed samples: 26365440 | consumed tokens: 53996421120 | elapsed time per iteration (s): 0.44 | learning rate: 2.505E-05 | global batch size: 256 | lm loss: 2.194539E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.656 | TFLOPs: 30.62 | 7: iteration 103000/ 115203 | consumed samples: 26368000 | consumed tokens: 54001664000 | elapsed time per iteration (s): 0.43 | learning rate: 2.504E-05 | global batch size: 256 | lm loss: 2.235566E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.353 | TFLOPs: 30.92 | 7: -------------------------------------------------------------------------------------------- 7: 
valid loss at iteration 103000 | lm loss value: 2.200675E+00 | lm loss PPL: 9.031103E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 103000 to checkpoints_221m 0: [2022-11-29 01:22:57,739] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step103000 is begin to save! 0: [2022-11-29 01:22:57,756] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_01-model_00-model_states.pt... 0: [2022-11-29 01:22:57,909] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_01-model_00-model_states.pt. 0: [2022-11-29 01:22:57,909] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_03-model_00-model_states.pt... 0: [2022-11-29 01:22:57,931] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_03-model_00-model_states.pt. 0: [2022-11-29 01:22:57,932] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_04-model_00-model_states.pt... 0: [2022-11-29 01:22:57,956] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_04-model_00-model_states.pt. 0: [2022-11-29 01:22:57,956] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_05-model_00-model_states.pt... 0: [2022-11-29 01:22:57,980] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_05-model_00-model_states.pt. 0: [2022-11-29 01:22:57,980] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_06-model_00-model_states.pt... 0: [2022-11-29 01:22:58,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_06-model_00-model_states.pt. 
0: [2022-11-29 01:22:58,005] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_07-model_00-model_states.pt... 0: [2022-11-29 01:22:58,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_07-model_00-model_states.pt. 0: [2022-11-29 01:22:58,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_08-model_00-model_states.pt... 0: [2022-11-29 01:22:58,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_08-model_00-model_states.pt. 0: [2022-11-29 01:22:58,055] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_09-model_00-model_states.pt... 0: [2022-11-29 01:22:58,079] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_09-model_00-model_states.pt. 0: [2022-11-29 01:22:58,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_10-model_00-model_states.pt... 0: [2022-11-29 01:22:58,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_10-model_00-model_states.pt. 0: [2022-11-29 01:22:58,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_11-model_00-model_states.pt... 0: [2022-11-29 01:22:58,127] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_11-model_00-model_states.pt. 0: [2022-11-29 01:22:58,128] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_12-model_00-model_states.pt... 0: [2022-11-29 01:22:58,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_12-model_00-model_states.pt. 
0: [2022-11-29 01:22:58,152] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_13-model_00-model_states.pt... 0: [2022-11-29 01:22:58,175] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_13-model_00-model_states.pt. 0: [2022-11-29 01:22:58,175] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_14-model_00-model_states.pt... 0: [2022-11-29 01:22:58,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_14-model_00-model_states.pt. 0: [2022-11-29 01:22:58,200] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_15-model_00-model_states.pt... 0: [2022-11-29 01:22:58,224] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_15-model_00-model_states.pt. 0: [2022-11-29 01:22:58,224] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_16-model_00-model_states.pt... 0: [2022-11-29 01:22:58,248] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_16-model_00-model_states.pt. 0: [2022-11-29 01:22:58,248] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_17-model_00-model_states.pt... 0: [2022-11-29 01:22:58,272] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_17-model_00-model_states.pt. 0: [2022-11-29 01:22:58,273] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_18-model_00-model_states.pt... 0: [2022-11-29 01:22:58,296] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_18-model_00-model_states.pt. 
0: [2022-11-29 01:22:58,297] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_19-model_00-model_states.pt... 0: [2022-11-29 01:22:58,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_19-model_00-model_states.pt. 0: [2022-11-29 01:22:58,330] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_20-model_00-model_states.pt... 0: [2022-11-29 01:22:58,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_20-model_00-model_states.pt. 0: [2022-11-29 01:22:58,355] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/layer_22-model_00-model_states.pt... 0: [2022-11-29 01:22:58,359] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/layer_22-model_00-model_states.pt. 0: [2022-11-29 01:22:58,360] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step103000/mp_rank_00_model_states.pt 0: [2022-11-29 01:22:58,360] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/mp_rank_00_model_states.pt... 0: [2022-11-29 01:22:58,362] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/mp_rank_00_model_states.pt. 0: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 
0: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 
4: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
3: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
7: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
0: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:22:58,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step103000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:22:58,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:22:58,429] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 01:22:58,429] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 6: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-29 01:22:58,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:22:58,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:22:58,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 01:22:58,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2022-11-29 01:22:58,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-29 01:22:58,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:22:58,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 01:22:58,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 01:22:58,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2022-11-29 01:22:58,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2022-11-29 01:22:58,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:22:58,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 01:22:58,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 2: [2022-11-29 01:22:58,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:22:58,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 01:22:58,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2022-11-29 01:22:58,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:22:58,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 01:22:58,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2022-11-29 01:22:58,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:22:58,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 01:22:58,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2022-11-29 01:22:58,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:22:58,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:22:58,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 01:22:58,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2022-11-29 01:22:58,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:22:58,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 01:22:58,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
4: [2022-11-29 01:22:58,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:22:58,435] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 01:22:58,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2022-11-29 01:22:58,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:22:58,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:22:58,435] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 01:22:58,435] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 01:22:58,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2022-11-29 01:22:58,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 6: [2022-11-29 01:22:58,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:22:58,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-29 01:22:58,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 01:22:58,436] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 01:22:58,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 6: [2022-11-29 01:22:58,436] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2022-11-29 01:22:58,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:22:58,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 01:22:58,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:22:58,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2022-11-29 01:22:58,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 01:22:58,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2022-11-29 01:22:58,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-29 01:22:58,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 01:22:58,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 2: [2022-11-29 01:22:58,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:22:58,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 01:22:58,438] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 2: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:22:58,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 2: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:22:58,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 01:22:58,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 01:22:58,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 1: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:22:58,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 2: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 2: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 2: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
2: [2022-11-29 01:22:58,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 1: [2022-11-29 01:22:58,439] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 2: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2022-11-29 01:22:58,439] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 6: [2022-11-29 01:22:58,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:22:58,440] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:22:58,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 01:22:58,440] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 01:22:58,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 6: [2022-11-29 01:22:58,440] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2022-11-29 01:22:58,444] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:22:58,444] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 01:22:58,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 1: [2022-11-29 01:22:58,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:22:58,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 01:22:58,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2022-11-29 01:22:58,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:22:58,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:22:58,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:22:58,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 01:22:58,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 01:22:58,447] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 01:22:58,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
7: [2022-11-29 01:22:58,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 7: [2022-11-29 01:22:58,447] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2022-11-29 01:22:58,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:22:58,427] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 01:22:58,427] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2022-11-29 01:22:58,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:22:58,429] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:22:58,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:22:58,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 01:22:58,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2022-11-29 01:22:58,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2022-11-29 01:22:58,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:22:58,435] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 01:22:58,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2022-11-29 01:22:58,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:22:58,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2022-11-29 01:22:58,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 01:22:58,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 01:22:58,438] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 5: [2022-11-29 01:22:58,438] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2022-11-29 01:22:58,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:22:58,448] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 01:22:58,448] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2022-11-29 01:22:58,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:22:58,448] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:22:58,448] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 01:22:58,448] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 01:22:58,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
0: [2022-11-29 01:22:58,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2022-11-29 01:22:58,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:22:58,449] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 01:22:58,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2022-11-29 01:22:58,449] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:22:58,449] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 01:22:58,449] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 6: [2022-11-29 01:22:58,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:22:58,453] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 01:22:58,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 6: [2022-11-29 01:22:58,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-29 01:22:58,453] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 01:22:58,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 6: [2022-11-29 01:22:58,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:22:58,453] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 01:22:58,453] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2022-11-29 01:22:58,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:22:58,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 01:22:58,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2022-11-29 01:22:58,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:22:58,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 01:22:58,457] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 4: [2022-11-29 01:22:58,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:22:58,457] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 01:22:58,458] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: [2022-11-29 01:22:58,506] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 01:22:58,506] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2022-11-29 01:22:58,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 01:22:58,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 01:22:58,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 01:22:58,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 01:22:58,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 01:22:58,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 01:22:58,609] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 
3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2022-11-29 01:22:58,609] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 3: [2022-11-29 01:22:58,610] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:22:58,610] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step103000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 01:22:58,610] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step103000 is ready now! 0: successfully saved checkpoint at iteration 103000 to checkpoints_221m 7: time (ms) | save-checkpoint: 895.32 7: iteration 103010/ 115203 | consumed samples: 26370560 | consumed tokens: 54006906880 | elapsed time per iteration (s): 0.56 | learning rate: 2.503E-05 | global batch size: 256 | lm loss: 2.229642E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 455.632 | TFLOPs: 23.91 | 7: iteration 103020/ 115203 | consumed samples: 26373120 | consumed tokens: 54012149760 | elapsed time per iteration (s): 0.46 | learning rate: 2.502E-05 | global batch size: 256 | lm loss: 2.225499E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 557.465 | TFLOPs: 29.25 | 7: iteration 103030/ 115203 | consumed samples: 26375680 | consumed tokens: 54017392640 | elapsed time per iteration (s): 0.44 | learning rate: 2.501E-05 | global batch size: 256 | lm loss: 2.204845E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.670 | TFLOPs: 30.52 | 7: iteration 103040/ 115203 | consumed samples: 26378240 | consumed tokens: 54022635520 | elapsed 
time per iteration (s): 0.46 | learning rate: 2.500E-05 | global batch size: 256 | lm loss: 2.209826E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.048 | TFLOPs: 29.38 | 7: iteration 103050/ 115203 | consumed samples: 26380800 | consumed tokens: 54027878400 | elapsed time per iteration (s): 0.43 | learning rate: 2.500E-05 | global batch size: 256 | lm loss: 2.226431E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.580 | TFLOPs: 31.04 | 7: iteration 103060/ 115203 | consumed samples: 26383360 | consumed tokens: 54033121280 | elapsed time per iteration (s): 0.44 | learning rate: 2.499E-05 | global batch size: 256 | lm loss: 2.221294E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.541 | TFLOPs: 30.72 | 7: iteration 103070/ 115203 | consumed samples: 26385920 | consumed tokens: 54038364160 | elapsed time per iteration (s): 0.44 | learning rate: 2.498E-05 | global batch size: 256 | lm loss: 2.220614E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.600 | TFLOPs: 30.52 | 7: iteration 103080/ 115203 | consumed samples: 26388480 | consumed tokens: 54043607040 | elapsed time per iteration (s): 0.44 | learning rate: 2.497E-05 | global batch size: 256 | lm loss: 2.258218E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.614 | TFLOPs: 30.57 | 7: iteration 103090/ 115203 | consumed samples: 26391040 | consumed tokens: 54048849920 | elapsed time per iteration (s): 0.43 | learning rate: 2.496E-05 | global batch size: 256 | lm loss: 2.230506E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.535 | TFLOPs: 
31.51 | 7: iteration 103100/ 115203 | consumed samples: 26393600 | consumed tokens: 54054092800 | elapsed time per iteration (s): 0.43 | learning rate: 2.496E-05 | global batch size: 256 | lm loss: 2.261462E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.253 | TFLOPs: 31.55 | 7: iteration 103110/ 115203 | consumed samples: 26396160 | consumed tokens: 54059335680 | elapsed time per iteration (s): 0.44 | learning rate: 2.495E-05 | global batch size: 256 | lm loss: 2.227263E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.674 | TFLOPs: 30.57 | 7: iteration 103120/ 115203 | consumed samples: 26398720 | consumed tokens: 54064578560 | elapsed time per iteration (s): 0.43 | learning rate: 2.494E-05 | global batch size: 256 | lm loss: 2.242833E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.761 | TFLOPs: 30.94 | 7: iteration 103130/ 115203 | consumed samples: 26401280 | consumed tokens: 54069821440 | elapsed time per iteration (s): 0.42 | learning rate: 2.493E-05 | global batch size: 256 | lm loss: 2.235414E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.479 | TFLOPs: 31.66 | 7: iteration 103140/ 115203 | consumed samples: 26403840 | consumed tokens: 54075064320 | elapsed time per iteration (s): 0.43 | learning rate: 2.492E-05 | global batch size: 256 | lm loss: 2.257329E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.891 | TFLOPs: 31.06 | 7: iteration 103150/ 115203 | consumed samples: 26406400 | consumed tokens: 54080307200 | elapsed time per iteration (s): 0.45 | learning rate: 2.492E-05 | global batch size: 256 | lm loss: 2.198170E+00 | grad norm: 0.243 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.253 | TFLOPs: 30.18 | 7: iteration 103160/ 115203 | consumed samples: 26408960 | consumed tokens: 54085550080 | elapsed time per iteration (s): 0.43 | learning rate: 2.491E-05 | global batch size: 256 | lm loss: 2.235010E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.098 | TFLOPs: 31.07 | 7: iteration 103170/ 115203 | consumed samples: 26411520 | consumed tokens: 54090792960 | elapsed time per iteration (s): 0.43 | learning rate: 2.490E-05 | global batch size: 256 | lm loss: 2.197989E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.340 | TFLOPs: 31.03 | 7: iteration 103180/ 115203 | consumed samples: 26414080 | consumed tokens: 54096035840 | elapsed time per iteration (s): 0.44 | learning rate: 2.489E-05 | global batch size: 256 | lm loss: 2.232523E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.377 | TFLOPs: 30.40 | 7: iteration 103190/ 115203 | consumed samples: 26416640 | consumed tokens: 54101278720 | elapsed time per iteration (s): 0.43 | learning rate: 2.488E-05 | global batch size: 256 | lm loss: 2.227040E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.937 | TFLOPs: 31.58 | 7: iteration 103200/ 115203 | consumed samples: 26419200 | consumed tokens: 54106521600 | elapsed time per iteration (s): 0.45 | learning rate: 2.488E-05 | global batch size: 256 | lm loss: 2.217889E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.174 | TFLOPs: 30.07 | 7: iteration 103210/ 115203 | consumed samples: 26421760 | consumed tokens: 54111764480 | elapsed time per iteration (s): 0.43 | 
learning rate: 2.487E-05 | global batch size: 256 | lm loss: 2.247693E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.787 | TFLOPs: 30.89 | 7: iteration 103220/ 115203 | consumed samples: 26424320 | consumed tokens: 54117007360 | elapsed time per iteration (s): 0.44 | learning rate: 2.486E-05 | global batch size: 256 | lm loss: 2.211893E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.530 | TFLOPs: 30.77 | 7: iteration 103230/ 115203 | consumed samples: 26426880 | consumed tokens: 54122250240 | elapsed time per iteration (s): 0.43 | learning rate: 2.485E-05 | global batch size: 256 | lm loss: 2.207026E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.486 | TFLOPs: 31.24 | 7: iteration 103240/ 115203 | consumed samples: 26429440 | consumed tokens: 54127493120 | elapsed time per iteration (s): 0.45 | learning rate: 2.484E-05 | global batch size: 256 | lm loss: 2.232129E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.888 | TFLOPs: 30.06 | 7: iteration 103250/ 115203 | consumed samples: 26432000 | consumed tokens: 54132736000 | elapsed time per iteration (s): 0.45 | learning rate: 2.484E-05 | global batch size: 256 | lm loss: 2.231633E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.311 | TFLOPs: 29.87 | 7: iteration 103260/ 115203 | consumed samples: 26434560 | consumed tokens: 54137978880 | elapsed time per iteration (s): 0.43 | learning rate: 2.483E-05 | global batch size: 256 | lm loss: 2.232345E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.289 | TFLOPs: 31.39 | 7: iteration 103270/ 
115203 | consumed samples: 26437120 | consumed tokens: 54143221760 | elapsed time per iteration (s): 0.43 | learning rate: 2.482E-05 | global batch size: 256 | lm loss: 2.225402E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.004 | TFLOPs: 31.43 | 7: iteration 103280/ 115203 | consumed samples: 26439680 | consumed tokens: 54148464640 | elapsed time per iteration (s): 0.43 | learning rate: 2.481E-05 | global batch size: 256 | lm loss: 2.245013E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.648 | TFLOPs: 31.20 | 7: iteration 103290/ 115203 | consumed samples: 26442240 | consumed tokens: 54153707520 | elapsed time per iteration (s): 0.43 | learning rate: 2.480E-05 | global batch size: 256 | lm loss: 2.199839E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.766 | TFLOPs: 31.42 | 7: iteration 103300/ 115203 | consumed samples: 26444800 | consumed tokens: 54158950400 | elapsed time per iteration (s): 0.43 | learning rate: 2.480E-05 | global batch size: 256 | lm loss: 2.228868E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.238 | TFLOPs: 31.28 | 7: iteration 103310/ 115203 | consumed samples: 26447360 | consumed tokens: 54164193280 | elapsed time per iteration (s): 0.46 | learning rate: 2.479E-05 | global batch size: 256 | lm loss: 2.216849E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.919 | TFLOPs: 29.43 | 7: iteration 103320/ 115203 | consumed samples: 26449920 | consumed tokens: 54169436160 | elapsed time per iteration (s): 0.44 | learning rate: 2.478E-05 | global batch size: 256 | lm loss: 2.206342E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 581.917 | TFLOPs: 30.53 | 7: iteration 103330/ 115203 | consumed samples: 26452480 | consumed tokens: 54174679040 | elapsed time per iteration (s): 0.43 | learning rate: 2.477E-05 | global batch size: 256 | lm loss: 2.223587E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.112 | TFLOPs: 31.17 | 7: iteration 103340/ 115203 | consumed samples: 26455040 | consumed tokens: 54179921920 | elapsed time per iteration (s): 0.43 | learning rate: 2.476E-05 | global batch size: 256 | lm loss: 2.241825E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.678 | TFLOPs: 31.15 | 7: iteration 103350/ 115203 | consumed samples: 26457600 | consumed tokens: 54185164800 | elapsed time per iteration (s): 0.44 | learning rate: 2.476E-05 | global batch size: 256 | lm loss: 2.222558E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.999 | TFLOPs: 30.54 | 7: iteration 103360/ 115203 | consumed samples: 26460160 | consumed tokens: 54190407680 | elapsed time per iteration (s): 0.43 | learning rate: 2.475E-05 | global batch size: 256 | lm loss: 2.228313E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.504 | TFLOPs: 31.09 | 7: iteration 103370/ 115203 | consumed samples: 26462720 | consumed tokens: 54195650560 | elapsed time per iteration (s): 0.43 | learning rate: 2.474E-05 | global batch size: 256 | lm loss: 2.241979E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.345 | TFLOPs: 31.08 | 7: iteration 103380/ 115203 | consumed samples: 26465280 | consumed tokens: 54200893440 | elapsed time per iteration (s): 0.43 | learning rate: 
2.473E-05 | global batch size: 256 | lm loss: 2.214307E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.955 | TFLOPs: 31.58 | 7: iteration 103390/ 115203 | consumed samples: 26467840 | consumed tokens: 54206136320 | elapsed time per iteration (s): 0.43 | learning rate: 2.472E-05 | global batch size: 256 | lm loss: 2.257443E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.447 | TFLOPs: 31.40 | 7: iteration 103400/ 115203 | consumed samples: 26470400 | consumed tokens: 54211379200 | elapsed time per iteration (s): 0.43 | learning rate: 2.472E-05 | global batch size: 256 | lm loss: 2.237969E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.421 | TFLOPs: 31.24 | 7: iteration 103410/ 115203 | consumed samples: 26472960 | consumed tokens: 54216622080 | elapsed time per iteration (s): 0.43 | learning rate: 2.471E-05 | global batch size: 256 | lm loss: 2.237501E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.277 | TFLOPs: 31.13 | 7: iteration 103420/ 115203 | consumed samples: 26475520 | consumed tokens: 54221864960 | elapsed time per iteration (s): 0.44 | learning rate: 2.470E-05 | global batch size: 256 | lm loss: 2.208187E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.170 | TFLOPs: 30.49 | 7: iteration 103430/ 115203 | consumed samples: 26478080 | consumed tokens: 54227107840 | elapsed time per iteration (s): 0.44 | learning rate: 2.469E-05 | global batch size: 256 | lm loss: 2.207557E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.886 | TFLOPs: 30.64 | 7: iteration 103440/ 115203 | 
consumed samples: 26480640 | consumed tokens: 54232350720 | elapsed time per iteration (s): 0.43 | learning rate: 2.468E-05 | global batch size: 256 | lm loss: 2.233739E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.263 | TFLOPs: 31.23 | 7: iteration 103450/ 115203 | consumed samples: 26483200 | consumed tokens: 54237593600 | elapsed time per iteration (s): 0.44 | learning rate: 2.468E-05 | global batch size: 256 | lm loss: 2.232898E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.841 | TFLOPs: 30.79 | 7: iteration 103460/ 115203 | consumed samples: 26485760 | consumed tokens: 54242836480 | elapsed time per iteration (s): 0.43 | learning rate: 2.467E-05 | global batch size: 256 | lm loss: 2.237239E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.308 | TFLOPs: 31.29 | 7: iteration 103470/ 115203 | consumed samples: 26488320 | consumed tokens: 54248079360 | elapsed time per iteration (s): 0.43 | learning rate: 2.466E-05 | global batch size: 256 | lm loss: 2.213541E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.578 | TFLOPs: 30.99 | 7: iteration 103480/ 115203 | consumed samples: 26490880 | consumed tokens: 54253322240 | elapsed time per iteration (s): 0.43 | learning rate: 2.465E-05 | global batch size: 256 | lm loss: 2.222070E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.528 | TFLOPs: 31.46 | 7: iteration 103490/ 115203 | consumed samples: 26493440 | consumed tokens: 54258565120 | elapsed time per iteration (s): 0.43 | learning rate: 2.464E-05 | global batch size: 256 | lm loss: 2.207463E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 600.517 | TFLOPs: 31.51 | 7: iteration 103500/ 115203 | consumed samples: 26496000 | consumed tokens: 54263808000 | elapsed time per iteration (s): 0.43 | learning rate: 2.464E-05 | global batch size: 256 | lm loss: 2.256494E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.111 | TFLOPs: 31.22 | 7: iteration 103510/ 115203 | consumed samples: 26498560 | consumed tokens: 54269050880 | elapsed time per iteration (s): 0.42 | learning rate: 2.463E-05 | global batch size: 256 | lm loss: 2.214414E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.119 | TFLOPs: 31.75 | 7: iteration 103520/ 115203 | consumed samples: 26501120 | consumed tokens: 54274293760 | elapsed time per iteration (s): 0.43 | learning rate: 2.462E-05 | global batch size: 256 | lm loss: 2.243692E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.283 | TFLOPs: 31.60 | 7: iteration 103530/ 115203 | consumed samples: 26503680 | consumed tokens: 54279536640 | elapsed time per iteration (s): 0.42 | learning rate: 2.461E-05 | global batch size: 256 | lm loss: 2.229099E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.641 | TFLOPs: 32.04 | 7: iteration 103540/ 115203 | consumed samples: 26506240 | consumed tokens: 54284779520 | elapsed time per iteration (s): 0.43 | learning rate: 2.461E-05 | global batch size: 256 | lm loss: 2.209169E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.828 | TFLOPs: 31.21 | 7: iteration 103550/ 115203 | consumed samples: 26508800 | consumed tokens: 54290022400 | elapsed time per iteration (s): 0.42 | learning rate: 2.460E-05 | global batch 
size: 256 | lm loss: 2.204430E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.707 | TFLOPs: 31.83 | 7: iteration 103560/ 115203 | consumed samples: 26511360 | consumed tokens: 54295265280 | elapsed time per iteration (s): 0.43 | learning rate: 2.459E-05 | global batch size: 256 | lm loss: 2.210263E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.637 | TFLOPs: 31.46 | 7: iteration 103570/ 115203 | consumed samples: 26513920 | consumed tokens: 54300508160 | elapsed time per iteration (s): 0.44 | learning rate: 2.458E-05 | global batch size: 256 | lm loss: 2.228149E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.272 | TFLOPs: 30.60 | 7: iteration 103580/ 115203 | consumed samples: 26516480 | consumed tokens: 54305751040 | elapsed time per iteration (s): 0.42 | learning rate: 2.457E-05 | global batch size: 256 | lm loss: 2.247262E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.510 | TFLOPs: 31.82 | 7: iteration 103590/ 115203 | consumed samples: 26519040 | consumed tokens: 54310993920 | elapsed time per iteration (s): 0.43 | learning rate: 2.457E-05 | global batch size: 256 | lm loss: 2.219036E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.684 | TFLOPs: 31.15 | 7: iteration 103600/ 115203 | consumed samples: 26521600 | consumed tokens: 54316236800 | elapsed time per iteration (s): 0.43 | learning rate: 2.456E-05 | global batch size: 256 | lm loss: 2.264776E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.729 | TFLOPs: 31.52 | 7: iteration 103610/ 115203 | consumed samples: 26524160 | 
consumed tokens: 54321479680 | elapsed time per iteration (s): 0.43 | learning rate: 2.455E-05 | global batch size: 256 | lm loss: 2.223941E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.031 | TFLOPs: 30.96 | 7: iteration 103620/ 115203 | consumed samples: 26526720 | consumed tokens: 54326722560 | elapsed time per iteration (s): 0.42 | learning rate: 2.454E-05 | global batch size: 256 | lm loss: 2.214332E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.339 | TFLOPs: 31.66 | 7: iteration 103630/ 115203 | consumed samples: 26529280 | consumed tokens: 54331965440 | elapsed time per iteration (s): 0.43 | learning rate: 2.454E-05 | global batch size: 256 | lm loss: 2.239044E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.969 | TFLOPs: 30.90 | 7: iteration 103640/ 115203 | consumed samples: 26531840 | consumed tokens: 54337208320 | elapsed time per iteration (s): 0.44 | learning rate: 2.453E-05 | global batch size: 256 | lm loss: 2.207097E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.471 | TFLOPs: 30.56 | 7: iteration 103650/ 115203 | consumed samples: 26534400 | consumed tokens: 54342451200 | elapsed time per iteration (s): 0.43 | learning rate: 2.452E-05 | global batch size: 256 | lm loss: 2.210088E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.965 | TFLOPs: 30.95 | 7: iteration 103660/ 115203 | consumed samples: 26536960 | consumed tokens: 54347694080 | elapsed time per iteration (s): 0.43 | learning rate: 2.451E-05 | global batch size: 256 | lm loss: 2.215574E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 599.307 | TFLOPs: 31.44 | 7: iteration 103670/ 115203 | consumed samples: 26539520 | consumed tokens: 54352936960 | elapsed time per iteration (s): 0.42 | learning rate: 2.450E-05 | global batch size: 256 | lm loss: 2.227123E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.663 | TFLOPs: 31.67 | 7: iteration 103680/ 115203 | consumed samples: 26542080 | consumed tokens: 54358179840 | elapsed time per iteration (s): 0.43 | learning rate: 2.450E-05 | global batch size: 256 | lm loss: 2.238322E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.497 | TFLOPs: 31.35 | 7: iteration 103690/ 115203 | consumed samples: 26544640 | consumed tokens: 54363422720 | elapsed time per iteration (s): 0.43 | learning rate: 2.449E-05 | global batch size: 256 | lm loss: 2.230657E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.258 | TFLOPs: 31.07 | 7: iteration 103700/ 115203 | consumed samples: 26547200 | consumed tokens: 54368665600 | elapsed time per iteration (s): 0.42 | learning rate: 2.448E-05 | global batch size: 256 | lm loss: 2.193830E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.418 | TFLOPs: 31.71 | 7: iteration 103710/ 115203 | consumed samples: 26549760 | consumed tokens: 54373908480 | elapsed time per iteration (s): 0.43 | learning rate: 2.447E-05 | global batch size: 256 | lm loss: 2.213934E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.156 | TFLOPs: 30.91 | 7: iteration 103720/ 115203 | consumed samples: 26552320 | consumed tokens: 54379151360 | elapsed time per iteration (s): 0.43 | learning rate: 2.447E-05 | global batch size: 256 | lm loss: 
2.185542E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.588 | TFLOPs: 30.99 | 7: iteration 103730/ 115203 | consumed samples: 26554880 | consumed tokens: 54384394240 | elapsed time per iteration (s): 0.43 | learning rate: 2.446E-05 | global batch size: 256 | lm loss: 2.210665E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.543 | TFLOPs: 31.35 | 7: iteration 103740/ 115203 | consumed samples: 26557440 | consumed tokens: 54389637120 | elapsed time per iteration (s): 0.44 | learning rate: 2.445E-05 | global batch size: 256 | lm loss: 2.199399E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.009 | TFLOPs: 30.80 | 7: iteration 103750/ 115203 | consumed samples: 26560000 | consumed tokens: 54394880000 | elapsed time per iteration (s): 0.43 | learning rate: 2.444E-05 | global batch size: 256 | lm loss: 2.249707E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.257 | TFLOPs: 31.55 | 7: iteration 103760/ 115203 | consumed samples: 26562560 | consumed tokens: 54400122880 | elapsed time per iteration (s): 0.42 | learning rate: 2.443E-05 | global batch size: 256 | lm loss: 2.203587E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.111 | TFLOPs: 31.75 | 7: iteration 103770/ 115203 | consumed samples: 26565120 | consumed tokens: 54405365760 | elapsed time per iteration (s): 0.44 | learning rate: 2.443E-05 | global batch size: 256 | lm loss: 2.194230E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.648 | TFLOPs: 30.83 | 7: iteration 103780/ 115203 | consumed samples: 26567680 | consumed tokens: 
54410608640 | elapsed time per iteration (s): 0.43 | learning rate: 2.442E-05 | global batch size: 256 | lm loss: 2.250084E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.697 | TFLOPs: 31.31 | 7: iteration 103790/ 115203 | consumed samples: 26570240 | consumed tokens: 54415851520 | elapsed time per iteration (s): 0.43 | learning rate: 2.441E-05 | global batch size: 256 | lm loss: 2.191830E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.150 | TFLOPs: 31.49 | 7: iteration 103800/ 115203 | consumed samples: 26572800 | consumed tokens: 54421094400 | elapsed time per iteration (s): 0.43 | learning rate: 2.440E-05 | global batch size: 256 | lm loss: 2.243527E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.891 | TFLOPs: 31.21 | 7: iteration 103810/ 115203 | consumed samples: 26575360 | consumed tokens: 54426337280 | elapsed time per iteration (s): 0.43 | learning rate: 2.440E-05 | global batch size: 256 | lm loss: 2.218933E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.321 | TFLOPs: 30.92 | 7: iteration 103820/ 115203 | consumed samples: 26577920 | consumed tokens: 54431580160 | elapsed time per iteration (s): 0.43 | learning rate: 2.439E-05 | global batch size: 256 | lm loss: 2.226378E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.053 | TFLOPs: 31.38 | 7: iteration 103830/ 115203 | consumed samples: 26580480 | consumed tokens: 54436823040 | elapsed time per iteration (s): 0.43 | learning rate: 2.438E-05 | global batch size: 256 | lm loss: 2.241433E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 589.653 | TFLOPs: 30.94 | 7: iteration 103840/ 115203 | consumed samples: 26583040 | consumed tokens: 54442065920 | elapsed time per iteration (s): 0.46 | learning rate: 2.437E-05 | global batch size: 256 | lm loss: 2.212284E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.766 | TFLOPs: 29.47 | 7: iteration 103850/ 115203 | consumed samples: 26585600 | consumed tokens: 54447308800 | elapsed time per iteration (s): 0.43 | learning rate: 2.437E-05 | global batch size: 256 | lm loss: 2.228762E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.681 | TFLOPs: 31.46 | 7: iteration 103860/ 115203 | consumed samples: 26588160 | consumed tokens: 54452551680 | elapsed time per iteration (s): 0.44 | learning rate: 2.436E-05 | global batch size: 256 | lm loss: 2.200309E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.993 | TFLOPs: 30.38 | 7: iteration 103870/ 115203 | consumed samples: 26590720 | consumed tokens: 54457794560 | elapsed time per iteration (s): 0.43 | learning rate: 2.435E-05 | global batch size: 256 | lm loss: 2.214206E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.065 | TFLOPs: 31.59 | 7: iteration 103880/ 115203 | consumed samples: 26593280 | consumed tokens: 54463037440 | elapsed time per iteration (s): 0.44 | learning rate: 2.434E-05 | global batch size: 256 | lm loss: 2.209257E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.985 | TFLOPs: 30.75 | 7: iteration 103890/ 115203 | consumed samples: 26595840 | consumed tokens: 54468280320 | elapsed time per iteration (s): 0.43 | learning rate: 2.434E-05 | global batch size: 256 | lm loss: 2.260180E+00 | grad 
norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.290 | TFLOPs: 31.29 | 7: iteration 103900/ 115203 | consumed samples: 26598400 | consumed tokens: 54473523200 | elapsed time per iteration (s): 0.43 | learning rate: 2.433E-05 | global batch size: 256 | lm loss: 2.228136E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.434 | TFLOPs: 31.08 | 7: iteration 103910/ 115203 | consumed samples: 26600960 | consumed tokens: 54478766080 | elapsed time per iteration (s): 0.43 | learning rate: 2.432E-05 | global batch size: 256 | lm loss: 2.206511E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.031 | TFLOPs: 31.33 | 7: iteration 103920/ 115203 | consumed samples: 26603520 | consumed tokens: 54484008960 | elapsed time per iteration (s): 0.43 | learning rate: 2.431E-05 | global batch size: 256 | lm loss: 2.195148E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.053 | TFLOPs: 31.01 | 7: iteration 103930/ 115203 | consumed samples: 26606080 | consumed tokens: 54489251840 | elapsed time per iteration (s): 0.43 | learning rate: 2.430E-05 | global batch size: 256 | lm loss: 2.223108E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.571 | TFLOPs: 31.04 | 7: iteration 103940/ 115203 | consumed samples: 26608640 | consumed tokens: 54494494720 | elapsed time per iteration (s): 0.43 | learning rate: 2.430E-05 | global batch size: 256 | lm loss: 2.223602E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.834 | TFLOPs: 31.26 | 7: iteration 103950/ 115203 | consumed samples: 26611200 | consumed tokens: 54499737600 | elapsed time 
per iteration (s): 0.65 | learning rate: 2.429E-05 | global batch size: 256 | lm loss: 2.252809E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 394.497 | TFLOPs: 20.70 | 7: iteration 103960/ 115203 | consumed samples: 26613760 | consumed tokens: 54504980480 | elapsed time per iteration (s): 0.43 | learning rate: 2.428E-05 | global batch size: 256 | lm loss: 2.209477E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.943 | TFLOPs: 31.22 | 7: iteration 103970/ 115203 | consumed samples: 26616320 | consumed tokens: 54510223360 | elapsed time per iteration (s): 0.43 | learning rate: 2.427E-05 | global batch size: 256 | lm loss: 2.215383E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.826 | TFLOPs: 30.95 | 7: iteration 103980/ 115203 | consumed samples: 26618880 | consumed tokens: 54515466240 | elapsed time per iteration (s): 0.43 | learning rate: 2.427E-05 | global batch size: 256 | lm loss: 2.191821E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.304 | TFLOPs: 30.92 | 7: iteration 103990/ 115203 | consumed samples: 26621440 | consumed tokens: 54520709120 | elapsed time per iteration (s): 0.43 | learning rate: 2.426E-05 | global batch size: 256 | lm loss: 2.206254E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.150 | TFLOPs: 31.28 | 0: [2022-11-29 01:30:13,948] [INFO] [logging.py:68:log_dist] [Rank 0] step=104000, skipped=0, lr=[2.4252001760011466e-05, 2.4252001760011466e-05, 2.4252001760011466e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 104000/ 115203 | consumed samples: 26624000 | consumed tokens: 54525952000 | elapsed time per iteration (s): 0.44 | 
learning rate: 2.425E-05 | global batch size: 256 | lm loss: 2.215515E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.376 | TFLOPs: 30.29 | 0: steps: 104000 loss: 2.2266 iter time (s): 0.435 samples/sec: 589.101 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 104000 | lm loss value: 2.122955E+00 | lm loss PPL: 8.355789E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 104000 to checkpoints_221m 0: [2022-11-29 01:30:14,193] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step104000 is begin to save! 0: [2022-11-29 01:30:14,218] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_01-model_00-model_states.pt... 0: [2022-11-29 01:30:14,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_01-model_00-model_states.pt. 0: [2022-11-29 01:30:14,356] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_03-model_00-model_states.pt... 0: [2022-11-29 01:30:14,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_03-model_00-model_states.pt. 0: [2022-11-29 01:30:14,378] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_04-model_00-model_states.pt... 0: [2022-11-29 01:30:14,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_04-model_00-model_states.pt. 0: [2022-11-29 01:30:14,406] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_05-model_00-model_states.pt... 
0: [2022-11-29 01:30:14,433] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_05-model_00-model_states.pt. 0: [2022-11-29 01:30:14,433] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_06-model_00-model_states.pt... 0: [2022-11-29 01:30:14,457] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_06-model_00-model_states.pt. 0: [2022-11-29 01:30:14,457] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_07-model_00-model_states.pt... 0: [2022-11-29 01:30:14,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_07-model_00-model_states.pt. 0: [2022-11-29 01:30:14,479] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_08-model_00-model_states.pt... 0: [2022-11-29 01:30:14,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_08-model_00-model_states.pt. 0: [2022-11-29 01:30:14,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_09-model_00-model_states.pt... 0: [2022-11-29 01:30:14,528] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_09-model_00-model_states.pt. 0: [2022-11-29 01:30:14,528] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_10-model_00-model_states.pt... 0: [2022-11-29 01:30:14,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_10-model_00-model_states.pt. 0: [2022-11-29 01:30:14,552] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_11-model_00-model_states.pt... 
0: [2022-11-29 01:30:14,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_11-model_00-model_states.pt. 0: [2022-11-29 01:30:14,576] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_12-model_00-model_states.pt... 0: [2022-11-29 01:30:14,601] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_12-model_00-model_states.pt. 0: [2022-11-29 01:30:14,601] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_13-model_00-model_states.pt... 0: [2022-11-29 01:30:14,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_13-model_00-model_states.pt. 0: [2022-11-29 01:30:14,624] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_14-model_00-model_states.pt... 0: [2022-11-29 01:30:14,648] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_14-model_00-model_states.pt. 0: [2022-11-29 01:30:14,648] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_15-model_00-model_states.pt... 0: [2022-11-29 01:30:14,673] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_15-model_00-model_states.pt. 0: [2022-11-29 01:30:14,673] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_16-model_00-model_states.pt... 0: [2022-11-29 01:30:14,696] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_16-model_00-model_states.pt. 0: [2022-11-29 01:30:14,696] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_17-model_00-model_states.pt... 
0: [2022-11-29 01:30:14,719] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_17-model_00-model_states.pt. 0: [2022-11-29 01:30:14,720] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_18-model_00-model_states.pt... 0: [2022-11-29 01:30:14,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_18-model_00-model_states.pt. 0: [2022-11-29 01:30:14,746] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_19-model_00-model_states.pt... 0: [2022-11-29 01:30:14,770] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_19-model_00-model_states.pt. 0: [2022-11-29 01:30:14,770] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_20-model_00-model_states.pt... 0: [2022-11-29 01:30:14,796] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_20-model_00-model_states.pt. 0: [2022-11-29 01:30:14,797] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/layer_22-model_00-model_states.pt... 0: [2022-11-29 01:30:14,800] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/layer_22-model_00-model_states.pt. 0: [2022-11-29 01:30:14,801] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step104000/mp_rank_00_model_states.pt 0: [2022-11-29 01:30:14,801] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/mp_rank_00_model_states.pt... 0: [2022-11-29 01:30:14,803] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/mp_rank_00_model_states.pt. 
0: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
1: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 
4: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 
7: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
3: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:30:14,825] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step104000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:30:14,879] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:30:14,879] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 01:30:14,879] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2022-11-29 01:30:14,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:30:14,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 01:30:14,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2022-11-29 01:30:14,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:30:14,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 01:30:14,880] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2022-11-29 01:30:14,875] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:30:14,880] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
5: [2022-11-29 01:30:14,875] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-29 01:30:14,880] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-29 01:30:14,875] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2022-11-29 01:30:14,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2022-11-29 01:30:14,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:30:14,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:30:14,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 0: [2022-11-29 01:30:14,881] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:30:14,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2022-11-29 01:30:14,881] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 01:30:14,881] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 
0: [2022-11-29 01:30:14,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:30:14,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:30:14,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 0: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:30:14,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 7: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 
0: [2022-11-29 01:30:14,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 7: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2022-11-29 01:30:14,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:30:14,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 2: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2022-11-29 01:30:14,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-29 01:30:14,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 5: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2022-11-29 01:30:14,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 01:30:14,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2022-11-29 01:30:14,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:30:14,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 3: [2022-11-29 01:30:14,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:30:14,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2022-11-29 01:30:14,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 01:30:14,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2022-11-29 01:30:14,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2022-11-29 01:30:14,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 01:30:14,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2022-11-29 01:30:14,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:30:14,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:30:14,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:30:14,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:30:14,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:30:14,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 01:30:14,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 01:30:14,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 01:30:14,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 01:30:14,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 01:30:14,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2022-11-29 01:30:14,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2022-11-29 01:30:14,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2022-11-29 01:30:14,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2022-11-29 01:30:14,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2022-11-29 01:30:14,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:30:14,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 01:30:14,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2022-11-29 01:30:14,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:30:14,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:30:14,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:30:14,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:30:14,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 01:30:14,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 3: [2022-11-29 01:30:14,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 1: [2022-11-29 01:30:14,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 01:30:14,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2022-11-29 01:30:14,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 
3: [2022-11-29 01:30:14,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 1: [2022-11-29 01:30:14,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: [2022-11-29 01:30:14,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:30:14,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:30:14,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 01:30:14,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 01:30:14,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: [2022-11-29 01:30:14,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2022-11-29 01:30:14,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:30:14,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 01:30:14,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2022-11-29 01:30:14,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-29 01:30:14,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 01:30:14,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2022-11-29 01:30:14,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:30:14,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 01:30:14,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 5: [2022-11-29 01:30:14,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:30:14,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 01:30:14,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2022-11-29 01:30:14,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:30:14,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2022-11-29 01:30:14,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 01:30:14,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 01:30:14,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2022-11-29 01:30:14,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 3: [2022-11-29 01:30:14,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:30:14,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 01:30:14,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2022-11-29 01:30:14,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:30:14,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 01:30:14,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2022-11-29 01:30:14,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:30:14,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:30:14,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 01:30:14,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 01:30:14,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2022-11-29 01:30:14,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2022-11-29 01:30:14,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:30:14,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 01:30:14,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 2: [2022-11-29 01:30:14,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:30:14,895] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 01:30:14,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2022-11-29 01:30:14,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-29 01:30:14,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 01:30:14,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2022-11-29 01:30:14,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:30:14,901] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 01:30:14,901] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 7: [2022-11-29 01:30:14,902] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:30:14,902] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 01:30:14,902] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2022-11-29 01:30:14,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:30:14,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 01:30:14,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2022-11-29 01:30:14,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:30:14,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 01:30:14,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2022-11-29 01:30:14,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:30:14,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 01:30:14,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2022-11-29 01:30:14,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:30:14,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 01:30:14,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2022-11-29 01:30:14,907] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:30:14,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 01:30:14,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 4: [2022-11-29 01:30:14,908] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:30:14,908] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 01:30:14,908] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: [2022-11-29 01:30:14,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:30:14,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:30:14,934] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:30:14,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 01:30:14,934] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 01:30:14,934] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: [2022-11-29 01:30:14,934] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: [2022-11-29 01:30:14,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 01:30:14,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-29 01:30:14,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 01:30:14,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 01:30:14,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 01:30:14,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 01:30:14,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 01:30:14,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 01:30:14,972] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 
6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2022-11-29 01:30:14,972] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 6: [2022-11-29 01:30:14,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:30:14,976] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step104000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 01:30:14,976] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step104000 is ready now! 0: successfully saved checkpoint at iteration 104000 to checkpoints_221m 7: time (ms) | save-checkpoint: 827.70 7: iteration 104010/ 115203 | consumed samples: 26626560 | consumed tokens: 54531194880 | elapsed time per iteration (s): 0.53 | learning rate: 2.424E-05 | global batch size: 256 | lm loss: 2.230993E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 482.728 | TFLOPs: 25.33 | 7: iteration 104020/ 115203 | consumed samples: 26629120 | consumed tokens: 54536437760 | elapsed time per iteration (s): 0.43 | learning rate: 2.424E-05 | global batch size: 256 | lm loss: 2.240990E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.472 | TFLOPs: 31.14 | 7: iteration 104030/ 115203 | consumed samples: 26631680 | consumed tokens: 54541680640 | elapsed time per iteration (s): 0.43 | learning rate: 2.423E-05 | global batch size: 256 | lm loss: 2.194337E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.220 | TFLOPs: 31.44 | 7: iteration 104040/ 115203 | consumed samples: 26634240 | consumed tokens: 54546923520 | elapsed 
time per iteration (s): 0.43 | learning rate: 2.422E-05 | global batch size: 256 | lm loss: 2.204392E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.689 | TFLOPs: 30.89 | 7: iteration 104050/ 115203 | consumed samples: 26636800 | consumed tokens: 54552166400 | elapsed time per iteration (s): 0.43 | learning rate: 2.421E-05 | global batch size: 256 | lm loss: 2.217164E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.339 | TFLOPs: 31.13 | 7: iteration 104060/ 115203 | consumed samples: 26639360 | consumed tokens: 54557409280 | elapsed time per iteration (s): 0.42 | learning rate: 2.421E-05 | global batch size: 256 | lm loss: 2.203764E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.010 | TFLOPs: 31.85 | 7: iteration 104070/ 115203 | consumed samples: 26641920 | consumed tokens: 54562652160 | elapsed time per iteration (s): 0.44 | learning rate: 2.420E-05 | global batch size: 256 | lm loss: 2.231025E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.521 | TFLOPs: 30.41 | 7: iteration 104080/ 115203 | consumed samples: 26644480 | consumed tokens: 54567895040 | elapsed time per iteration (s): 0.42 | learning rate: 2.419E-05 | global batch size: 256 | lm loss: 2.188835E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.200 | TFLOPs: 31.96 | 7: iteration 104090/ 115203 | consumed samples: 26647040 | consumed tokens: 54573137920 | elapsed time per iteration (s): 0.42 | learning rate: 2.418E-05 | global batch size: 256 | lm loss: 2.228033E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.950 | TFLOPs: 
31.64 | 7: iteration 104100/ 115203 | consumed samples: 26649600 | consumed tokens: 54578380800 | elapsed time per iteration (s): 0.43 | learning rate: 2.418E-05 | global batch size: 256 | lm loss: 2.240327E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.177 | TFLOPs: 30.91 | 7: iteration 104110/ 115203 | consumed samples: 26652160 | consumed tokens: 54583623680 | elapsed time per iteration (s): 0.43 | learning rate: 2.417E-05 | global batch size: 256 | lm loss: 2.201732E+00 | grad norm: 0.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.033 | TFLOPs: 31.43 | 7: iteration 104120/ 115203 | consumed samples: 26654720 | consumed tokens: 54588866560 | elapsed time per iteration (s): 0.44 | learning rate: 2.416E-05 | global batch size: 256 | lm loss: 2.228845E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.097 | TFLOPs: 30.86 | 7: iteration 104130/ 115203 | consumed samples: 26657280 | consumed tokens: 54594109440 | elapsed time per iteration (s): 0.44 | learning rate: 2.415E-05 | global batch size: 256 | lm loss: 2.221068E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.582 | TFLOPs: 30.78 | 7: iteration 104140/ 115203 | consumed samples: 26659840 | consumed tokens: 54599352320 | elapsed time per iteration (s): 0.43 | learning rate: 2.415E-05 | global batch size: 256 | lm loss: 2.213977E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.656 | TFLOPs: 31.20 | 7: iteration 104150/ 115203 | consumed samples: 26662400 | consumed tokens: 54604595200 | elapsed time per iteration (s): 0.43 | learning rate: 2.414E-05 | global batch size: 256 | lm loss: 2.226053E+00 | grad norm: 0.290 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.095 | TFLOPs: 31.22 | 7: iteration 104160/ 115203 | consumed samples: 26664960 | consumed tokens: 54609838080 | elapsed time per iteration (s): 0.45 | learning rate: 2.413E-05 | global batch size: 256 | lm loss: 2.232429E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.242 | TFLOPs: 29.97 | 7: iteration 104170/ 115203 | consumed samples: 26667520 | consumed tokens: 54615080960 | elapsed time per iteration (s): 0.44 | learning rate: 2.412E-05 | global batch size: 256 | lm loss: 2.199059E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.534 | TFLOPs: 30.51 | 7: iteration 104180/ 115203 | consumed samples: 26670080 | consumed tokens: 54620323840 | elapsed time per iteration (s): 0.43 | learning rate: 2.412E-05 | global batch size: 256 | lm loss: 2.219554E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.153 | TFLOPs: 31.49 | 7: iteration 104190/ 115203 | consumed samples: 26672640 | consumed tokens: 54625566720 | elapsed time per iteration (s): 0.43 | learning rate: 2.411E-05 | global batch size: 256 | lm loss: 2.212762E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.574 | TFLOPs: 31.56 | 7: iteration 104200/ 115203 | consumed samples: 26675200 | consumed tokens: 54630809600 | elapsed time per iteration (s): 0.43 | learning rate: 2.410E-05 | global batch size: 256 | lm loss: 2.243675E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.364 | TFLOPs: 31.34 | 7: iteration 104210/ 115203 | consumed samples: 26677760 | consumed tokens: 54636052480 | elapsed time per iteration (s): 0.42 | 
learning rate: 2.410E-05 | global batch size: 256 | lm loss: 2.222503E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.004 | TFLOPs: 31.74 | 7: iteration 104220/ 115203 | consumed samples: 26680320 | consumed tokens: 54641295360 | elapsed time per iteration (s): 0.43 | learning rate: 2.409E-05 | global batch size: 256 | lm loss: 2.225173E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.653 | TFLOPs: 31.57 | 7: iteration 104230/ 115203 | consumed samples: 26682880 | consumed tokens: 54646538240 | elapsed time per iteration (s): 0.43 | learning rate: 2.408E-05 | global batch size: 256 | lm loss: 2.188813E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.106 | TFLOPs: 31.22 | 7: iteration 104240/ 115203 | consumed samples: 26685440 | consumed tokens: 54651781120 | elapsed time per iteration (s): 0.43 | learning rate: 2.407E-05 | global batch size: 256 | lm loss: 2.184502E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.504 | TFLOPs: 31.30 | 7: iteration 104250/ 115203 | consumed samples: 26688000 | consumed tokens: 54657024000 | elapsed time per iteration (s): 0.44 | learning rate: 2.407E-05 | global batch size: 256 | lm loss: 2.226591E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.322 | TFLOPs: 30.50 | 7: iteration 104260/ 115203 | consumed samples: 26690560 | consumed tokens: 54662266880 | elapsed time per iteration (s): 0.44 | learning rate: 2.406E-05 | global batch size: 256 | lm loss: 2.217105E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.515 | TFLOPs: 30.67 | 7: iteration 104270/ 
115203 | consumed samples: 26693120 | consumed tokens: 54667509760 | elapsed time per iteration (s): 0.43 | learning rate: 2.405E-05 | global batch size: 256 | lm loss: 2.201863E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.910 | TFLOPs: 31.06 | 7: iteration 104280/ 115203 | consumed samples: 26695680 | consumed tokens: 54672752640 | elapsed time per iteration (s): 0.42 | learning rate: 2.404E-05 | global batch size: 256 | lm loss: 2.214022E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.592 | TFLOPs: 31.77 | 7: iteration 104290/ 115203 | consumed samples: 26698240 | consumed tokens: 54677995520 | elapsed time per iteration (s): 0.42 | learning rate: 2.404E-05 | global batch size: 256 | lm loss: 2.190211E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.659 | TFLOPs: 31.67 | 7: iteration 104300/ 115203 | consumed samples: 26700800 | consumed tokens: 54683238400 | elapsed time per iteration (s): 0.43 | learning rate: 2.403E-05 | global batch size: 256 | lm loss: 2.223787E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.610 | TFLOPs: 30.94 | 7: iteration 104310/ 115203 | consumed samples: 26703360 | consumed tokens: 54688481280 | elapsed time per iteration (s): 0.45 | learning rate: 2.402E-05 | global batch size: 256 | lm loss: 2.237678E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.510 | TFLOPs: 30.14 | 7: iteration 104320/ 115203 | consumed samples: 26705920 | consumed tokens: 54693724160 | elapsed time per iteration (s): 0.43 | learning rate: 2.401E-05 | global batch size: 256 | lm loss: 2.230951E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 594.288 | TFLOPs: 31.18 | 7: iteration 104330/ 115203 | consumed samples: 26708480 | consumed tokens: 54698967040 | elapsed time per iteration (s): 0.43 | learning rate: 2.401E-05 | global batch size: 256 | lm loss: 2.227939E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.810 | TFLOPs: 31.10 | 7: iteration 104340/ 115203 | consumed samples: 26711040 | consumed tokens: 54704209920 | elapsed time per iteration (s): 0.43 | learning rate: 2.400E-05 | global batch size: 256 | lm loss: 2.208676E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.810 | TFLOPs: 31.16 | 7: iteration 104350/ 115203 | consumed samples: 26713600 | consumed tokens: 54709452800 | elapsed time per iteration (s): 0.43 | learning rate: 2.399E-05 | global batch size: 256 | lm loss: 2.214995E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.065 | TFLOPs: 30.91 | 7: iteration 104360/ 115203 | consumed samples: 26716160 | consumed tokens: 54714695680 | elapsed time per iteration (s): 0.43 | learning rate: 2.399E-05 | global batch size: 256 | lm loss: 2.216974E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.602 | TFLOPs: 31.36 | 7: iteration 104370/ 115203 | consumed samples: 26718720 | consumed tokens: 54719938560 | elapsed time per iteration (s): 0.43 | learning rate: 2.398E-05 | global batch size: 256 | lm loss: 2.258030E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.419 | TFLOPs: 31.40 | 7: iteration 104380/ 115203 | consumed samples: 26721280 | consumed tokens: 54725181440 | elapsed time per iteration (s): 0.43 | learning rate: 
2.397E-05 | global batch size: 256 | lm loss: 2.225059E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.947 | TFLOPs: 31.32 | 7: iteration 104390/ 115203 | consumed samples: 26723840 | consumed tokens: 54730424320 | elapsed time per iteration (s): 0.42 | learning rate: 2.396E-05 | global batch size: 256 | lm loss: 2.225282E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.713 | TFLOPs: 31.73 | 7: iteration 104400/ 115203 | consumed samples: 26726400 | consumed tokens: 54735667200 | elapsed time per iteration (s): 0.45 | learning rate: 2.396E-05 | global batch size: 256 | lm loss: 2.198441E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.234 | TFLOPs: 30.02 | 7: iteration 104410/ 115203 | consumed samples: 26728960 | consumed tokens: 54740910080 | elapsed time per iteration (s): 0.43 | learning rate: 2.395E-05 | global batch size: 256 | lm loss: 2.234128E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.435 | TFLOPs: 31.24 | 7: iteration 104420/ 115203 | consumed samples: 26731520 | consumed tokens: 54746152960 | elapsed time per iteration (s): 0.43 | learning rate: 2.394E-05 | global batch size: 256 | lm loss: 2.200647E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.882 | TFLOPs: 31.58 | 7: iteration 104430/ 115203 | consumed samples: 26734080 | consumed tokens: 54751395840 | elapsed time per iteration (s): 0.43 | learning rate: 2.393E-05 | global batch size: 256 | lm loss: 2.206897E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.389 | TFLOPs: 31.55 | 7: iteration 104440/ 115203 | 
consumed samples: 26736640 | consumed tokens: 54756638720 | elapsed time per iteration (s): 0.42 | learning rate: 2.393E-05 | global batch size: 256 | lm loss: 2.250920E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.056 | TFLOPs: 31.80 | 7: iteration 104450/ 115203 | consumed samples: 26739200 | consumed tokens: 54761881600 | elapsed time per iteration (s): 0.43 | learning rate: 2.392E-05 | global batch size: 256 | lm loss: 2.226317E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.442 | TFLOPs: 31.35 | 7: iteration 104460/ 115203 | consumed samples: 26741760 | consumed tokens: 54767124480 | elapsed time per iteration (s): 0.43 | learning rate: 2.391E-05 | global batch size: 256 | lm loss: 2.220307E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.039 | TFLOPs: 31.17 | 7: iteration 104470/ 115203 | consumed samples: 26744320 | consumed tokens: 54772367360 | elapsed time per iteration (s): 0.44 | learning rate: 2.391E-05 | global batch size: 256 | lm loss: 2.216745E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.338 | TFLOPs: 30.61 | 7: iteration 104480/ 115203 | consumed samples: 26746880 | consumed tokens: 54777610240 | elapsed time per iteration (s): 0.45 | learning rate: 2.390E-05 | global batch size: 256 | lm loss: 2.235297E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.288 | TFLOPs: 29.55 | 7: iteration 104490/ 115203 | consumed samples: 26749440 | consumed tokens: 54782853120 | elapsed time per iteration (s): 0.43 | learning rate: 2.389E-05 | global batch size: 256 | lm loss: 2.216298E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 602.076 | TFLOPs: 31.59 | 7: iteration 104500/ 115203 | consumed samples: 26752000 | consumed tokens: 54788096000 | elapsed time per iteration (s): 0.43 | learning rate: 2.388E-05 | global batch size: 256 | lm loss: 2.193969E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.894 | TFLOPs: 31.37 | 7: iteration 104510/ 115203 | consumed samples: 26754560 | consumed tokens: 54793338880 | elapsed time per iteration (s): 0.43 | learning rate: 2.388E-05 | global batch size: 256 | lm loss: 2.225064E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.256 | TFLOPs: 31.34 | 7: iteration 104520/ 115203 | consumed samples: 26757120 | consumed tokens: 54798581760 | elapsed time per iteration (s): 0.42 | learning rate: 2.387E-05 | global batch size: 256 | lm loss: 2.225104E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.697 | TFLOPs: 31.78 | 7: iteration 104530/ 115203 | consumed samples: 26759680 | consumed tokens: 54803824640 | elapsed time per iteration (s): 0.43 | learning rate: 2.386E-05 | global batch size: 256 | lm loss: 2.219388E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.429 | TFLOPs: 31.19 | 7: iteration 104540/ 115203 | consumed samples: 26762240 | consumed tokens: 54809067520 | elapsed time per iteration (s): 0.42 | learning rate: 2.385E-05 | global batch size: 256 | lm loss: 2.242946E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.666 | TFLOPs: 31.73 | 7: iteration 104550/ 115203 | consumed samples: 26764800 | consumed tokens: 54814310400 | elapsed time per iteration (s): 0.43 | learning rate: 2.385E-05 | global batch 
size: 256 | lm loss: 2.232748E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.809 | TFLOPs: 31.52 | 7: iteration 104560/ 115203 | consumed samples: 26767360 | consumed tokens: 54819553280 | elapsed time per iteration (s): 0.42 | learning rate: 2.384E-05 | global batch size: 256 | lm loss: 2.238044E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.477 | TFLOPs: 31.66 | 7: iteration 104570/ 115203 | consumed samples: 26769920 | consumed tokens: 54824796160 | elapsed time per iteration (s): 0.44 | learning rate: 2.383E-05 | global batch size: 256 | lm loss: 2.190557E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.581 | TFLOPs: 30.72 | 7: iteration 104580/ 115203 | consumed samples: 26772480 | consumed tokens: 54830039040 | elapsed time per iteration (s): 0.43 | learning rate: 2.383E-05 | global batch size: 256 | lm loss: 2.232951E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.952 | TFLOPs: 31.11 | 7: iteration 104590/ 115203 | consumed samples: 26775040 | consumed tokens: 54835281920 | elapsed time per iteration (s): 0.42 | learning rate: 2.382E-05 | global batch size: 256 | lm loss: 2.217271E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.731 | TFLOPs: 31.68 | 7: iteration 104600/ 115203 | consumed samples: 26777600 | consumed tokens: 54840524800 | elapsed time per iteration (s): 0.43 | learning rate: 2.381E-05 | global batch size: 256 | lm loss: 2.202066E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.074 | TFLOPs: 31.01 | 7: iteration 104610/ 115203 | consumed samples: 26780160 | 
consumed tokens: 54845767680 | elapsed time per iteration (s): 0.42 | learning rate: 2.380E-05 | global batch size: 256 | lm loss: 2.185371E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.189 | TFLOPs: 31.86 | 7: iteration 104620/ 115203 | consumed samples: 26782720 | consumed tokens: 54851010560 | elapsed time per iteration (s): 0.43 | learning rate: 2.380E-05 | global batch size: 256 | lm loss: 2.246863E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.537 | TFLOPs: 31.30 | 7: iteration 104630/ 115203 | consumed samples: 26785280 | consumed tokens: 54856253440 | elapsed time per iteration (s): 0.43 | learning rate: 2.379E-05 | global batch size: 256 | lm loss: 2.217303E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.663 | TFLOPs: 31.57 | 7: iteration 104640/ 115203 | consumed samples: 26787840 | consumed tokens: 54861496320 | elapsed time per iteration (s): 0.43 | learning rate: 2.378E-05 | global batch size: 256 | lm loss: 2.213697E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.281 | TFLOPs: 31.50 | 7: iteration 104650/ 115203 | consumed samples: 26790400 | consumed tokens: 54866739200 | elapsed time per iteration (s): 0.43 | learning rate: 2.378E-05 | global batch size: 256 | lm loss: 2.187829E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.130 | TFLOPs: 31.59 | 7: iteration 104660/ 115203 | consumed samples: 26792960 | consumed tokens: 54871982080 | elapsed time per iteration (s): 0.42 | learning rate: 2.377E-05 | global batch size: 256 | lm loss: 2.209826E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 609.625 | TFLOPs: 31.99 | 7: iteration 104670/ 115203 | consumed samples: 26795520 | consumed tokens: 54877224960 | elapsed time per iteration (s): 0.43 | learning rate: 2.376E-05 | global batch size: 256 | lm loss: 2.233058E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.467 | TFLOPs: 31.19 | 7: iteration 104680/ 115203 | consumed samples: 26798080 | consumed tokens: 54882467840 | elapsed time per iteration (s): 0.43 | learning rate: 2.376E-05 | global batch size: 256 | lm loss: 2.234215E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.277 | TFLOPs: 30.92 | 7: iteration 104690/ 115203 | consumed samples: 26800640 | consumed tokens: 54887710720 | elapsed time per iteration (s): 0.43 | learning rate: 2.375E-05 | global batch size: 256 | lm loss: 2.200451E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.316 | TFLOPs: 31.60 | 7: iteration 104700/ 115203 | consumed samples: 26803200 | consumed tokens: 54892953600 | elapsed time per iteration (s): 0.43 | learning rate: 2.374E-05 | global batch size: 256 | lm loss: 2.221198E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.831 | TFLOPs: 30.95 | 7: iteration 104710/ 115203 | consumed samples: 26805760 | consumed tokens: 54898196480 | elapsed time per iteration (s): 0.43 | learning rate: 2.373E-05 | global batch size: 256 | lm loss: 2.231048E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.321 | TFLOPs: 31.24 | 7: iteration 104720/ 115203 | consumed samples: 26808320 | consumed tokens: 54903439360 | elapsed time per iteration (s): 0.43 | learning rate: 2.373E-05 | global batch size: 256 | lm loss: 
2.242042E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.928 | TFLOPs: 31.53 | 7: iteration 104730/ 115203 | consumed samples: 26810880 | consumed tokens: 54908682240 | elapsed time per iteration (s): 0.43 | learning rate: 2.372E-05 | global batch size: 256 | lm loss: 2.254690E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.859 | TFLOPs: 31.42 | 7: iteration 104740/ 115203 | consumed samples: 26813440 | consumed tokens: 54913925120 | elapsed time per iteration (s): 0.43 | learning rate: 2.371E-05 | global batch size: 256 | lm loss: 2.226193E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.674 | TFLOPs: 31.52 | 7: iteration 104750/ 115203 | consumed samples: 26816000 | consumed tokens: 54919168000 | elapsed time per iteration (s): 0.45 | learning rate: 2.371E-05 | global batch size: 256 | lm loss: 2.180375E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.292 | TFLOPs: 30.03 | 7: iteration 104760/ 115203 | consumed samples: 26818560 | consumed tokens: 54924410880 | elapsed time per iteration (s): 0.43 | learning rate: 2.370E-05 | global batch size: 256 | lm loss: 2.213410E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.188 | TFLOPs: 31.44 | 7: iteration 104770/ 115203 | consumed samples: 26821120 | consumed tokens: 54929653760 | elapsed time per iteration (s): 0.43 | learning rate: 2.369E-05 | global batch size: 256 | lm loss: 2.221950E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.167 | TFLOPs: 30.97 | 7: iteration 104780/ 115203 | consumed samples: 26823680 | consumed tokens: 
54934896640 | elapsed time per iteration (s): 0.43 | learning rate: 2.368E-05 | global batch size: 256 | lm loss: 2.206996E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.773 | TFLOPs: 31.31 | 7: iteration 104790/ 115203 | consumed samples: 26826240 | consumed tokens: 54940139520 | elapsed time per iteration (s): 0.43 | learning rate: 2.368E-05 | global batch size: 256 | lm loss: 2.223383E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.013 | TFLOPs: 31.01 | 7: iteration 104800/ 115203 | consumed samples: 26828800 | consumed tokens: 54945382400 | elapsed time per iteration (s): 0.42 | learning rate: 2.367E-05 | global batch size: 256 | lm loss: 2.217502E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.291 | TFLOPs: 31.81 | 7: iteration 104810/ 115203 | consumed samples: 26831360 | consumed tokens: 54950625280 | elapsed time per iteration (s): 0.43 | learning rate: 2.366E-05 | global batch size: 256 | lm loss: 2.241627E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.550 | TFLOPs: 31.35 | 7: iteration 104820/ 115203 | consumed samples: 26833920 | consumed tokens: 54955868160 | elapsed time per iteration (s): 0.43 | learning rate: 2.366E-05 | global batch size: 256 | lm loss: 2.253679E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.911 | TFLOPs: 31.11 | 7: iteration 104830/ 115203 | consumed samples: 26836480 | consumed tokens: 54961111040 | elapsed time per iteration (s): 0.44 | learning rate: 2.365E-05 | global batch size: 256 | lm loss: 2.196283E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 581.124 | TFLOPs: 30.49 | 7: iteration 104840/ 115203 | consumed samples: 26839040 | consumed tokens: 54966353920 | elapsed time per iteration (s): 0.43 | learning rate: 2.364E-05 | global batch size: 256 | lm loss: 2.266872E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.853 | TFLOPs: 31.11 | 7: iteration 104850/ 115203 | consumed samples: 26841600 | consumed tokens: 54971596800 | elapsed time per iteration (s): 0.43 | learning rate: 2.364E-05 | global batch size: 256 | lm loss: 2.230601E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.497 | TFLOPs: 30.93 | 7: iteration 104860/ 115203 | consumed samples: 26844160 | consumed tokens: 54976839680 | elapsed time per iteration (s): 0.42 | learning rate: 2.363E-05 | global batch size: 256 | lm loss: 2.238533E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.322 | TFLOPs: 31.71 | 7: iteration 104870/ 115203 | consumed samples: 26846720 | consumed tokens: 54982082560 | elapsed time per iteration (s): 0.44 | learning rate: 2.362E-05 | global batch size: 256 | lm loss: 2.199406E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.689 | TFLOPs: 30.84 | 7: iteration 104880/ 115203 | consumed samples: 26849280 | consumed tokens: 54987325440 | elapsed time per iteration (s): 0.43 | learning rate: 2.361E-05 | global batch size: 256 | lm loss: 2.246432E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.599 | TFLOPs: 31.30 | 7: iteration 104890/ 115203 | consumed samples: 26851840 | consumed tokens: 54992568320 | elapsed time per iteration (s): 0.43 | learning rate: 2.361E-05 | global batch size: 256 | lm loss: 2.266330E+00 | grad 
norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.898 | TFLOPs: 31.48 | 7: iteration 104900/ 115203 | consumed samples: 26854400 | consumed tokens: 54997811200 | elapsed time per iteration (s): 0.43 | learning rate: 2.360E-05 | global batch size: 256 | lm loss: 2.205506E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.360 | TFLOPs: 31.50 | 7: iteration 104910/ 115203 | consumed samples: 26856960 | consumed tokens: 55003054080 | elapsed time per iteration (s): 0.43 | learning rate: 2.359E-05 | global batch size: 256 | lm loss: 2.210532E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.816 | TFLOPs: 31.42 | 7: iteration 104920/ 115203 | consumed samples: 26859520 | consumed tokens: 55008296960 | elapsed time per iteration (s): 0.43 | learning rate: 2.359E-05 | global batch size: 256 | lm loss: 2.216200E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.466 | TFLOPs: 30.93 | 7: iteration 104930/ 115203 | consumed samples: 26862080 | consumed tokens: 55013539840 | elapsed time per iteration (s): 0.42 | learning rate: 2.358E-05 | global batch size: 256 | lm loss: 2.253410E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.566 | TFLOPs: 31.62 | 7: iteration 104940/ 115203 | consumed samples: 26864640 | consumed tokens: 55018782720 | elapsed time per iteration (s): 0.42 | learning rate: 2.357E-05 | global batch size: 256 | lm loss: 2.192284E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.400 | TFLOPs: 31.61 | 7: iteration 104950/ 115203 | consumed samples: 26867200 | consumed tokens: 55024025600 | elapsed time 
per iteration (s): 0.43 | learning rate: 2.357E-05 | global batch size: 256 | lm loss: 2.210958E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.283 | TFLOPs: 31.29 | 7: iteration 104960/ 115203 | consumed samples: 26869760 | consumed tokens: 55029268480 | elapsed time per iteration (s): 0.43 | learning rate: 2.356E-05 | global batch size: 256 | lm loss: 2.230959E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.656 | TFLOPs: 31.31 | 7: iteration 104970/ 115203 | consumed samples: 26872320 | consumed tokens: 55034511360 | elapsed time per iteration (s): 0.43 | learning rate: 2.355E-05 | global batch size: 256 | lm loss: 2.201669E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.041 | TFLOPs: 31.48 | 7: iteration 104980/ 115203 | consumed samples: 26874880 | consumed tokens: 55039754240 | elapsed time per iteration (s): 0.42 | learning rate: 2.355E-05 | global batch size: 256 | lm loss: 2.234334E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.772 | TFLOPs: 31.73 | 7: iteration 104990/ 115203 | consumed samples: 26877440 | consumed tokens: 55044997120 | elapsed time per iteration (s): 0.43 | learning rate: 2.354E-05 | global batch size: 256 | lm loss: 2.235989E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.826 | TFLOPs: 31.16 | 7: iteration 105000/ 115203 | consumed samples: 26880000 | consumed tokens: 55050240000 | elapsed time per iteration (s): 0.43 | learning rate: 2.353E-05 | global batch size: 256 | lm loss: 2.196684E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.250 | TFLOPs: 
31.55 | 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 105000 | lm loss value: 2.097162E+00 | lm loss PPL: 8.143023E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 105000 to checkpoints_221m 0: [2022-11-29 01:37:25,197] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step105000 is begin to save! 0: [2022-11-29 01:37:25,201] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_01-model_00-model_states.pt... 0: [2022-11-29 01:37:25,311] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_01-model_00-model_states.pt. 0: [2022-11-29 01:37:25,311] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_03-model_00-model_states.pt... 0: [2022-11-29 01:37:25,332] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_03-model_00-model_states.pt. 0: [2022-11-29 01:37:25,333] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_04-model_00-model_states.pt... 0: [2022-11-29 01:37:25,360] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_04-model_00-model_states.pt. 0: [2022-11-29 01:37:25,360] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_05-model_00-model_states.pt... 0: [2022-11-29 01:37:25,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_05-model_00-model_states.pt. 0: [2022-11-29 01:37:25,382] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_06-model_00-model_states.pt... 
0: [2022-11-29 01:37:25,406] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_06-model_00-model_states.pt. 0: [2022-11-29 01:37:25,407] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_07-model_00-model_states.pt... 0: [2022-11-29 01:37:25,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_07-model_00-model_states.pt. 0: [2022-11-29 01:37:25,430] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_08-model_00-model_states.pt... 0: [2022-11-29 01:37:25,453] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_08-model_00-model_states.pt. 0: [2022-11-29 01:37:25,453] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_09-model_00-model_states.pt... 0: [2022-11-29 01:37:25,477] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_09-model_00-model_states.pt. 0: [2022-11-29 01:37:25,478] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_10-model_00-model_states.pt... 0: [2022-11-29 01:37:25,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_10-model_00-model_states.pt. 0: [2022-11-29 01:37:25,502] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_11-model_00-model_states.pt... 0: [2022-11-29 01:37:25,525] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_11-model_00-model_states.pt. 0: [2022-11-29 01:37:25,525] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_12-model_00-model_states.pt... 
0: [2022-11-29 01:37:25,548] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_12-model_00-model_states.pt. 0: [2022-11-29 01:37:25,549] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_13-model_00-model_states.pt... 0: [2022-11-29 01:37:25,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_13-model_00-model_states.pt. 0: [2022-11-29 01:37:25,570] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_14-model_00-model_states.pt... 0: [2022-11-29 01:37:25,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_14-model_00-model_states.pt. 0: [2022-11-29 01:37:25,593] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_15-model_00-model_states.pt... 0: [2022-11-29 01:37:25,616] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_15-model_00-model_states.pt. 0: [2022-11-29 01:37:25,617] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_16-model_00-model_states.pt... 0: [2022-11-29 01:37:25,640] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_16-model_00-model_states.pt. 0: [2022-11-29 01:37:25,641] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_17-model_00-model_states.pt... 0: [2022-11-29 01:37:25,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_17-model_00-model_states.pt. 0: [2022-11-29 01:37:25,663] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_18-model_00-model_states.pt... 
0: [2022-11-29 01:37:25,687] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_18-model_00-model_states.pt. 0: [2022-11-29 01:37:25,687] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_19-model_00-model_states.pt... 0: [2022-11-29 01:37:25,710] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_19-model_00-model_states.pt. 0: [2022-11-29 01:37:25,710] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_20-model_00-model_states.pt... 0: [2022-11-29 01:37:25,734] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_20-model_00-model_states.pt. 0: [2022-11-29 01:37:25,734] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/layer_22-model_00-model_states.pt... 0: [2022-11-29 01:37:25,738] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/layer_22-model_00-model_states.pt. 0: [2022-11-29 01:37:25,739] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step105000/mp_rank_00_model_states.pt 0: [2022-11-29 01:37:25,739] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/mp_rank_00_model_states.pt... 0: [2022-11-29 01:37:25,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/mp_rank_00_model_states.pt. 0: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
1: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
5: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 
6: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:37:25,760] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step105000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:37:25,808] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:37:25,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:37:25,812] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 01:37:25,812] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 
7: [2022-11-29 01:37:25,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:37:25,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:37:25,812] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 01:37:25,812] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 01:37:25,812] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 7: [2022-11-29 01:37:25,812] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 7: [2022-11-29 01:37:25,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:37:25,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 01:37:25,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 7: [2022-11-29 01:37:25,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:37:25,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 01:37:25,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 
7: [2022-11-29 01:37:25,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:37:25,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 01:37:25,814] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 6: [2022-11-29 01:37:25,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:37:25,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:37:25,815] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 01:37:25,815] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 01:37:25,815] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 6: [2022-11-29 01:37:25,815] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 0: [2022-11-29 01:37:25,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:37:25,815] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 01:37:25,815] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 
0: [2022-11-29 01:37:25,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:37:25,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 01:37:25,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 6: [2022-11-29 01:37:25,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:37:25,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 01:37:25,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 7: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:37:25,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 2: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:37:25,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 01:37:25,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 2: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 2: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:37:25,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 6: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:37:25,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 2: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 6: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 
6: [2022-11-29 01:37:25,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 4: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:37:25,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 2: [2022-11-29 01:37:25,818] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 2: [2022-11-29 01:37:25,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:37:25,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 01:37:25,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 4: [2022-11-29 01:37:25,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
6: [2022-11-29 01:37:25,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:37:25,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 6: [2022-11-29 01:37:25,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 4: [2022-11-29 01:37:25,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 6: [2022-11-29 01:37:25,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 4: [2022-11-29 01:37:25,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:37:25,819] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:37:25,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 01:37:25,819] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 01:37:25,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 4: [2022-11-29 01:37:25,819] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 5: [2022-11-29 01:37:25,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
3: [2022-11-29 01:37:25,811] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:37:25,812] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-29 01:37:25,811] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-29 01:37:25,812] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 3: [2022-11-29 01:37:25,812] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 5: [2022-11-29 01:37:25,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:37:25,813] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:37:25,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-29 01:37:25,813] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-29 01:37:25,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 3: [2022-11-29 01:37:25,813] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 5: [2022-11-29 01:37:25,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
3: [2022-11-29 01:37:25,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:37:25,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-29 01:37:25,814] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-29 01:37:25,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 3: [2022-11-29 01:37:25,814] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 5: [2022-11-29 01:37:25,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:37:25,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:37:25,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-29 01:37:25,815] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:37:25,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 3: [2022-11-29 01:37:25,815] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-29 01:37:25,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
3: [2022-11-29 01:37:25,815] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-29 01:37:25,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:37:25,815] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 5: [2022-11-29 01:37:25,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:37:25,815] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 5: [2022-11-29 01:37:25,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 01:37:25,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-29 01:37:25,816] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:37:25,817] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-29 01:37:25,816] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-29 01:37:25,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 5: [2022-11-29 01:37:25,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 
3: [2022-11-29 01:37:25,816] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 5: [2022-11-29 01:37:25,817] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 6: [2022-11-29 01:37:25,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:37:25,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 01:37:25,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 1: [2022-11-29 01:37:25,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:37:25,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:37:25,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 01:37:25,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 01:37:25,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 1: [2022-11-29 01:37:25,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:37:25,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 
1: [2022-11-29 01:37:25,825] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 01:37:25,825] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 1: [2022-11-29 01:37:25,825] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:37:25,826] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 01:37:25,826] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 6: [2022-11-29 01:37:25,827] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:37:25,827] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 01:37:25,827] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 0: [2022-11-29 01:37:25,828] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:37:25,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 01:37:25,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 0: [2022-11-29 01:37:25,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-29 01:37:25,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 01:37:25,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 0: [2022-11-29 01:37:25,829] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:37:25,829] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 01:37:25,829] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 4: [2022-11-29 01:37:25,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:37:25,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:37:25,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:37:25,831] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:37:25,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 01:37:25,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 01:37:25,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 01:37:25,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 01:37:25,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 4: [2022-11-29 01:37:25,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 4: [2022-11-29 01:37:25,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 4: [2022-11-29 01:37:25,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 1: [2022-11-29 01:37:25,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:37:25,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:37:25,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:37:25,832] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:37:25,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 01:37:25,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 01:37:25,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 01:37:25,832] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 01:37:25,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 1: [2022-11-29 01:37:25,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 1: [2022-11-29 01:37:25,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 1: [2022-11-29 01:37:25,832] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 7: [2022-11-29 01:37:25,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:37:25,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 01:37:25,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 7: [2022-11-29 01:37:25,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-29 01:37:25,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 01:37:25,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 5: [2022-11-29 01:37:25,824] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:37:25,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:37:25,824] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-29 01:37:25,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-29 01:37:25,824] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 3: [2022-11-29 01:37:25,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 3: [2022-11-29 01:37:25,823] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:37:25,823] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 01:37:25,823] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 0: [2022-11-29 01:37:25,839] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-29 01:37:25,839] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 01:37:25,839] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 2: [2022-11-29 01:37:25,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:37:25,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 01:37:25,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 2: [2022-11-29 01:37:25,843] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:37:25,843] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 01:37:25,843] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 2: [2022-11-29 01:37:25,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:37:25,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 01:37:25,844] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 
0: [2022-11-29 01:37:25,870] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step105000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 01:37:25,870] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step105000 is ready now! 0: successfully saved checkpoint at iteration 105000 to checkpoints_221m 7: time (ms) | save-checkpoint: 686.09 7: iteration 105010/ 115203 | consumed samples: 26882560 | consumed tokens: 55055482880 | elapsed time per iteration (s): 0.51 | learning rate: 2.352E-05 | global batch size: 256 | lm loss: 2.240699E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 499.485 | TFLOPs: 26.21 | 7: iteration 105020/ 115203 | consumed samples: 26885120 | consumed tokens: 55060725760 | elapsed time per iteration (s): 0.43 | learning rate: 2.352E-05 | global batch size: 256 | lm loss: 2.218455E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.724 | TFLOPs: 30.94 | 7: iteration 105030/ 115203 | consumed samples: 26887680 | consumed tokens: 55065968640 | elapsed time per iteration (s): 0.43 | learning rate: 2.351E-05 | global batch size: 256 | lm loss: 2.220154E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.627 | TFLOPs: 31.46 | 7: iteration 105040/ 115203 | consumed samples: 26890240 | consumed tokens: 55071211520 | elapsed time per iteration (s): 0.43 | learning rate: 2.350E-05 | global batch size: 256 | lm loss: 2.205336E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.049 | TFLOPs: 31.54 | 7: iteration 105050/ 115203 | consumed samples: 26892800 | consumed tokens: 55076454400 | elapsed time per iteration (s): 0.43 | learning rate: 2.350E-05 | global batch 
size: 256 | lm loss: 2.212715E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.369 | TFLOPs: 31.55 | 7: iteration 105060/ 115203 | consumed samples: 26895360 | consumed tokens: 55081697280 | elapsed time per iteration (s): 0.44 | learning rate: 2.349E-05 | global batch size: 256 | lm loss: 2.238951E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.246 | TFLOPs: 30.39 | 7: iteration 105070/ 115203 | consumed samples: 26897920 | consumed tokens: 55086940160 | elapsed time per iteration (s): 0.43 | learning rate: 2.348E-05 | global batch size: 256 | lm loss: 2.226155E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.848 | TFLOPs: 31.05 | 7: iteration 105080/ 115203 | consumed samples: 26900480 | consumed tokens: 55092183040 | elapsed time per iteration (s): 0.44 | learning rate: 2.348E-05 | global batch size: 256 | lm loss: 2.222475E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.736 | TFLOPs: 30.63 | 7: iteration 105090/ 115203 | consumed samples: 26903040 | consumed tokens: 55097425920 | elapsed time per iteration (s): 0.43 | learning rate: 2.347E-05 | global batch size: 256 | lm loss: 2.224210E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.912 | TFLOPs: 31.53 | 7: iteration 105100/ 115203 | consumed samples: 26905600 | consumed tokens: 55102668800 | elapsed time per iteration (s): 0.45 | learning rate: 2.346E-05 | global batch size: 256 | lm loss: 2.193174E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.122 | TFLOPs: 30.18 | 7: iteration 105110/ 115203 | consumed samples: 26908160 | 
consumed tokens: 55107911680 | elapsed time per iteration (s): 0.43 | learning rate: 2.346E-05 | global batch size: 256 | lm loss: 2.204348E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.410 | TFLOPs: 30.98 | 7: iteration 105120/ 115203 | consumed samples: 26910720 | consumed tokens: 55113154560 | elapsed time per iteration (s): 0.43 | learning rate: 2.345E-05 | global batch size: 256 | lm loss: 2.224425E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.071 | TFLOPs: 31.06 | 7: iteration 105130/ 115203 | consumed samples: 26913280 | consumed tokens: 55118397440 | elapsed time per iteration (s): 0.43 | learning rate: 2.344E-05 | global batch size: 256 | lm loss: 2.238045E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.104 | TFLOPs: 31.12 | 7: iteration 105140/ 115203 | consumed samples: 26915840 | consumed tokens: 55123640320 | elapsed time per iteration (s): 0.43 | learning rate: 2.344E-05 | global batch size: 256 | lm loss: 2.215239E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.853 | TFLOPs: 31.00 | 7: iteration 105150/ 115203 | consumed samples: 26918400 | consumed tokens: 55128883200 | elapsed time per iteration (s): 0.42 | learning rate: 2.343E-05 | global batch size: 256 | lm loss: 2.174484E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.638 | TFLOPs: 31.72 | 7: iteration 105160/ 115203 | consumed samples: 26920960 | consumed tokens: 55134126080 | elapsed time per iteration (s): 0.43 | learning rate: 2.342E-05 | global batch size: 256 | lm loss: 2.224472E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 597.970 | TFLOPs: 31.37 | 7: iteration 105170/ 115203 | consumed samples: 26923520 | consumed tokens: 55139368960 | elapsed time per iteration (s): 0.42 | learning rate: 2.342E-05 | global batch size: 256 | lm loss: 2.229898E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.845 | TFLOPs: 31.89 | 7: iteration 105180/ 115203 | consumed samples: 26926080 | consumed tokens: 55144611840 | elapsed time per iteration (s): 0.43 | learning rate: 2.341E-05 | global batch size: 256 | lm loss: 2.238412E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.081 | TFLOPs: 31.59 | 7: iteration 105190/ 115203 | consumed samples: 26928640 | consumed tokens: 55149854720 | elapsed time per iteration (s): 0.44 | learning rate: 2.340E-05 | global batch size: 256 | lm loss: 2.222860E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.851 | TFLOPs: 30.79 | 7: iteration 105200/ 115203 | consumed samples: 26931200 | consumed tokens: 55155097600 | elapsed time per iteration (s): 0.43 | learning rate: 2.340E-05 | global batch size: 256 | lm loss: 2.220793E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.648 | TFLOPs: 30.99 | 7: iteration 105210/ 115203 | consumed samples: 26933760 | consumed tokens: 55160340480 | elapsed time per iteration (s): 0.43 | learning rate: 2.339E-05 | global batch size: 256 | lm loss: 2.214498E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.665 | TFLOPs: 31.31 | 7: iteration 105220/ 115203 | consumed samples: 26936320 | consumed tokens: 55165583360 | elapsed time per iteration (s): 0.43 | learning rate: 2.338E-05 | global batch size: 256 | lm loss: 
2.247297E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.235 | TFLOPs: 31.28 | 7: iteration 105230/ 115203 | consumed samples: 26938880 | consumed tokens: 55170826240 | elapsed time per iteration (s): 0.43 | learning rate: 2.338E-05 | global batch size: 256 | lm loss: 2.240724E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.224 | TFLOPs: 31.28 | 7: iteration 105240/ 115203 | consumed samples: 26941440 | consumed tokens: 55176069120 | elapsed time per iteration (s): 0.42 | learning rate: 2.337E-05 | global batch size: 256 | lm loss: 2.230240E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.365 | TFLOPs: 31.87 | 7: iteration 105250/ 115203 | consumed samples: 26944000 | consumed tokens: 55181312000 | elapsed time per iteration (s): 0.42 | learning rate: 2.336E-05 | global batch size: 256 | lm loss: 2.234083E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.616 | TFLOPs: 31.83 | 7: iteration 105260/ 115203 | consumed samples: 26946560 | consumed tokens: 55186554880 | elapsed time per iteration (s): 0.43 | learning rate: 2.336E-05 | global batch size: 256 | lm loss: 2.207920E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.533 | TFLOPs: 30.98 | 7: iteration 105270/ 115203 | consumed samples: 26949120 | consumed tokens: 55191797760 | elapsed time per iteration (s): 0.44 | learning rate: 2.335E-05 | global batch size: 256 | lm loss: 2.214528E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.571 | TFLOPs: 30.83 | 7: iteration 105280/ 115203 | consumed samples: 26951680 | consumed tokens: 
55197040640 | elapsed time per iteration (s): 0.43 | learning rate: 2.334E-05 | global batch size: 256 | lm loss: 2.192072E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.629 | TFLOPs: 31.04 | 7: iteration 105290/ 115203 | consumed samples: 26954240 | consumed tokens: 55202283520 | elapsed time per iteration (s): 0.43 | learning rate: 2.333E-05 | global batch size: 256 | lm loss: 2.225975E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.695 | TFLOPs: 31.05 | 7: iteration 105300/ 115203 | consumed samples: 26956800 | consumed tokens: 55207526400 | elapsed time per iteration (s): 0.42 | learning rate: 2.333E-05 | global batch size: 256 | lm loss: 2.235802E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.351 | TFLOPs: 31.66 | 7: iteration 105310/ 115203 | consumed samples: 26959360 | consumed tokens: 55212769280 | elapsed time per iteration (s): 0.43 | learning rate: 2.332E-05 | global batch size: 256 | lm loss: 2.209062E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.619 | TFLOPs: 31.46 | 7: iteration 105320/ 115203 | consumed samples: 26961920 | consumed tokens: 55218012160 | elapsed time per iteration (s): 0.43 | learning rate: 2.331E-05 | global batch size: 256 | lm loss: 2.185348E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.198 | TFLOPs: 31.28 | 7: iteration 105330/ 115203 | consumed samples: 26964480 | consumed tokens: 55223255040 | elapsed time per iteration (s): 0.44 | learning rate: 2.331E-05 | global batch size: 256 | lm loss: 2.186162E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 585.068 | TFLOPs: 30.70 | 7: iteration 105340/ 115203 | consumed samples: 26967040 | consumed tokens: 55228497920 | elapsed time per iteration (s): 0.44 | learning rate: 2.330E-05 | global batch size: 256 | lm loss: 2.233008E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.807 | TFLOPs: 30.32 | 7: iteration 105350/ 115203 | consumed samples: 26969600 | consumed tokens: 55233740800 | elapsed time per iteration (s): 0.42 | learning rate: 2.329E-05 | global batch size: 256 | lm loss: 2.217020E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.436 | TFLOPs: 31.87 | 7: iteration 105360/ 115203 | consumed samples: 26972160 | consumed tokens: 55238983680 | elapsed time per iteration (s): 0.42 | learning rate: 2.329E-05 | global batch size: 256 | lm loss: 2.221080E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.779 | TFLOPs: 31.89 | 7: iteration 105370/ 115203 | consumed samples: 26974720 | consumed tokens: 55244226560 | elapsed time per iteration (s): 0.43 | learning rate: 2.328E-05 | global batch size: 256 | lm loss: 2.213537E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.125 | TFLOPs: 31.49 | 7: iteration 105380/ 115203 | consumed samples: 26977280 | consumed tokens: 55249469440 | elapsed time per iteration (s): 0.43 | learning rate: 2.328E-05 | global batch size: 256 | lm loss: 2.222955E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.890 | TFLOPs: 31.42 | 7: iteration 105390/ 115203 | consumed samples: 26979840 | consumed tokens: 55254712320 | elapsed time per iteration (s): 0.43 | learning rate: 2.327E-05 | global batch size: 256 | lm loss: 2.229091E+00 | grad 
norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.580 | TFLOPs: 31.04 | 7: iteration 105400/ 115203 | consumed samples: 26982400 | consumed tokens: 55259955200 | elapsed time per iteration (s): 0.43 | learning rate: 2.326E-05 | global batch size: 256 | lm loss: 2.229725E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.542 | TFLOPs: 31.56 | 7: iteration 105410/ 115203 | consumed samples: 26984960 | consumed tokens: 55265198080 | elapsed time per iteration (s): 0.42 | learning rate: 2.326E-05 | global batch size: 256 | lm loss: 2.226829E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.848 | TFLOPs: 31.68 | 7: iteration 105420/ 115203 | consumed samples: 26987520 | consumed tokens: 55270440960 | elapsed time per iteration (s): 0.43 | learning rate: 2.325E-05 | global batch size: 256 | lm loss: 2.232653E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.072 | TFLOPs: 31.22 | 7: iteration 105430/ 115203 | consumed samples: 26990080 | consumed tokens: 55275683840 | elapsed time per iteration (s): 0.44 | learning rate: 2.324E-05 | global batch size: 256 | lm loss: 2.237411E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.826 | TFLOPs: 30.21 | 7: iteration 105440/ 115203 | consumed samples: 26992640 | consumed tokens: 55280926720 | elapsed time per iteration (s): 0.43 | learning rate: 2.324E-05 | global batch size: 256 | lm loss: 2.215387E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.726 | TFLOPs: 31.31 | 7: iteration 105450/ 115203 | consumed samples: 26995200 | consumed tokens: 55286169600 | elapsed time 
per iteration (s): 0.45 | learning rate: 2.323E-05 | global batch size: 256 | lm loss: 2.209561E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.006 | TFLOPs: 30.06 | 7: iteration 105460/ 115203 | consumed samples: 26997760 | consumed tokens: 55291412480 | elapsed time per iteration (s): 0.43 | learning rate: 2.322E-05 | global batch size: 256 | lm loss: 2.252328E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.277 | TFLOPs: 31.34 | 7: iteration 105470/ 115203 | consumed samples: 27000320 | consumed tokens: 55296655360 | elapsed time per iteration (s): 0.43 | learning rate: 2.322E-05 | global batch size: 256 | lm loss: 2.220984E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.824 | TFLOPs: 31.21 | 7: iteration 105480/ 115203 | consumed samples: 27002880 | consumed tokens: 55301898240 | elapsed time per iteration (s): 0.43 | learning rate: 2.321E-05 | global batch size: 256 | lm loss: 2.217878E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.208 | TFLOPs: 31.39 | 7: iteration 105490/ 115203 | consumed samples: 27005440 | consumed tokens: 55307141120 | elapsed time per iteration (s): 0.44 | learning rate: 2.320E-05 | global batch size: 256 | lm loss: 2.224025E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.183 | TFLOPs: 30.44 | 7: iteration 105500/ 115203 | consumed samples: 27008000 | consumed tokens: 55312384000 | elapsed time per iteration (s): 0.43 | learning rate: 2.320E-05 | global batch size: 256 | lm loss: 2.220380E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.574 | TFLOPs: 
31.09 | 7: iteration 105510/ 115203 | consumed samples: 27010560 | consumed tokens: 55317626880 | elapsed time per iteration (s): 0.43 | learning rate: 2.319E-05 | global batch size: 256 | lm loss: 2.197232E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.141 | TFLOPs: 31.33 | 7: iteration 105520/ 115203 | consumed samples: 27013120 | consumed tokens: 55322869760 | elapsed time per iteration (s): 0.44 | learning rate: 2.318E-05 | global batch size: 256 | lm loss: 2.240563E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.926 | TFLOPs: 30.80 | 7: iteration 105530/ 115203 | consumed samples: 27015680 | consumed tokens: 55328112640 | elapsed time per iteration (s): 0.42 | learning rate: 2.318E-05 | global batch size: 256 | lm loss: 2.194633E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.295 | TFLOPs: 31.81 | 7: iteration 105540/ 115203 | consumed samples: 27018240 | consumed tokens: 55333355520 | elapsed time per iteration (s): 0.43 | learning rate: 2.317E-05 | global batch size: 256 | lm loss: 2.240164E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.262 | TFLOPs: 31.34 | 7: iteration 105550/ 115203 | consumed samples: 27020800 | consumed tokens: 55338598400 | elapsed time per iteration (s): 0.42 | learning rate: 2.316E-05 | global batch size: 256 | lm loss: 2.199858E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.824 | TFLOPs: 31.94 | 7: iteration 105560/ 115203 | consumed samples: 27023360 | consumed tokens: 55343841280 | elapsed time per iteration (s): 0.42 | learning rate: 2.316E-05 | global batch size: 256 | lm loss: 2.220518E+00 | grad norm: 0.249 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.009 | TFLOPs: 31.85 | 7: iteration 105570/ 115203 | consumed samples: 27025920 | consumed tokens: 55349084160 | elapsed time per iteration (s): 0.43 | learning rate: 2.315E-05 | global batch size: 256 | lm loss: 2.228228E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.795 | TFLOPs: 31.16 | 7: iteration 105580/ 115203 | consumed samples: 27028480 | consumed tokens: 55354327040 | elapsed time per iteration (s): 0.43 | learning rate: 2.314E-05 | global batch size: 256 | lm loss: 2.249657E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.974 | TFLOPs: 31.58 | 7: iteration 105590/ 115203 | consumed samples: 27031040 | consumed tokens: 55359569920 | elapsed time per iteration (s): 0.44 | learning rate: 2.314E-05 | global batch size: 256 | lm loss: 2.235951E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.371 | TFLOPs: 30.61 | 7: iteration 105600/ 115203 | consumed samples: 27033600 | consumed tokens: 55364812800 | elapsed time per iteration (s): 0.43 | learning rate: 2.313E-05 | global batch size: 256 | lm loss: 2.202610E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.963 | TFLOPs: 31.48 | 7: iteration 105610/ 115203 | consumed samples: 27036160 | consumed tokens: 55370055680 | elapsed time per iteration (s): 0.42 | learning rate: 2.312E-05 | global batch size: 256 | lm loss: 2.240320E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.938 | TFLOPs: 31.74 | 7: iteration 105620/ 115203 | consumed samples: 27038720 | consumed tokens: 55375298560 | elapsed time per iteration (s): 0.43 | 
learning rate: 2.312E-05 | global batch size: 256 | lm loss: 2.222254E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.968 | TFLOPs: 31.58 | 7: iteration 105630/ 115203 | consumed samples: 27041280 | consumed tokens: 55380541440 | elapsed time per iteration (s): 0.44 | learning rate: 2.311E-05 | global batch size: 256 | lm loss: 2.217485E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.008 | TFLOPs: 30.59 | 7: iteration 105640/ 115203 | consumed samples: 27043840 | consumed tokens: 55385784320 | elapsed time per iteration (s): 0.42 | learning rate: 2.310E-05 | global batch size: 256 | lm loss: 2.220209E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.873 | TFLOPs: 31.68 | 7: iteration 105650/ 115203 | consumed samples: 27046400 | consumed tokens: 55391027200 | elapsed time per iteration (s): 0.43 | learning rate: 2.310E-05 | global batch size: 256 | lm loss: 2.218255E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.464 | TFLOPs: 31.56 | 7: iteration 105660/ 115203 | consumed samples: 27048960 | consumed tokens: 55396270080 | elapsed time per iteration (s): 0.43 | learning rate: 2.309E-05 | global batch size: 256 | lm loss: 2.215739E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.182 | TFLOPs: 31.60 | 7: iteration 105670/ 115203 | consumed samples: 27051520 | consumed tokens: 55401512960 | elapsed time per iteration (s): 0.43 | learning rate: 2.309E-05 | global batch size: 256 | lm loss: 2.248296E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.455 | TFLOPs: 31.56 | 7: iteration 105680/ 
115203 | consumed samples: 27054080 | consumed tokens: 55406755840 | elapsed time per iteration (s): 0.43 | learning rate: 2.308E-05 | global batch size: 256 | lm loss: 2.224417E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.750 | TFLOPs: 31.47 | 7: iteration 105690/ 115203 | consumed samples: 27056640 | consumed tokens: 55411998720 | elapsed time per iteration (s): 0.43 | learning rate: 2.307E-05 | global batch size: 256 | lm loss: 2.217387E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.675 | TFLOPs: 30.99 | 7: iteration 105700/ 115203 | consumed samples: 27059200 | consumed tokens: 55417241600 | elapsed time per iteration (s): 0.43 | learning rate: 2.307E-05 | global batch size: 256 | lm loss: 2.213289E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.591 | TFLOPs: 31.51 | 7: iteration 105710/ 115203 | consumed samples: 27061760 | consumed tokens: 55422484480 | elapsed time per iteration (s): 0.43 | learning rate: 2.306E-05 | global batch size: 256 | lm loss: 2.229376E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.365 | TFLOPs: 31.40 | 7: iteration 105720/ 115203 | consumed samples: 27064320 | consumed tokens: 55427727360 | elapsed time per iteration (s): 0.44 | learning rate: 2.305E-05 | global batch size: 256 | lm loss: 2.218254E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.940 | TFLOPs: 30.80 | 7: iteration 105730/ 115203 | consumed samples: 27066880 | consumed tokens: 55432970240 | elapsed time per iteration (s): 0.42 | learning rate: 2.305E-05 | global batch size: 256 | lm loss: 2.203534E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 602.685 | TFLOPs: 31.62 | 7: iteration 105740/ 115203 | consumed samples: 27069440 | consumed tokens: 55438213120 | elapsed time per iteration (s): 0.43 | learning rate: 2.304E-05 | global batch size: 256 | lm loss: 2.196863E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.203 | TFLOPs: 31.54 | 7: iteration 105750/ 115203 | consumed samples: 27072000 | consumed tokens: 55443456000 | elapsed time per iteration (s): 0.43 | learning rate: 2.303E-05 | global batch size: 256 | lm loss: 2.236565E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.889 | TFLOPs: 31.48 | 7: iteration 105760/ 115203 | consumed samples: 27074560 | consumed tokens: 55448698880 | elapsed time per iteration (s): 0.44 | learning rate: 2.303E-05 | global batch size: 256 | lm loss: 2.200336E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.851 | TFLOPs: 30.58 | 7: iteration 105770/ 115203 | consumed samples: 27077120 | consumed tokens: 55453941760 | elapsed time per iteration (s): 0.44 | learning rate: 2.302E-05 | global batch size: 256 | lm loss: 2.217917E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.335 | TFLOPs: 30.45 | 7: iteration 105780/ 115203 | consumed samples: 27079680 | consumed tokens: 55459184640 | elapsed time per iteration (s): 0.42 | learning rate: 2.302E-05 | global batch size: 256 | lm loss: 2.214083E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.258 | TFLOPs: 31.97 | 7: iteration 105790/ 115203 | consumed samples: 27082240 | consumed tokens: 55464427520 | elapsed time per iteration (s): 0.44 | learning rate: 
2.301E-05 | global batch size: 256 | lm loss: 2.220478E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.107 | TFLOPs: 30.70 | 7: iteration 105800/ 115203 | consumed samples: 27084800 | consumed tokens: 55469670400 | elapsed time per iteration (s): 0.46 | learning rate: 2.300E-05 | global batch size: 256 | lm loss: 2.202076E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.133 | TFLOPs: 29.28 | 7: iteration 105810/ 115203 | consumed samples: 27087360 | consumed tokens: 55474913280 | elapsed time per iteration (s): 0.43 | learning rate: 2.300E-05 | global batch size: 256 | lm loss: 2.210345E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.162 | TFLOPs: 31.28 | 7: iteration 105820/ 115203 | consumed samples: 27089920 | consumed tokens: 55480156160 | elapsed time per iteration (s): 0.43 | learning rate: 2.299E-05 | global batch size: 256 | lm loss: 2.224737E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.622 | TFLOPs: 31.04 | 7: iteration 105830/ 115203 | consumed samples: 27092480 | consumed tokens: 55485399040 | elapsed time per iteration (s): 0.44 | learning rate: 2.298E-05 | global batch size: 256 | lm loss: 2.203614E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.728 | TFLOPs: 30.63 | 7: iteration 105840/ 115203 | consumed samples: 27095040 | consumed tokens: 55490641920 | elapsed time per iteration (s): 0.43 | learning rate: 2.298E-05 | global batch size: 256 | lm loss: 2.231416E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.668 | TFLOPs: 31.04 | 7: iteration 105850/ 115203 | 
consumed samples: 27097600 | consumed tokens: 55495884800 | elapsed time per iteration (s): 0.44 | learning rate: 2.297E-05 | global batch size: 256 | lm loss: 2.209571E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.601 | TFLOPs: 30.67 | 7: iteration 105860/ 115203 | consumed samples: 27100160 | consumed tokens: 55501127680 | elapsed time per iteration (s): 0.45 | learning rate: 2.296E-05 | global batch size: 256 | lm loss: 2.200031E+00 | grad norm: 0.295 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.330 | TFLOPs: 30.03 | 7: iteration 105870/ 115203 | consumed samples: 27102720 | consumed tokens: 55506370560 | elapsed time per iteration (s): 0.43 | learning rate: 2.296E-05 | global batch size: 256 | lm loss: 2.214774E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.938 | TFLOPs: 31.53 | 7: iteration 105880/ 115203 | consumed samples: 27105280 | consumed tokens: 55511613440 | elapsed time per iteration (s): 0.43 | learning rate: 2.295E-05 | global batch size: 256 | lm loss: 2.224287E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.529 | TFLOPs: 30.93 | 7: iteration 105890/ 115203 | consumed samples: 27107840 | consumed tokens: 55516856320 | elapsed time per iteration (s): 0.42 | learning rate: 2.295E-05 | global batch size: 256 | lm loss: 2.213113E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.809 | TFLOPs: 31.73 | 7: iteration 105900/ 115203 | consumed samples: 27110400 | consumed tokens: 55522099200 | elapsed time per iteration (s): 0.43 | learning rate: 2.294E-05 | global batch size: 256 | lm loss: 2.220421E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 600.976 | TFLOPs: 31.53 | 7: iteration 105910/ 115203 | consumed samples: 27112960 | consumed tokens: 55527342080 | elapsed time per iteration (s): 0.43 | learning rate: 2.293E-05 | global batch size: 256 | lm loss: 2.214437E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.001 | TFLOPs: 31.43 | 7: iteration 105920/ 115203 | consumed samples: 27115520 | consumed tokens: 55532584960 | elapsed time per iteration (s): 0.43 | learning rate: 2.293E-05 | global batch size: 256 | lm loss: 2.208000E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.542 | TFLOPs: 31.04 | 7: iteration 105930/ 115203 | consumed samples: 27118080 | consumed tokens: 55537827840 | elapsed time per iteration (s): 0.45 | learning rate: 2.292E-05 | global batch size: 256 | lm loss: 2.203141E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.463 | TFLOPs: 29.98 | 7: iteration 105940/ 115203 | consumed samples: 27120640 | consumed tokens: 55543070720 | elapsed time per iteration (s): 0.43 | learning rate: 2.291E-05 | global batch size: 256 | lm loss: 2.220819E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.917 | TFLOPs: 31.37 | 7: iteration 105950/ 115203 | consumed samples: 27123200 | consumed tokens: 55548313600 | elapsed time per iteration (s): 0.65 | learning rate: 2.291E-05 | global batch size: 256 | lm loss: 2.239139E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 395.185 | TFLOPs: 20.73 | 7: iteration 105960/ 115203 | consumed samples: 27125760 | consumed tokens: 55553556480 | elapsed time per iteration (s): 0.42 | learning rate: 2.290E-05 | global batch 
size: 256 | lm loss: 2.189040E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.261 | TFLOPs: 31.65 | 7: iteration 105970/ 115203 | consumed samples: 27128320 | consumed tokens: 55558799360 | elapsed time per iteration (s): 0.43 | learning rate: 2.290E-05 | global batch size: 256 | lm loss: 2.236285E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.536 | TFLOPs: 30.93 | 7: iteration 105980/ 115203 | consumed samples: 27130880 | consumed tokens: 55564042240 | elapsed time per iteration (s): 0.43 | learning rate: 2.289E-05 | global batch size: 256 | lm loss: 2.218593E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.661 | TFLOPs: 31.57 | 7: iteration 105990/ 115203 | consumed samples: 27133440 | consumed tokens: 55569285120 | elapsed time per iteration (s): 0.43 | learning rate: 2.288E-05 | global batch size: 256 | lm loss: 2.203804E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.751 | TFLOPs: 31.15 | 0: [2022-11-29 01:44:38,997] [INFO] [logging.py:68:log_dist] [Rank 0] step=106000, skipped=0, lr=[2.2876870847544666e-05, 2.2876870847544666e-05, 2.2876870847544666e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 0: steps: 106000 loss: 2.1788 iter time (s): 0.430 samples/sec: 595.618 7: iteration 106000/ 115203 | consumed samples: 27136000 | consumed tokens: 55574528000 | elapsed time per iteration (s): 0.45 | learning rate: 2.288E-05 | global batch size: 256 | lm loss: 2.215113E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.393 | TFLOPs: 29.77 | 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 
106000 | lm loss value: 2.113560E+00 | lm loss PPL: 8.277655E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 106000 to checkpoints_221m 0: [2022-11-29 01:44:39,201] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step106000 is begin to save! 0: [2022-11-29 01:44:39,208] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_01-model_00-model_states.pt... 0: [2022-11-29 01:44:39,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_01-model_00-model_states.pt. 0: [2022-11-29 01:44:39,319] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_03-model_00-model_states.pt... 0: [2022-11-29 01:44:39,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_03-model_00-model_states.pt. 0: [2022-11-29 01:44:39,350] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_04-model_00-model_states.pt... 0: [2022-11-29 01:44:39,374] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_04-model_00-model_states.pt. 0: [2022-11-29 01:44:39,374] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_05-model_00-model_states.pt... 0: [2022-11-29 01:44:39,398] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_05-model_00-model_states.pt. 0: [2022-11-29 01:44:39,398] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_06-model_00-model_states.pt... 0: [2022-11-29 01:44:39,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_06-model_00-model_states.pt. 
0: [2022-11-29 01:44:39,423] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_07-model_00-model_states.pt... 0: [2022-11-29 01:44:39,447] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_07-model_00-model_states.pt. 0: [2022-11-29 01:44:39,447] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_08-model_00-model_states.pt... 0: [2022-11-29 01:44:39,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_08-model_00-model_states.pt. 0: [2022-11-29 01:44:39,470] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_09-model_00-model_states.pt... 0: [2022-11-29 01:44:39,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_09-model_00-model_states.pt. 0: [2022-11-29 01:44:39,494] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_10-model_00-model_states.pt... 0: [2022-11-29 01:44:39,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_10-model_00-model_states.pt. 0: [2022-11-29 01:44:39,519] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_11-model_00-model_states.pt... 0: [2022-11-29 01:44:39,543] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_11-model_00-model_states.pt. 0: [2022-11-29 01:44:39,543] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_12-model_00-model_states.pt... 0: [2022-11-29 01:44:39,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_12-model_00-model_states.pt. 
0: [2022-11-29 01:44:39,567] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_13-model_00-model_states.pt... 0: [2022-11-29 01:44:39,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_13-model_00-model_states.pt. 0: [2022-11-29 01:44:39,591] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_14-model_00-model_states.pt... 0: [2022-11-29 01:44:39,615] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_14-model_00-model_states.pt. 0: [2022-11-29 01:44:39,615] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_15-model_00-model_states.pt... 0: [2022-11-29 01:44:39,639] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_15-model_00-model_states.pt. 0: [2022-11-29 01:44:39,639] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_16-model_00-model_states.pt... 0: [2022-11-29 01:44:39,663] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_16-model_00-model_states.pt. 0: [2022-11-29 01:44:39,663] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_17-model_00-model_states.pt... 0: [2022-11-29 01:44:39,690] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_17-model_00-model_states.pt. 0: [2022-11-29 01:44:39,691] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_18-model_00-model_states.pt... 0: [2022-11-29 01:44:39,711] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_18-model_00-model_states.pt. 
0: [2022-11-29 01:44:39,712] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_19-model_00-model_states.pt... 0: [2022-11-29 01:44:39,735] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_19-model_00-model_states.pt. 0: [2022-11-29 01:44:39,735] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_20-model_00-model_states.pt... 0: [2022-11-29 01:44:39,758] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_20-model_00-model_states.pt. 0: [2022-11-29 01:44:39,759] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/layer_22-model_00-model_states.pt... 0: [2022-11-29 01:44:39,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/layer_22-model_00-model_states.pt. 0: [2022-11-29 01:44:39,763] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step106000/mp_rank_00_model_states.pt 0: [2022-11-29 01:44:39,763] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/mp_rank_00_model_states.pt... 0: [2022-11-29 01:44:39,766] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/mp_rank_00_model_states.pt. 0: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
3: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 
7: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:44:39,784] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step106000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:44:39,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:44:39,837] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 01:44:39,837] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2022-11-29 01:44:39,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-29 01:44:39,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 01:44:39,840] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2022-11-29 01:44:39,840] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:44:39,840] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 01:44:39,841] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 4: [2022-11-29 01:44:39,844] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:44:39,844] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 01:44:39,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 4: [2022-11-29 01:44:39,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:44:39,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:44:39,845] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:44:39,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 01:44:39,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 01:44:39,845] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 01:44:39,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 4: [2022-11-29 01:44:39,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 4: [2022-11-29 01:44:39,845] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2022-11-29 01:44:39,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:44:39,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 01:44:39,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2022-11-29 01:44:39,846] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:44:39,846] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 01:44:39,846] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 
4: [2022-11-29 01:44:39,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:44:39,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 01:44:39,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2022-11-29 01:44:39,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:44:39,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 01:44:39,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2022-11-29 01:44:39,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:44:39,847] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 01:44:39,847] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2022-11-29 01:44:39,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:44:39,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:44:39,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 01:44:39,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2022-11-29 01:44:39,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 01:44:39,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:44:39,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2022-11-29 01:44:39,848] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:44:39,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 01:44:39,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2022-11-29 01:44:39,848] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 01:44:39,848] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2022-11-29 01:44:39,849] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:44:39,849] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 01:44:39,849] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2022-11-29 01:44:39,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:44:39,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 01:44:39,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2022-11-29 01:44:39,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:44:39,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 01:44:39,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2022-11-29 01:44:39,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:44:39,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 2: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:44:39,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 4: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:44:39,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 0: [2022-11-29 01:44:39,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 4: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 4: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:44:39,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 0: [2022-11-29 01:44:39,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 4: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2022-11-29 01:44:39,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:44:39,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 01:44:39,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 7: [2022-11-29 01:44:39,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:44:39,854] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:44:39,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 01:44:39,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 01:44:39,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 
7: [2022-11-29 01:44:39,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2022-11-29 01:44:39,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:44:39,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 01:44:39,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:44:39,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2022-11-29 01:44:39,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 01:44:39,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2022-11-29 01:44:39,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:44:39,855] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 01:44:39,855] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2022-11-29 01:44:39,834] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:44:39,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
5: [2022-11-29 01:44:39,834] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-29 01:44:39,850] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-29 01:44:39,834] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2022-11-29 01:44:39,850] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2022-11-29 01:44:39,841] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:44:39,850] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:44:39,841] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-29 01:44:39,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-29 01:44:39,841] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
3: [2022-11-29 01:44:39,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-29 01:44:39,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:44:39,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-29 01:44:39,851] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2022-11-29 01:44:39,851] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2022-11-29 01:44:39,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:44:39,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
5: [2022-11-29 01:44:39,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:44:39,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:44:39,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:44:39,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:44:39,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 01:44:39,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-29 01:44:39,852] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:44:39,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 01:44:39,853] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-29 01:44:39,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-29 01:44:39,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 
3: [2022-11-29 01:44:39,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 01:44:39,852] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-29 01:44:39,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2022-11-29 01:44:39,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2022-11-29 01:44:39,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2022-11-29 01:44:39,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2022-11-29 01:44:39,852] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 5: [2022-11-29 01:44:39,853] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 3: [2022-11-29 01:44:39,853] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:44:39,854] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 01:44:39,854] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2022-11-29 01:44:39,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:44:39,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-29 01:44:39,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:44:39,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:44:39,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 01:44:39,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 01:44:39,863] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 01:44:39,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2022-11-29 01:44:39,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2022-11-29 01:44:39,863] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2022-11-29 01:44:39,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:44:39,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 01:44:39,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:44:39,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 
6: [2022-11-29 01:44:39,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 01:44:39,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2022-11-29 01:44:39,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:44:39,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 01:44:39,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2022-11-29 01:44:39,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:44:39,874] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 01:44:39,874] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 6: [2022-11-29 01:44:39,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:44:39,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 01:44:39,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:44:39,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:44:39,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 01:44:39,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 01:44:39,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 01:44:39,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2022-11-29 01:44:39,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 01:44:39,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 1: [2022-11-29 01:44:39,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 
1: [2022-11-29 01:44:39,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:44:39,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 01:44:39,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: [2022-11-29 01:44:39,907] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step106000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 01:44:39,907] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step106000 is ready now! 0: successfully saved checkpoint at iteration 106000 to checkpoints_221m 7: time (ms) | save-checkpoint: 731.16 7: iteration 106010/ 115203 | consumed samples: 27138560 | consumed tokens: 55579770880 | elapsed time per iteration (s): 0.54 | learning rate: 2.287E-05 | global batch size: 256 | lm loss: 2.206255E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 473.169 | TFLOPs: 24.83 | 7: iteration 106020/ 115203 | consumed samples: 27141120 | consumed tokens: 55585013760 | elapsed time per iteration (s): 0.43 | learning rate: 2.286E-05 | global batch size: 256 | lm loss: 2.194412E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.274 | TFLOPs: 31.18 | 7: iteration 106030/ 115203 | consumed samples: 27143680 | consumed tokens: 55590256640 | elapsed time per iteration (s): 0.43 | learning rate: 2.286E-05 | global batch size: 256 | lm loss: 2.205646E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.005 | TFLOPs: 31.32 | 7: iteration 106040/ 115203 | consumed 
samples: 27146240 | consumed tokens: 55595499520 | elapsed time per iteration (s): 0.44 | learning rate: 2.285E-05 | global batch size: 256 | lm loss: 2.217321E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.479 | TFLOPs: 30.82 | 7: iteration 106050/ 115203 | consumed samples: 27148800 | consumed tokens: 55600742400 | elapsed time per iteration (s): 0.44 | learning rate: 2.285E-05 | global batch size: 256 | lm loss: 2.219662E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.279 | TFLOPs: 30.50 | 7: iteration 106060/ 115203 | consumed samples: 27151360 | consumed tokens: 55605985280 | elapsed time per iteration (s): 0.43 | learning rate: 2.284E-05 | global batch size: 256 | lm loss: 2.214035E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.950 | TFLOPs: 31.27 | 7: iteration 106070/ 115203 | consumed samples: 27153920 | consumed tokens: 55611228160 | elapsed time per iteration (s): 0.43 | learning rate: 2.283E-05 | global batch size: 256 | lm loss: 2.219135E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.568 | TFLOPs: 31.20 | 7: iteration 106080/ 115203 | consumed samples: 27156480 | consumed tokens: 55616471040 | elapsed time per iteration (s): 0.44 | learning rate: 2.283E-05 | global batch size: 256 | lm loss: 2.209777E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.239 | TFLOPs: 30.76 | 7: iteration 106090/ 115203 | consumed samples: 27159040 | consumed tokens: 55621713920 | elapsed time per iteration (s): 0.44 | learning rate: 2.282E-05 | global batch size: 256 | lm loss: 2.212768E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number 
of nan iterations: 0 | samples per second: 584.872 | TFLOPs: 30.69 | 7: iteration 106100/ 115203 | consumed samples: 27161600 | consumed tokens: 55626956800 | elapsed time per iteration (s): 0.43 | learning rate: 2.282E-05 | global batch size: 256 | lm loss: 2.216719E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.817 | TFLOPs: 31.00 | 7: iteration 106110/ 115203 | consumed samples: 27164160 | consumed tokens: 55632199680 | elapsed time per iteration (s): 0.44 | learning rate: 2.281E-05 | global batch size: 256 | lm loss: 2.206443E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.683 | TFLOPs: 30.57 | 7: iteration 106120/ 115203 | consumed samples: 27166720 | consumed tokens: 55637442560 | elapsed time per iteration (s): 0.43 | learning rate: 2.280E-05 | global batch size: 256 | lm loss: 2.221699E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.067 | TFLOPs: 31.22 | 7: iteration 106130/ 115203 | consumed samples: 27169280 | consumed tokens: 55642685440 | elapsed time per iteration (s): 0.43 | learning rate: 2.280E-05 | global batch size: 256 | lm loss: 2.221575E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.169 | TFLOPs: 30.91 | 7: iteration 106140/ 115203 | consumed samples: 27171840 | consumed tokens: 55647928320 | elapsed time per iteration (s): 0.45 | learning rate: 2.279E-05 | global batch size: 256 | lm loss: 2.210380E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.391 | TFLOPs: 30.08 | 7: iteration 106150/ 115203 | consumed samples: 27174400 | consumed tokens: 55653171200 | elapsed time per iteration (s): 0.43 | learning rate: 2.278E-05 | global batch size: 
256 | lm loss: 2.203579E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.872 | TFLOPs: 31.32 | 7: iteration 106160/ 115203 | consumed samples: 27176960 | consumed tokens: 55658414080 | elapsed time per iteration (s): 0.43 | learning rate: 2.278E-05 | global batch size: 256 | lm loss: 2.246152E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.440 | TFLOPs: 30.98 | 7: iteration 106170/ 115203 | consumed samples: 27179520 | consumed tokens: 55663656960 | elapsed time per iteration (s): 0.43 | learning rate: 2.277E-05 | global batch size: 256 | lm loss: 2.229017E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.157 | TFLOPs: 31.38 | 7: iteration 106180/ 115203 | consumed samples: 27182080 | consumed tokens: 55668899840 | elapsed time per iteration (s): 0.43 | learning rate: 2.277E-05 | global batch size: 256 | lm loss: 2.221222E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.908 | TFLOPs: 31.16 | 7: iteration 106190/ 115203 | consumed samples: 27184640 | consumed tokens: 55674142720 | elapsed time per iteration (s): 0.43 | learning rate: 2.276E-05 | global batch size: 256 | lm loss: 2.229802E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.157 | TFLOPs: 31.17 | 7: iteration 106200/ 115203 | consumed samples: 27187200 | consumed tokens: 55679385600 | elapsed time per iteration (s): 0.44 | learning rate: 2.275E-05 | global batch size: 256 | lm loss: 2.215870E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.810 | TFLOPs: 30.42 | 7: iteration 106210/ 115203 | consumed samples: 27189760 | consumed 
tokens: 55684628480 | elapsed time per iteration (s): 0.44 | learning rate: 2.275E-05 | global batch size: 256 | lm loss: 2.223726E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.150 | TFLOPs: 30.81 | 7: iteration 106220/ 115203 | consumed samples: 27192320 | consumed tokens: 55689871360 | elapsed time per iteration (s): 0.43 | learning rate: 2.274E-05 | global batch size: 256 | lm loss: 2.204262E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.891 | TFLOPs: 31.42 | 7: iteration 106230/ 115203 | consumed samples: 27194880 | consumed tokens: 55695114240 | elapsed time per iteration (s): 0.43 | learning rate: 2.274E-05 | global batch size: 256 | lm loss: 2.228971E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.757 | TFLOPs: 31.05 | 7: iteration 106240/ 115203 | consumed samples: 27197440 | consumed tokens: 55700357120 | elapsed time per iteration (s): 0.43 | learning rate: 2.273E-05 | global batch size: 256 | lm loss: 2.213525E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.959 | TFLOPs: 31.27 | 7: iteration 106250/ 115203 | consumed samples: 27200000 | consumed tokens: 55705600000 | elapsed time per iteration (s): 0.45 | learning rate: 2.272E-05 | global batch size: 256 | lm loss: 2.203695E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.096 | TFLOPs: 30.12 | 7: iteration 106260/ 115203 | consumed samples: 27202560 | consumed tokens: 55710842880 | elapsed time per iteration (s): 0.43 | learning rate: 2.272E-05 | global batch size: 256 | lm loss: 2.229356E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples 
per second: 591.783 | TFLOPs: 31.05 | 7: iteration 106270/ 115203 | consumed samples: 27205120 | consumed tokens: 55716085760 | elapsed time per iteration (s): 0.46 | learning rate: 2.271E-05 | global batch size: 256 | lm loss: 2.227660E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.776 | TFLOPs: 29.21 | 7: iteration 106280/ 115203 | consumed samples: 27207680 | consumed tokens: 55721328640 | elapsed time per iteration (s): 0.42 | learning rate: 2.271E-05 | global batch size: 256 | lm loss: 2.224440E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.327 | TFLOPs: 31.66 | 7: iteration 106290/ 115203 | consumed samples: 27210240 | consumed tokens: 55726571520 | elapsed time per iteration (s): 0.42 | learning rate: 2.270E-05 | global batch size: 256 | lm loss: 2.215265E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.223 | TFLOPs: 32.07 | 7: iteration 106300/ 115203 | consumed samples: 27212800 | consumed tokens: 55731814400 | elapsed time per iteration (s): 0.43 | learning rate: 2.269E-05 | global batch size: 256 | lm loss: 2.240204E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.177 | TFLOPs: 31.02 | 7: iteration 106310/ 115203 | consumed samples: 27215360 | consumed tokens: 55737057280 | elapsed time per iteration (s): 0.46 | learning rate: 2.269E-05 | global batch size: 256 | lm loss: 2.225248E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.673 | TFLOPs: 29.10 | 7: iteration 106320/ 115203 | consumed samples: 27217920 | consumed tokens: 55742300160 | elapsed time per iteration (s): 0.43 | learning rate: 2.268E-05 | global batch size: 256 | lm loss: 2.243585E+00 | 
grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.272 | TFLOPs: 31.18 | 7: iteration 106330/ 115203 | consumed samples: 27220480 | consumed tokens: 55747543040 | elapsed time per iteration (s): 0.43 | learning rate: 2.268E-05 | global batch size: 256 | lm loss: 2.203681E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.518 | TFLOPs: 31.51 | 7: iteration 106340/ 115203 | consumed samples: 27223040 | consumed tokens: 55752785920 | elapsed time per iteration (s): 0.43 | learning rate: 2.267E-05 | global batch size: 256 | lm loss: 2.197226E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.353 | TFLOPs: 31.08 | 7: iteration 106350/ 115203 | consumed samples: 27225600 | consumed tokens: 55758028800 | elapsed time per iteration (s): 0.44 | learning rate: 2.266E-05 | global batch size: 256 | lm loss: 2.237699E+00 | grad norm: 0.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.826 | TFLOPs: 30.53 | 7: iteration 106360/ 115203 | consumed samples: 27228160 | consumed tokens: 55763271680 | elapsed time per iteration (s): 0.43 | learning rate: 2.266E-05 | global batch size: 256 | lm loss: 2.210459E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.846 | TFLOPs: 31.21 | 7: iteration 106370/ 115203 | consumed samples: 27230720 | consumed tokens: 55768514560 | elapsed time per iteration (s): 0.43 | learning rate: 2.265E-05 | global batch size: 256 | lm loss: 2.198129E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.289 | TFLOPs: 31.13 | 7: iteration 106380/ 115203 | consumed samples: 27233280 | consumed tokens: 55773757440 | elapsed 
time per iteration (s): 0.43 | learning rate: 2.265E-05 | global batch size: 256 | lm loss: 2.219679E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.501 | TFLOPs: 31.56 | 7: iteration 106390/ 115203 | consumed samples: 27235840 | consumed tokens: 55779000320 | elapsed time per iteration (s): 0.43 | learning rate: 2.264E-05 | global batch size: 256 | lm loss: 2.219007E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.882 | TFLOPs: 31.58 | 7: iteration 106400/ 115203 | consumed samples: 27238400 | consumed tokens: 55784243200 | elapsed time per iteration (s): 0.42 | learning rate: 2.263E-05 | global batch size: 256 | lm loss: 2.245923E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.351 | TFLOPs: 31.87 | 7: iteration 106410/ 115203 | consumed samples: 27240960 | consumed tokens: 55789486080 | elapsed time per iteration (s): 0.44 | learning rate: 2.263E-05 | global batch size: 256 | lm loss: 2.208946E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.051 | TFLOPs: 30.80 | 7: iteration 106420/ 115203 | consumed samples: 27243520 | consumed tokens: 55794728960 | elapsed time per iteration (s): 0.42 | learning rate: 2.262E-05 | global batch size: 256 | lm loss: 2.221320E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.585 | TFLOPs: 31.88 | 7: iteration 106430/ 115203 | consumed samples: 27246080 | consumed tokens: 55799971840 | elapsed time per iteration (s): 0.42 | learning rate: 2.262E-05 | global batch size: 256 | lm loss: 2.196687E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.828 | TFLOPs: 
31.73 | 7: iteration 106440/ 115203 | consumed samples: 27248640 | consumed tokens: 55805214720 | elapsed time per iteration (s): 0.43 | learning rate: 2.261E-05 | global batch size: 256 | lm loss: 2.174199E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.418 | TFLOPs: 31.56 | 7: iteration 106450/ 115203 | consumed samples: 27251200 | consumed tokens: 55810457600 | elapsed time per iteration (s): 0.43 | learning rate: 2.260E-05 | global batch size: 256 | lm loss: 2.219003E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.282 | TFLOPs: 31.13 | 7: iteration 106460/ 115203 | consumed samples: 27253760 | consumed tokens: 55815700480 | elapsed time per iteration (s): 0.44 | learning rate: 2.260E-05 | global batch size: 256 | lm loss: 2.193628E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.981 | TFLOPs: 30.85 | 7: iteration 106470/ 115203 | consumed samples: 27256320 | consumed tokens: 55820943360 | elapsed time per iteration (s): 0.43 | learning rate: 2.259E-05 | global batch size: 256 | lm loss: 2.208029E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.413 | TFLOPs: 31.45 | 7: iteration 106480/ 115203 | consumed samples: 27258880 | consumed tokens: 55826186240 | elapsed time per iteration (s): 0.42 | learning rate: 2.259E-05 | global batch size: 256 | lm loss: 2.220749E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.860 | TFLOPs: 31.84 | 7: iteration 106490/ 115203 | consumed samples: 27261440 | consumed tokens: 55831429120 | elapsed time per iteration (s): 0.44 | learning rate: 2.258E-05 | global batch size: 256 | lm loss: 2.228105E+00 | grad norm: 0.265 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.352 | TFLOPs: 30.19 | 7: iteration 106500/ 115203 | consumed samples: 27264000 | consumed tokens: 55836672000 | elapsed time per iteration (s): 0.43 | learning rate: 2.257E-05 | global batch size: 256 | lm loss: 2.232151E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.408 | TFLOPs: 31.45 | 7: iteration 106510/ 115203 | consumed samples: 27266560 | consumed tokens: 55841914880 | elapsed time per iteration (s): 0.43 | learning rate: 2.257E-05 | global batch size: 256 | lm loss: 2.231725E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.796 | TFLOPs: 31.00 | 7: iteration 106520/ 115203 | consumed samples: 27269120 | consumed tokens: 55847157760 | elapsed time per iteration (s): 0.43 | learning rate: 2.256E-05 | global batch size: 256 | lm loss: 2.231583E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.788 | TFLOPs: 31.21 | 7: iteration 106530/ 115203 | consumed samples: 27271680 | consumed tokens: 55852400640 | elapsed time per iteration (s): 0.43 | learning rate: 2.256E-05 | global batch size: 256 | lm loss: 2.195431E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.139 | TFLOPs: 31.07 | 7: iteration 106540/ 115203 | consumed samples: 27274240 | consumed tokens: 55857643520 | elapsed time per iteration (s): 0.43 | learning rate: 2.255E-05 | global batch size: 256 | lm loss: 2.198907E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.858 | TFLOPs: 31.53 | 7: iteration 106550/ 115203 | consumed samples: 27276800 | consumed tokens: 55862886400 | elapsed time per iteration (s): 0.44 | 
learning rate: 2.254E-05 | global batch size: 256 | lm loss: 2.231190E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.737 | TFLOPs: 30.68 | 7: iteration 106560/ 115203 | consumed samples: 27279360 | consumed tokens: 55868129280 | elapsed time per iteration (s): 0.43 | learning rate: 2.254E-05 | global batch size: 256 | lm loss: 2.217693E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.005 | TFLOPs: 30.96 | 7: iteration 106570/ 115203 | consumed samples: 27281920 | consumed tokens: 55873372160 | elapsed time per iteration (s): 0.45 | learning rate: 2.253E-05 | global batch size: 256 | lm loss: 2.217215E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.725 | TFLOPs: 29.95 | 7: iteration 106580/ 115203 | consumed samples: 27284480 | consumed tokens: 55878615040 | elapsed time per iteration (s): 0.47 | learning rate: 2.253E-05 | global batch size: 256 | lm loss: 2.252353E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 549.237 | TFLOPs: 28.82 | 7: iteration 106590/ 115203 | consumed samples: 27287040 | consumed tokens: 55883857920 | elapsed time per iteration (s): 0.44 | learning rate: 2.252E-05 | global batch size: 256 | lm loss: 2.209710E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.213 | TFLOPs: 30.23 | 7: iteration 106600/ 115203 | consumed samples: 27289600 | consumed tokens: 55889100800 | elapsed time per iteration (s): 0.42 | learning rate: 2.252E-05 | global batch size: 256 | lm loss: 2.198773E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.151 | TFLOPs: 32.07 | 7: iteration 106610/ 
115203 | consumed samples: 27292160 | consumed tokens: 55894343680 | elapsed time per iteration (s): 0.43 | learning rate: 2.251E-05 | global batch size: 256 | lm loss: 2.234785E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.501 | TFLOPs: 31.45 | 7: iteration 106620/ 115203 | consumed samples: 27294720 | consumed tokens: 55899586560 | elapsed time per iteration (s): 0.43 | learning rate: 2.250E-05 | global batch size: 256 | lm loss: 2.208445E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.077 | TFLOPs: 31.01 | 7: iteration 106630/ 115203 | consumed samples: 27297280 | consumed tokens: 55904829440 | elapsed time per iteration (s): 0.45 | learning rate: 2.250E-05 | global batch size: 256 | lm loss: 2.211033E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.289 | TFLOPs: 29.55 | 7: iteration 106640/ 115203 | consumed samples: 27299840 | consumed tokens: 55910072320 | elapsed time per iteration (s): 0.42 | learning rate: 2.249E-05 | global batch size: 256 | lm loss: 2.217657E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.516 | TFLOPs: 31.82 | 7: iteration 106650/ 115203 | consumed samples: 27302400 | consumed tokens: 55915315200 | elapsed time per iteration (s): 0.43 | learning rate: 2.249E-05 | global batch size: 256 | lm loss: 2.200014E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.676 | TFLOPs: 31.15 | 7: iteration 106660/ 115203 | consumed samples: 27304960 | consumed tokens: 55920558080 | elapsed time per iteration (s): 0.43 | learning rate: 2.248E-05 | global batch size: 256 | lm loss: 2.239586E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 601.276 | TFLOPs: 31.55 | 7: iteration 106670/ 115203 | consumed samples: 27307520 | consumed tokens: 55925800960 | elapsed time per iteration (s): 0.45 | learning rate: 2.248E-05 | global batch size: 256 | lm loss: 2.183018E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.399 | TFLOPs: 30.09 | 7: iteration 106680/ 115203 | consumed samples: 27310080 | consumed tokens: 55931043840 | elapsed time per iteration (s): 0.44 | learning rate: 2.247E-05 | global batch size: 256 | lm loss: 2.202867E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.214 | TFLOPs: 30.81 | 7: iteration 106690/ 115203 | consumed samples: 27312640 | consumed tokens: 55936286720 | elapsed time per iteration (s): 0.43 | learning rate: 2.246E-05 | global batch size: 256 | lm loss: 2.225037E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.791 | TFLOPs: 31.26 | 7: iteration 106700/ 115203 | consumed samples: 27315200 | consumed tokens: 55941529600 | elapsed time per iteration (s): 0.45 | learning rate: 2.246E-05 | global batch size: 256 | lm loss: 2.217816E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.004 | TFLOPs: 30.01 | 7: iteration 106710/ 115203 | consumed samples: 27317760 | consumed tokens: 55946772480 | elapsed time per iteration (s): 0.69 | learning rate: 2.245E-05 | global batch size: 256 | lm loss: 2.192286E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 370.082 | TFLOPs: 19.42 | 7: iteration 106720/ 115203 | consumed samples: 27320320 | consumed tokens: 55952015360 | elapsed time per iteration (s): 0.42 | learning rate: 
2.245E-05 | global batch size: 256 | lm loss: 2.246316E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.787 | TFLOPs: 31.84 | 7: iteration 106730/ 115203 | consumed samples: 27322880 | consumed tokens: 55957258240 | elapsed time per iteration (s): 0.43 | learning rate: 2.244E-05 | global batch size: 256 | lm loss: 2.210067E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.320 | TFLOPs: 31.55 | 7: iteration 106740/ 115203 | consumed samples: 27325440 | consumed tokens: 55962501120 | elapsed time per iteration (s): 0.42 | learning rate: 2.243E-05 | global batch size: 256 | lm loss: 2.222948E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.514 | TFLOPs: 31.82 | 7: iteration 106750/ 115203 | consumed samples: 27328000 | consumed tokens: 55967744000 | elapsed time per iteration (s): 0.43 | learning rate: 2.243E-05 | global batch size: 256 | lm loss: 2.230247E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.696 | TFLOPs: 31.31 | 7: iteration 106760/ 115203 | consumed samples: 27330560 | consumed tokens: 55972986880 | elapsed time per iteration (s): 0.42 | learning rate: 2.242E-05 | global batch size: 256 | lm loss: 2.220821E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.756 | TFLOPs: 31.68 | 7: iteration 106770/ 115203 | consumed samples: 27333120 | consumed tokens: 55978229760 | elapsed time per iteration (s): 0.43 | learning rate: 2.242E-05 | global batch size: 256 | lm loss: 2.209211E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.798 | TFLOPs: 31.37 | 7: iteration 106780/ 115203 | 
consumed samples: 27335680 | consumed tokens: 55983472640 | elapsed time per iteration (s): 0.42 | learning rate: 2.241E-05 | global batch size: 256 | lm loss: 2.239145E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.768 | TFLOPs: 31.84 | 7: iteration 106790/ 115203 | consumed samples: 27338240 | consumed tokens: 55988715520 | elapsed time per iteration (s): 0.42 | learning rate: 2.241E-05 | global batch size: 256 | lm loss: 2.196629E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.461 | TFLOPs: 31.82 | 7: iteration 106800/ 115203 | consumed samples: 27340800 | consumed tokens: 55993958400 | elapsed time per iteration (s): 0.43 | learning rate: 2.240E-05 | global batch size: 256 | lm loss: 2.203554E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.784 | TFLOPs: 31.00 | 7: iteration 106810/ 115203 | consumed samples: 27343360 | consumed tokens: 55999201280 | elapsed time per iteration (s): 0.43 | learning rate: 2.239E-05 | global batch size: 256 | lm loss: 2.223174E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.820 | TFLOPs: 31.31 | 7: iteration 106820/ 115203 | consumed samples: 27345920 | consumed tokens: 56004444160 | elapsed time per iteration (s): 0.42 | learning rate: 2.239E-05 | global batch size: 256 | lm loss: 2.256336E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.093 | TFLOPs: 32.06 | 7: iteration 106830/ 115203 | consumed samples: 27348480 | consumed tokens: 56009687040 | elapsed time per iteration (s): 0.43 | learning rate: 2.238E-05 | global batch size: 256 | lm loss: 2.239391E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 598.721 | TFLOPs: 31.41 | 7: iteration 106840/ 115203 | consumed samples: 27351040 | consumed tokens: 56014929920 | elapsed time per iteration (s): 0.42 | learning rate: 2.238E-05 | global batch size: 256 | lm loss: 2.211903E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.778 | TFLOPs: 31.73 | 7: iteration 106850/ 115203 | consumed samples: 27353600 | consumed tokens: 56020172800 | elapsed time per iteration (s): 0.44 | learning rate: 2.237E-05 | global batch size: 256 | lm loss: 2.255876E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.425 | TFLOPs: 30.66 | 7: iteration 106860/ 115203 | consumed samples: 27356160 | consumed tokens: 56025415680 | elapsed time per iteration (s): 0.43 | learning rate: 2.237E-05 | global batch size: 256 | lm loss: 2.213726E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.141 | TFLOPs: 31.02 | 7: iteration 106870/ 115203 | consumed samples: 27358720 | consumed tokens: 56030658560 | elapsed time per iteration (s): 0.43 | learning rate: 2.236E-05 | global batch size: 256 | lm loss: 2.233701E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.087 | TFLOPs: 31.43 | 7: iteration 106880/ 115203 | consumed samples: 27361280 | consumed tokens: 56035901440 | elapsed time per iteration (s): 0.43 | learning rate: 2.236E-05 | global batch size: 256 | lm loss: 2.198405E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.463 | TFLOPs: 31.56 | 7: iteration 106890/ 115203 | consumed samples: 27363840 | consumed tokens: 56041144320 | elapsed time per iteration (s): 0.43 | learning rate: 2.235E-05 | global batch 
size: 256 | lm loss: 2.190894E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.323 | TFLOPs: 31.60 | 7: iteration 106900/ 115203 | consumed samples: 27366400 | consumed tokens: 56046387200 | elapsed time per iteration (s): 0.43 | learning rate: 2.234E-05 | global batch size: 256 | lm loss: 2.207829E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.810 | TFLOPs: 31.37 | 7: iteration 106910/ 115203 | consumed samples: 27368960 | consumed tokens: 56051630080 | elapsed time per iteration (s): 0.42 | learning rate: 2.234E-05 | global batch size: 256 | lm loss: 2.193546E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.684 | TFLOPs: 31.78 | 7: iteration 106920/ 115203 | consumed samples: 27371520 | consumed tokens: 56056872960 | elapsed time per iteration (s): 0.42 | learning rate: 2.233E-05 | global batch size: 256 | lm loss: 2.194304E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.520 | TFLOPs: 31.67 | 7: iteration 106930/ 115203 | consumed samples: 27374080 | consumed tokens: 56062115840 | elapsed time per iteration (s): 0.69 | learning rate: 2.233E-05 | global batch size: 256 | lm loss: 2.220722E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 371.277 | TFLOPs: 19.48 | 7: iteration 106940/ 115203 | consumed samples: 27376640 | consumed tokens: 56067358720 | elapsed time per iteration (s): 0.43 | learning rate: 2.232E-05 | global batch size: 256 | lm loss: 2.258530E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.045 | TFLOPs: 31.27 | 7: iteration 106950/ 115203 | consumed samples: 27379200 | 
consumed tokens: 56072601600 | elapsed time per iteration (s): 0.42 | learning rate: 2.232E-05 | global batch size: 256 | lm loss: 2.204335E+00 | grad norm: 0.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.928 | TFLOPs: 31.74 | 7: iteration 106960/ 115203 | consumed samples: 27381760 | consumed tokens: 56077844480 | elapsed time per iteration (s): 0.42 | learning rate: 2.231E-05 | global batch size: 256 | lm loss: 2.253627E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.544 | TFLOPs: 31.61 | 7: iteration 106970/ 115203 | consumed samples: 27384320 | consumed tokens: 56083087360 | elapsed time per iteration (s): 0.43 | learning rate: 2.230E-05 | global batch size: 256 | lm loss: 2.205648E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.565 | TFLOPs: 31.09 | 7: iteration 106980/ 115203 | consumed samples: 27386880 | consumed tokens: 56088330240 | elapsed time per iteration (s): 0.43 | learning rate: 2.230E-05 | global batch size: 256 | lm loss: 2.184548E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.937 | TFLOPs: 31.53 | 7: iteration 106990/ 115203 | consumed samples: 27389440 | consumed tokens: 56093573120 | elapsed time per iteration (s): 0.75 | learning rate: 2.229E-05 | global batch size: 256 | lm loss: 2.236605E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 342.761 | TFLOPs: 17.98 | 7: iteration 107000/ 115203 | consumed samples: 27392000 | consumed tokens: 56098816000 | elapsed time per iteration (s): 0.42 | learning rate: 2.229E-05 | global batch size: 256 | lm loss: 2.231834E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 604.109 | TFLOPs: 31.70 | 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 107000 | lm loss value: 2.155006E+00 | lm loss PPL: 8.627945E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 107000 to checkpoints_221m 0: [2022-11-29 01:52:00,291] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step107000 is begin to save! 0: [2022-11-29 01:52:00,298] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_01-model_00-model_states.pt... 0: [2022-11-29 01:52:00,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_01-model_00-model_states.pt. 0: [2022-11-29 01:52:00,423] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_03-model_00-model_states.pt... 0: [2022-11-29 01:52:00,445] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_03-model_00-model_states.pt. 0: [2022-11-29 01:52:00,445] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_04-model_00-model_states.pt... 0: [2022-11-29 01:52:00,470] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_04-model_00-model_states.pt. 0: [2022-11-29 01:52:00,470] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_05-model_00-model_states.pt... 0: [2022-11-29 01:52:00,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_05-model_00-model_states.pt. 0: [2022-11-29 01:52:00,496] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_06-model_00-model_states.pt... 
0: [2022-11-29 01:52:00,520] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_06-model_00-model_states.pt. 0: [2022-11-29 01:52:00,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_07-model_00-model_states.pt... 0: [2022-11-29 01:52:00,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_07-model_00-model_states.pt. 0: [2022-11-29 01:52:00,545] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_08-model_00-model_states.pt... 0: [2022-11-29 01:52:00,571] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_08-model_00-model_states.pt. 0: [2022-11-29 01:52:00,571] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_09-model_00-model_states.pt... 0: [2022-11-29 01:52:00,596] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_09-model_00-model_states.pt. 0: [2022-11-29 01:52:00,596] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_10-model_00-model_states.pt... 0: [2022-11-29 01:52:00,621] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_10-model_00-model_states.pt. 0: [2022-11-29 01:52:00,621] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_11-model_00-model_states.pt... 0: [2022-11-29 01:52:00,646] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_11-model_00-model_states.pt. 0: [2022-11-29 01:52:00,647] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_12-model_00-model_states.pt... 
0: [2022-11-29 01:52:00,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_12-model_00-model_states.pt. 0: [2022-11-29 01:52:00,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_13-model_00-model_states.pt... 0: [2022-11-29 01:52:00,697] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_13-model_00-model_states.pt. 0: [2022-11-29 01:52:00,698] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_14-model_00-model_states.pt... 0: [2022-11-29 01:52:00,721] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_14-model_00-model_states.pt. 0: [2022-11-29 01:52:00,721] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_15-model_00-model_states.pt... 0: [2022-11-29 01:52:00,745] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_15-model_00-model_states.pt. 0: [2022-11-29 01:52:00,745] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_16-model_00-model_states.pt... 0: [2022-11-29 01:52:00,768] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_16-model_00-model_states.pt. 0: [2022-11-29 01:52:00,769] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_17-model_00-model_states.pt... 0: [2022-11-29 01:52:00,793] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_17-model_00-model_states.pt. 0: [2022-11-29 01:52:00,794] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_18-model_00-model_states.pt... 
0: [2022-11-29 01:52:00,818] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_18-model_00-model_states.pt. 0: [2022-11-29 01:52:00,818] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_19-model_00-model_states.pt... 0: [2022-11-29 01:52:00,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_19-model_00-model_states.pt. 0: [2022-11-29 01:52:00,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_20-model_00-model_states.pt... 0: [2022-11-29 01:52:00,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_20-model_00-model_states.pt. 0: [2022-11-29 01:52:00,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/layer_22-model_00-model_states.pt... 0: [2022-11-29 01:52:00,874] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/layer_22-model_00-model_states.pt. 0: [2022-11-29 01:52:00,875] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step107000/mp_rank_00_model_states.pt 0: [2022-11-29 01:52:00,875] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/mp_rank_00_model_states.pt... 0: [2022-11-29 01:52:00,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/mp_rank_00_model_states.pt. 0: [2022-11-29 01:52:00,896] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:52:00,896] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 
0: [2022-11-29 01:52:00,896] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:52:00,896] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 
5: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:52:00,896] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 
2: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 
0: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 
4: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:52:00,897] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step107000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:52:00,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:52:00,951] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 01:52:00,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 0: [2022-11-29 01:52:00,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-29 01:52:00,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 01:52:00,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 0: [2022-11-29 01:52:00,952] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:52:00,952] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 01:52:00,952] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 0: [2022-11-29 01:52:00,954] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:52:00,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 01:52:00,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 0: [2022-11-29 01:52:00,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:52:00,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 01:52:00,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2022-11-29 01:52:00,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-29 01:52:00,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:52:00,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 01:52:00,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 01:52:00,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2022-11-29 01:52:00,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2022-11-29 01:52:00,958] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:52:00,958] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 01:52:00,958] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2022-11-29 01:52:00,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:52:00,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:52:00,953] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 
5: [2022-11-29 01:52:00,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 01:52:00,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 01:52:00,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2022-11-29 01:52:00,954] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 01:52:00,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2022-11-29 01:52:00,954] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2022-11-29 01:52:00,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:52:00,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 01:52:00,960] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2022-11-29 01:52:00,960] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:52:00,960] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 01:52:00,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
5: [2022-11-29 01:52:00,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:52:00,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 01:52:00,961] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2022-11-29 01:52:00,961] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:52:00,961] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 01:52:00,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 5: [2022-11-29 01:52:00,962] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:52:00,962] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 01:52:00,962] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 0: [2022-11-29 01:52:00,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:52:00,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:52:00,963] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
0: [2022-11-29 01:52:00,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 01:52:00,963] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 01:52:00,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 0: [2022-11-29 01:52:00,963] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2022-11-29 01:52:00,966] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:52:00,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 01:52:00,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2022-11-29 01:52:00,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:52:00,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:52:00,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:52:00,967] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-29 01:52:00,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 01:52:00,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 01:52:00,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 01:52:00,967] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 01:52:00,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2022-11-29 01:52:00,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2022-11-29 01:52:00,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 6: [2022-11-29 01:52:00,967] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2022-11-29 01:52:00,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:52:00,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 01:52:00,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:52:00,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
7: [2022-11-29 01:52:00,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 01:52:00,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2022-11-29 01:52:00,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:52:00,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:52:00,979] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:52:00,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 01:52:00,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 01:52:00,979] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 01:52:00,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2022-11-29 01:52:00,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2022-11-29 01:52:00,979] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2022-11-29 01:52:00,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-29 01:52:00,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:52:00,982] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:52:00,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 01:52:00,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 01:52:00,982] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 01:52:00,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2022-11-29 01:52:00,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 7: [2022-11-29 01:52:00,982] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2022-11-29 01:52:00,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:52:00,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 01:52:00,994] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:52:00,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
3: [2022-11-29 01:52:00,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:52:00,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 01:52:00,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2022-11-29 01:52:00,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 01:52:00,995] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:52:00,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2022-11-29 01:52:00,995] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 01:52:00,995] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2022-11-29 01:52:01,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:52:01,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:52:01,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:52:01,005] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2022-11-29 01:52:01,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 01:52:01,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 01:52:01,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 01:52:01,005] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 01:52:01,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2022-11-29 01:52:01,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2022-11-29 01:52:01,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 3: [2022-11-29 01:52:01,005] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 0: [2022-11-29 01:52:01,015] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 01:52:01,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:52:01,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:52:01,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 01:52:01,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 01:52:01,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 01:52:01,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2022-11-29 01:52:01,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 01:52:01,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2022-11-29 01:52:01,017] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 1: [2022-11-29 01:52:01,017] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:52:01,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
2: [2022-11-29 01:52:01,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 01:52:01,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 01:52:01,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 01:52:01,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 01:52:01,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 01:52:01,095] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 2: [2022-11-29 01:52:01,095] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
2: [2022-11-29 01:52:01,097] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:52:01,097] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 01:52:01,097] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 4: [2022-11-29 01:52:01,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:52:01,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 01:52:01,099] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 4: [2022-11-29 01:52:01,099] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:52:01,099] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 01:52:01,100] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 4: [2022-11-29 01:52:01,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:52:01,100] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 01:52:01,100] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
4: [2022-11-29 01:52:01,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:52:01,100] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 01:52:01,100] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 4: [2022-11-29 01:52:01,100] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:52:01,101] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 01:52:01,101] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 4: [2022-11-29 01:52:01,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:52:01,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 01:52:01,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 4: [2022-11-29 01:52:01,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:52:01,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 01:52:01,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 
4: [2022-11-29 01:52:01,109] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:52:01,109] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step107000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 01:52:01,109] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step107000 is ready now! 0: successfully saved checkpoint at iteration 107000 to checkpoints_221m 7: time (ms) | save-checkpoint: 839.50 7: iteration 107010/ 115203 | consumed samples: 27394560 | consumed tokens: 56104058880 | elapsed time per iteration (s): 0.53 | learning rate: 2.228E-05 | global batch size: 256 | lm loss: 2.187989E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 482.212 | TFLOPs: 25.30 | 7: iteration 107020/ 115203 | consumed samples: 27397120 | consumed tokens: 56109301760 | elapsed time per iteration (s): 0.42 | learning rate: 2.228E-05 | global batch size: 256 | lm loss: 2.241654E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.938 | TFLOPs: 31.69 | 7: iteration 107030/ 115203 | consumed samples: 27399680 | consumed tokens: 56114544640 | elapsed time per iteration (s): 0.44 | learning rate: 2.227E-05 | global batch size: 256 | lm loss: 2.228240E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.346 | TFLOPs: 30.76 | 7: iteration 107040/ 115203 | consumed samples: 27402240 | consumed tokens: 56119787520 | elapsed time per iteration (s): 0.43 | learning rate: 2.227E-05 | global batch size: 256 | lm loss: 2.210069E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.840 | TFLOPs: 31.47 | 
7: iteration 107050/ 115203 | consumed samples: 27404800 | consumed tokens: 56125030400 | elapsed time per iteration (s): 0.42 | learning rate: 2.226E-05 | global batch size: 256 | lm loss: 2.203578E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.680 | TFLOPs: 31.67 | 7: iteration 107060/ 115203 | consumed samples: 27407360 | consumed tokens: 56130273280 | elapsed time per iteration (s): 0.42 | learning rate: 2.225E-05 | global batch size: 256 | lm loss: 2.236378E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.881 | TFLOPs: 31.74 | 7: iteration 107070/ 115203 | consumed samples: 27409920 | consumed tokens: 56135516160 | elapsed time per iteration (s): 0.43 | learning rate: 2.225E-05 | global batch size: 256 | lm loss: 2.197634E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.802 | TFLOPs: 31.47 | 7: iteration 107080/ 115203 | consumed samples: 27412480 | consumed tokens: 56140759040 | elapsed time per iteration (s): 0.43 | learning rate: 2.224E-05 | global batch size: 256 | lm loss: 2.230196E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.534 | TFLOPs: 31.14 | 7: iteration 107090/ 115203 | consumed samples: 27415040 | consumed tokens: 56146001920 | elapsed time per iteration (s): 0.43 | learning rate: 2.224E-05 | global batch size: 256 | lm loss: 2.220337E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.422 | TFLOPs: 31.14 | 7: iteration 107100/ 115203 | consumed samples: 27417600 | consumed tokens: 56151244800 | elapsed time per iteration (s): 0.42 | learning rate: 2.223E-05 | global batch size: 256 | lm loss: 2.252729E+00 | grad norm: 0.253 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.937 | TFLOPs: 32.05 | 7: iteration 107110/ 115203 | consumed samples: 27420160 | consumed tokens: 56156487680 | elapsed time per iteration (s): 0.43 | learning rate: 2.223E-05 | global batch size: 256 | lm loss: 2.236204E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.861 | TFLOPs: 31.37 | 7: iteration 107120/ 115203 | consumed samples: 27422720 | consumed tokens: 56161730560 | elapsed time per iteration (s): 0.43 | learning rate: 2.222E-05 | global batch size: 256 | lm loss: 2.225886E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.855 | TFLOPs: 31.21 | 7: iteration 107130/ 115203 | consumed samples: 27425280 | consumed tokens: 56166973440 | elapsed time per iteration (s): 0.42 | learning rate: 2.222E-05 | global batch size: 256 | lm loss: 2.192824E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.603 | TFLOPs: 31.67 | 7: iteration 107140/ 115203 | consumed samples: 27427840 | consumed tokens: 56172216320 | elapsed time per iteration (s): 0.42 | learning rate: 2.221E-05 | global batch size: 256 | lm loss: 2.223363E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.627 | TFLOPs: 31.62 | 7: iteration 107150/ 115203 | consumed samples: 27430400 | consumed tokens: 56177459200 | elapsed time per iteration (s): 0.42 | learning rate: 2.221E-05 | global batch size: 256 | lm loss: 2.217672E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.416 | TFLOPs: 32.29 | 7: iteration 107160/ 115203 | consumed samples: 27432960 | consumed tokens: 56182702080 | elapsed time per iteration (s): 0.45 | 
learning rate: 2.220E-05 | global batch size: 256 | lm loss: 2.197229E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.175 | TFLOPs: 29.81 | 7: iteration 107170/ 115203 | consumed samples: 27435520 | consumed tokens: 56187944960 | elapsed time per iteration (s): 0.42 | learning rate: 2.219E-05 | global batch size: 256 | lm loss: 2.221174E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.484 | TFLOPs: 31.72 | 7: iteration 107180/ 115203 | consumed samples: 27438080 | consumed tokens: 56193187840 | elapsed time per iteration (s): 0.44 | learning rate: 2.219E-05 | global batch size: 256 | lm loss: 2.191915E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.956 | TFLOPs: 30.38 | 7: iteration 107190/ 115203 | consumed samples: 27440640 | consumed tokens: 56198430720 | elapsed time per iteration (s): 0.42 | learning rate: 2.218E-05 | global batch size: 256 | lm loss: 2.251896E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.564 | TFLOPs: 31.72 | 7: iteration 107200/ 115203 | consumed samples: 27443200 | consumed tokens: 56203673600 | elapsed time per iteration (s): 0.42 | learning rate: 2.218E-05 | global batch size: 256 | lm loss: 2.208177E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.763 | TFLOPs: 31.63 | 7: iteration 107210/ 115203 | consumed samples: 27445760 | consumed tokens: 56208916480 | elapsed time per iteration (s): 0.43 | learning rate: 2.217E-05 | global batch size: 256 | lm loss: 2.213214E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.598 | TFLOPs: 31.46 | 7: iteration 107220/ 
115203 | consumed samples: 27448320 | consumed tokens: 56214159360 | elapsed time per iteration (s): 0.43 | learning rate: 2.217E-05 | global batch size: 256 | lm loss: 2.229161E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.313 | TFLOPs: 31.03 | 7: iteration 107230/ 115203 | consumed samples: 27450880 | consumed tokens: 56219402240 | elapsed time per iteration (s): 0.42 | learning rate: 2.216E-05 | global batch size: 256 | lm loss: 2.207706E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.535 | TFLOPs: 31.77 | 7: iteration 107240/ 115203 | consumed samples: 27453440 | consumed tokens: 56224645120 | elapsed time per iteration (s): 0.42 | learning rate: 2.216E-05 | global batch size: 256 | lm loss: 2.228010E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.839 | TFLOPs: 31.63 | 7: iteration 107250/ 115203 | consumed samples: 27456000 | consumed tokens: 56229888000 | elapsed time per iteration (s): 0.43 | learning rate: 2.215E-05 | global batch size: 256 | lm loss: 2.222950E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.531 | TFLOPs: 31.46 | 7: iteration 107260/ 115203 | consumed samples: 27458560 | consumed tokens: 56235130880 | elapsed time per iteration (s): 0.43 | learning rate: 2.215E-05 | global batch size: 256 | lm loss: 2.217172E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.127 | TFLOPs: 31.07 | 7: iteration 107270/ 115203 | consumed samples: 27461120 | consumed tokens: 56240373760 | elapsed time per iteration (s): 0.43 | learning rate: 2.214E-05 | global batch size: 256 | lm loss: 2.215601E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 598.042 | TFLOPs: 31.38 | 7: iteration 107280/ 115203 | consumed samples: 27463680 | consumed tokens: 56245616640 | elapsed time per iteration (s): 0.42 | learning rate: 2.214E-05 | global batch size: 256 | lm loss: 2.190446E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.931 | TFLOPs: 31.74 | 7: iteration 107290/ 115203 | consumed samples: 27466240 | consumed tokens: 56250859520 | elapsed time per iteration (s): 0.42 | learning rate: 2.213E-05 | global batch size: 256 | lm loss: 2.178915E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.513 | TFLOPs: 31.82 | 7: iteration 107300/ 115203 | consumed samples: 27468800 | consumed tokens: 56256102400 | elapsed time per iteration (s): 0.98 | learning rate: 2.212E-05 | global batch size: 256 | lm loss: 2.203854E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 261.113 | TFLOPs: 13.70 | 7: iteration 107310/ 115203 | consumed samples: 27471360 | consumed tokens: 56261345280 | elapsed time per iteration (s): 0.67 | learning rate: 2.212E-05 | global batch size: 256 | lm loss: 2.192982E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 380.297 | TFLOPs: 19.95 | 7: iteration 107320/ 115203 | consumed samples: 27473920 | consumed tokens: 56266588160 | elapsed time per iteration (s): 0.94 | learning rate: 2.211E-05 | global batch size: 256 | lm loss: 2.222121E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 271.139 | TFLOPs: 14.23 | 7: iteration 107330/ 115203 | consumed samples: 27476480 | consumed tokens: 56271831040 | elapsed time per iteration (s): 0.44 | learning rate: 
2.211E-05 | global batch size: 256 | lm loss: 2.215970E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.144 | TFLOPs: 30.44 | 7: iteration 107340/ 115203 | consumed samples: 27479040 | consumed tokens: 56277073920 | elapsed time per iteration (s): 0.46 | learning rate: 2.210E-05 | global batch size: 256 | lm loss: 2.221475E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 552.751 | TFLOPs: 29.00 | 7: iteration 107350/ 115203 | consumed samples: 27481600 | consumed tokens: 56282316800 | elapsed time per iteration (s): 0.45 | learning rate: 2.210E-05 | global batch size: 256 | lm loss: 2.233242E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.787 | TFLOPs: 29.74 | 7: iteration 107360/ 115203 | consumed samples: 27484160 | consumed tokens: 56287559680 | elapsed time per iteration (s): 0.44 | learning rate: 2.209E-05 | global batch size: 256 | lm loss: 2.219184E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.628 | TFLOPs: 30.78 | 7: iteration 107370/ 115203 | consumed samples: 27486720 | consumed tokens: 56292802560 | elapsed time per iteration (s): 0.43 | learning rate: 2.209E-05 | global batch size: 256 | lm loss: 2.217559E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.493 | TFLOPs: 31.51 | 7: iteration 107380/ 115203 | consumed samples: 27489280 | consumed tokens: 56298045440 | elapsed time per iteration (s): 0.44 | learning rate: 2.208E-05 | global batch size: 256 | lm loss: 2.215966E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.080 | TFLOPs: 30.75 | 7: iteration 107390/ 115203 | 
consumed samples: 27491840 | consumed tokens: 56303288320 | elapsed time per iteration (s): 0.44 | learning rate: 2.208E-05 | global batch size: 256 | lm loss: 2.224734E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.459 | TFLOPs: 30.88 | 7: iteration 107400/ 115203 | consumed samples: 27494400 | consumed tokens: 56308531200 | elapsed time per iteration (s): 0.44 | learning rate: 2.207E-05 | global batch size: 256 | lm loss: 2.255893E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.754 | TFLOPs: 30.79 | 7: iteration 107410/ 115203 | consumed samples: 27496960 | consumed tokens: 56313774080 | elapsed time per iteration (s): 0.44 | learning rate: 2.207E-05 | global batch size: 256 | lm loss: 2.206765E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.600 | TFLOPs: 30.25 | 7: iteration 107420/ 115203 | consumed samples: 27499520 | consumed tokens: 56319016960 | elapsed time per iteration (s): 0.44 | learning rate: 2.206E-05 | global batch size: 256 | lm loss: 2.217153E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.447 | TFLOPs: 30.66 | 7: iteration 107430/ 115203 | consumed samples: 27502080 | consumed tokens: 56324259840 | elapsed time per iteration (s): 0.44 | learning rate: 2.206E-05 | global batch size: 256 | lm loss: 2.263896E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.628 | TFLOPs: 30.52 | 7: iteration 107440/ 115203 | consumed samples: 27504640 | consumed tokens: 56329502720 | elapsed time per iteration (s): 0.44 | learning rate: 2.205E-05 | global batch size: 256 | lm loss: 2.211201E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 584.668 | TFLOPs: 30.68 | 7: iteration 107450/ 115203 | consumed samples: 27507200 | consumed tokens: 56334745600 | elapsed time per iteration (s): 0.43 | learning rate: 2.204E-05 | global batch size: 256 | lm loss: 2.237905E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.200 | TFLOPs: 30.91 | 7: iteration 107460/ 115203 | consumed samples: 27509760 | consumed tokens: 56339988480 | elapsed time per iteration (s): 0.43 | learning rate: 2.204E-05 | global batch size: 256 | lm loss: 2.203514E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.666 | TFLOPs: 31.04 | 7: iteration 107470/ 115203 | consumed samples: 27512320 | consumed tokens: 56345231360 | elapsed time per iteration (s): 0.45 | learning rate: 2.203E-05 | global batch size: 256 | lm loss: 2.215002E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.150 | TFLOPs: 30.07 | 7: iteration 107480/ 115203 | consumed samples: 27514880 | consumed tokens: 56350474240 | elapsed time per iteration (s): 0.43 | learning rate: 2.203E-05 | global batch size: 256 | lm loss: 2.240444E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.569 | TFLOPs: 31.25 | 7: iteration 107490/ 115203 | consumed samples: 27517440 | consumed tokens: 56355717120 | elapsed time per iteration (s): 0.43 | learning rate: 2.202E-05 | global batch size: 256 | lm loss: 2.197973E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.960 | TFLOPs: 31.32 | 7: iteration 107500/ 115203 | consumed samples: 27520000 | consumed tokens: 56360960000 | elapsed time per iteration (s): 0.43 | learning rate: 2.202E-05 | global batch 
size: 256 | lm loss: 2.224309E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.829 | TFLOPs: 31.05 | 7: iteration 107510/ 115203 | consumed samples: 27522560 | consumed tokens: 56366202880 | elapsed time per iteration (s): 0.44 | learning rate: 2.201E-05 | global batch size: 256 | lm loss: 2.199759E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.700 | TFLOPs: 30.57 | 7: iteration 107520/ 115203 | consumed samples: 27525120 | consumed tokens: 56371445760 | elapsed time per iteration (s): 0.42 | learning rate: 2.201E-05 | global batch size: 256 | lm loss: 2.225853E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.992 | TFLOPs: 31.64 | 7: iteration 107530/ 115203 | consumed samples: 27527680 | consumed tokens: 56376688640 | elapsed time per iteration (s): 0.44 | learning rate: 2.200E-05 | global batch size: 256 | lm loss: 2.206729E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.558 | TFLOPs: 30.36 | 7: iteration 107540/ 115203 | consumed samples: 27530240 | consumed tokens: 56381931520 | elapsed time per iteration (s): 0.43 | learning rate: 2.200E-05 | global batch size: 256 | lm loss: 2.220095E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.586 | TFLOPs: 31.51 | 7: iteration 107550/ 115203 | consumed samples: 27532800 | consumed tokens: 56387174400 | elapsed time per iteration (s): 0.44 | learning rate: 2.199E-05 | global batch size: 256 | lm loss: 2.227677E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.097 | TFLOPs: 30.70 | 7: iteration 107560/ 115203 | consumed samples: 27535360 | 
consumed tokens: 56392417280 | elapsed time per iteration (s): 0.43 | learning rate: 2.199E-05 | global batch size: 256 | lm loss: 2.225456E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.874 | TFLOPs: 31.16 | 7: iteration 107570/ 115203 | consumed samples: 27537920 | consumed tokens: 56397660160 | elapsed time per iteration (s): 0.43 | learning rate: 2.198E-05 | global batch size: 256 | lm loss: 2.227482E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.571 | TFLOPs: 30.99 | 7: iteration 107580/ 115203 | consumed samples: 27540480 | consumed tokens: 56402903040 | elapsed time per iteration (s): 0.43 | learning rate: 2.198E-05 | global batch size: 256 | lm loss: 2.209993E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.685 | TFLOPs: 31.20 | 7: iteration 107590/ 115203 | consumed samples: 27543040 | consumed tokens: 56408145920 | elapsed time per iteration (s): 0.43 | learning rate: 2.197E-05 | global batch size: 256 | lm loss: 2.208069E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.007 | TFLOPs: 30.90 | 7: iteration 107600/ 115203 | consumed samples: 27545600 | consumed tokens: 56413388800 | elapsed time per iteration (s): 0.43 | learning rate: 2.197E-05 | global batch size: 256 | lm loss: 2.218752E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.360 | TFLOPs: 31.50 | 7: iteration 107610/ 115203 | consumed samples: 27548160 | consumed tokens: 56418631680 | elapsed time per iteration (s): 0.44 | learning rate: 2.196E-05 | global batch size: 256 | lm loss: 2.226642E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 587.936 | TFLOPs: 30.85 | 7: iteration 107620/ 115203 | consumed samples: 27550720 | consumed tokens: 56423874560 | elapsed time per iteration (s): 0.44 | learning rate: 2.196E-05 | global batch size: 256 | lm loss: 2.242700E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.844 | TFLOPs: 30.58 | 7: iteration 107630/ 115203 | consumed samples: 27553280 | consumed tokens: 56429117440 | elapsed time per iteration (s): 0.43 | learning rate: 2.195E-05 | global batch size: 256 | lm loss: 2.190526E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.904 | TFLOPs: 31.00 | 7: iteration 107640/ 115203 | consumed samples: 27555840 | consumed tokens: 56434360320 | elapsed time per iteration (s): 0.43 | learning rate: 2.195E-05 | global batch size: 256 | lm loss: 2.243229E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.083 | TFLOPs: 31.59 | 7: iteration 107650/ 115203 | consumed samples: 27558400 | consumed tokens: 56439603200 | elapsed time per iteration (s): 0.43 | learning rate: 2.194E-05 | global batch size: 256 | lm loss: 2.211209E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.580 | TFLOPs: 30.88 | 7: iteration 107660/ 115203 | consumed samples: 27560960 | consumed tokens: 56444846080 | elapsed time per iteration (s): 0.43 | learning rate: 2.194E-05 | global batch size: 256 | lm loss: 2.218523E+00 | grad norm: 0.234 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.706 | TFLOPs: 31.41 | 7: iteration 107670/ 115203 | consumed samples: 27563520 | consumed tokens: 56450088960 | elapsed time per iteration (s): 0.43 | learning rate: 2.193E-05 | global batch size: 256 | lm loss: 
2.230939E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.892 | TFLOPs: 30.95 | 7: iteration 107680/ 115203 | consumed samples: 27566080 | consumed tokens: 56455331840 | elapsed time per iteration (s): 0.43 | learning rate: 2.193E-05 | global batch size: 256 | lm loss: 2.190237E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.754 | TFLOPs: 31.21 | 7: iteration 107690/ 115203 | consumed samples: 27568640 | consumed tokens: 56460574720 | elapsed time per iteration (s): 0.59 | learning rate: 2.192E-05 | global batch size: 256 | lm loss: 2.254846E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 430.278 | TFLOPs: 22.58 | 7: iteration 107700/ 115203 | consumed samples: 27571200 | consumed tokens: 56465817600 | elapsed time per iteration (s): 0.43 | learning rate: 2.192E-05 | global batch size: 256 | lm loss: 2.224976E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.498 | TFLOPs: 31.45 | 7: iteration 107710/ 115203 | consumed samples: 27573760 | consumed tokens: 56471060480 | elapsed time per iteration (s): 0.44 | learning rate: 2.191E-05 | global batch size: 256 | lm loss: 2.202248E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.261 | TFLOPs: 30.39 | 7: iteration 107720/ 115203 | consumed samples: 27576320 | consumed tokens: 56476303360 | elapsed time per iteration (s): 0.43 | learning rate: 2.191E-05 | global batch size: 256 | lm loss: 2.186734E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.665 | TFLOPs: 31.20 | 7: iteration 107730/ 115203 | consumed samples: 27578880 | consumed tokens: 
56481546240 | elapsed time per iteration (s): 0.44 | learning rate: 2.190E-05 | global batch size: 256 | lm loss: 2.217715E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.706 | TFLOPs: 30.42 | 7: iteration 107740/ 115203 | consumed samples: 27581440 | consumed tokens: 56486789120 | elapsed time per iteration (s): 0.43 | learning rate: 2.190E-05 | global batch size: 256 | lm loss: 2.221940E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.416 | TFLOPs: 30.98 | 7: iteration 107750/ 115203 | consumed samples: 27584000 | consumed tokens: 56492032000 | elapsed time per iteration (s): 0.45 | learning rate: 2.189E-05 | global batch size: 256 | lm loss: 2.226727E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.689 | TFLOPs: 30.00 | 7: iteration 107760/ 115203 | consumed samples: 27586560 | consumed tokens: 56497274880 | elapsed time per iteration (s): 0.43 | learning rate: 2.189E-05 | global batch size: 256 | lm loss: 2.234778E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.036 | TFLOPs: 31.12 | 7: iteration 107770/ 115203 | consumed samples: 27589120 | consumed tokens: 56502517760 | elapsed time per iteration (s): 0.44 | learning rate: 2.188E-05 | global batch size: 256 | lm loss: 2.237928E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.062 | TFLOPs: 30.80 | 7: iteration 107780/ 115203 | consumed samples: 27591680 | consumed tokens: 56507760640 | elapsed time per iteration (s): 0.43 | learning rate: 2.188E-05 | global batch size: 256 | lm loss: 2.213531E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 597.747 | TFLOPs: 31.36 | 7: iteration 107790/ 115203 | consumed samples: 27594240 | consumed tokens: 56513003520 | elapsed time per iteration (s): 0.43 | learning rate: 2.187E-05 | global batch size: 256 | lm loss: 2.234334E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.137 | TFLOPs: 31.49 | 7: iteration 107800/ 115203 | consumed samples: 27596800 | consumed tokens: 56518246400 | elapsed time per iteration (s): 0.43 | learning rate: 2.187E-05 | global batch size: 256 | lm loss: 2.229456E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.440 | TFLOPs: 31.08 | 7: iteration 107810/ 115203 | consumed samples: 27599360 | consumed tokens: 56523489280 | elapsed time per iteration (s): 0.44 | learning rate: 2.186E-05 | global batch size: 256 | lm loss: 2.226318E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.497 | TFLOPs: 30.35 | 7: iteration 107820/ 115203 | consumed samples: 27601920 | consumed tokens: 56528732160 | elapsed time per iteration (s): 0.44 | learning rate: 2.186E-05 | global batch size: 256 | lm loss: 2.217986E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.325 | TFLOPs: 30.19 | 7: iteration 107830/ 115203 | consumed samples: 27604480 | consumed tokens: 56533975040 | elapsed time per iteration (s): 0.44 | learning rate: 2.185E-05 | global batch size: 256 | lm loss: 2.202343E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.914 | TFLOPs: 30.32 | 7: iteration 107840/ 115203 | consumed samples: 27607040 | consumed tokens: 56539217920 | elapsed time per iteration (s): 0.45 | learning rate: 2.185E-05 | global batch size: 256 | lm loss: 2.233346E+00 | grad 
norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.428 | TFLOPs: 29.88 | 7: iteration 107850/ 115203 | consumed samples: 27609600 | consumed tokens: 56544460800 | elapsed time per iteration (s): 0.43 | learning rate: 2.184E-05 | global batch size: 256 | lm loss: 2.229451E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.726 | TFLOPs: 31.15 | 7: iteration 107860/ 115203 | consumed samples: 27612160 | consumed tokens: 56549703680 | elapsed time per iteration (s): 0.43 | learning rate: 2.184E-05 | global batch size: 256 | lm loss: 2.168637E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.339 | TFLOPs: 31.34 | 7: iteration 107870/ 115203 | consumed samples: 27614720 | consumed tokens: 56554946560 | elapsed time per iteration (s): 0.43 | learning rate: 2.183E-05 | global batch size: 256 | lm loss: 2.216666E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.049 | TFLOPs: 31.48 | 7: iteration 107880/ 115203 | consumed samples: 27617280 | consumed tokens: 56560189440 | elapsed time per iteration (s): 0.43 | learning rate: 2.183E-05 | global batch size: 256 | lm loss: 2.216710E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.261 | TFLOPs: 31.28 | 7: iteration 107890/ 115203 | consumed samples: 27619840 | consumed tokens: 56565432320 | elapsed time per iteration (s): 0.44 | learning rate: 2.182E-05 | global batch size: 256 | lm loss: 2.219127E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.332 | TFLOPs: 30.55 | 7: iteration 107900/ 115203 | consumed samples: 27622400 | consumed tokens: 56570675200 | elapsed time 
per iteration (s): 0.43 | learning rate: 2.182E-05 | global batch size: 256 | lm loss: 2.227323E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.689 | TFLOPs: 31.10 | 7: iteration 107910/ 115203 | consumed samples: 27624960 | consumed tokens: 56575918080 | elapsed time per iteration (s): 0.44 | learning rate: 2.181E-05 | global batch size: 256 | lm loss: 2.224084E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.741 | TFLOPs: 30.31 | 7: iteration 107920/ 115203 | consumed samples: 27627520 | consumed tokens: 56581160960 | elapsed time per iteration (s): 0.43 | learning rate: 2.181E-05 | global batch size: 256 | lm loss: 2.210156E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.615 | TFLOPs: 30.99 | 7: iteration 107930/ 115203 | consumed samples: 27630080 | consumed tokens: 56586403840 | elapsed time per iteration (s): 0.43 | learning rate: 2.180E-05 | global batch size: 256 | lm loss: 2.209963E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.494 | TFLOPs: 31.09 | 7: iteration 107940/ 115203 | consumed samples: 27632640 | consumed tokens: 56591646720 | elapsed time per iteration (s): 0.44 | learning rate: 2.180E-05 | global batch size: 256 | lm loss: 2.192162E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.192 | TFLOPs: 30.81 | 7: iteration 107950/ 115203 | consumed samples: 27635200 | consumed tokens: 56596889600 | elapsed time per iteration (s): 0.46 | learning rate: 2.179E-05 | global batch size: 256 | lm loss: 2.212784E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 558.349 | TFLOPs: 
29.30 | 7: iteration 107960/ 115203 | consumed samples: 27637760 | consumed tokens: 56602132480 | elapsed time per iteration (s): 0.44 | learning rate: 2.179E-05 | global batch size: 256 | lm loss: 2.204500E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.109 | TFLOPs: 30.54 | 7: iteration 107970/ 115203 | consumed samples: 27640320 | consumed tokens: 56607375360 | elapsed time per iteration (s): 0.43 | learning rate: 2.178E-05 | global batch size: 256 | lm loss: 2.230000E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.353 | TFLOPs: 31.08 | 7: iteration 107980/ 115203 | consumed samples: 27642880 | consumed tokens: 56612618240 | elapsed time per iteration (s): 0.44 | learning rate: 2.178E-05 | global batch size: 256 | lm loss: 2.174620E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.120 | TFLOPs: 30.39 | 7: iteration 107990/ 115203 | consumed samples: 27645440 | consumed tokens: 56617861120 | elapsed time per iteration (s): 0.46 | learning rate: 2.177E-05 | global batch size: 256 | lm loss: 2.196553E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.099 | TFLOPs: 29.39 | 0: [2022-11-29 01:59:29,459] [INFO] [logging.py:68:log_dist] [Rank 0] step=108000, skipped=0, lr=[2.176608969325893e-05, 2.176608969325893e-05, 2.176608969325893e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 108000/ 115203 | consumed samples: 27648000 | consumed tokens: 56623104000 | elapsed time per iteration (s): 0.44 | learning rate: 2.177E-05 | global batch size: 256 | lm loss: 2.238010E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.904 | TFLOPs: 30.37 | 0: steps: 108000 
loss: 2.3038 iter time (s): 0.443 samples/sec: 578.468 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 108000 | lm loss value: 2.176000E+00 | lm loss PPL: 8.810991E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 108000 to checkpoints_221m 0: [2022-11-29 01:59:29,651] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step108000 is begin to save! 0: [2022-11-29 01:59:29,677] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_01-model_00-model_states.pt... 0: [2022-11-29 01:59:29,791] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_01-model_00-model_states.pt. 0: [2022-11-29 01:59:29,792] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_03-model_00-model_states.pt... 0: [2022-11-29 01:59:29,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_03-model_00-model_states.pt. 0: [2022-11-29 01:59:29,814] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_04-model_00-model_states.pt... 0: [2022-11-29 01:59:29,838] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_04-model_00-model_states.pt. 0: [2022-11-29 01:59:29,839] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_05-model_00-model_states.pt... 0: [2022-11-29 01:59:29,863] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_05-model_00-model_states.pt. 0: [2022-11-29 01:59:29,863] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_06-model_00-model_states.pt... 
0: [2022-11-29 01:59:29,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_06-model_00-model_states.pt. 0: [2022-11-29 01:59:29,887] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_07-model_00-model_states.pt... 0: [2022-11-29 01:59:29,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_07-model_00-model_states.pt. 0: [2022-11-29 01:59:29,910] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_08-model_00-model_states.pt... 0: [2022-11-29 01:59:29,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_08-model_00-model_states.pt. 0: [2022-11-29 01:59:29,935] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_09-model_00-model_states.pt... 0: [2022-11-29 01:59:29,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_09-model_00-model_states.pt. 0: [2022-11-29 01:59:29,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_10-model_00-model_states.pt... 0: [2022-11-29 01:59:29,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_10-model_00-model_states.pt. 0: [2022-11-29 01:59:29,983] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_11-model_00-model_states.pt... 0: [2022-11-29 01:59:30,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_11-model_00-model_states.pt. 0: [2022-11-29 01:59:30,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_12-model_00-model_states.pt... 
0: [2022-11-29 01:59:30,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_12-model_00-model_states.pt. 0: [2022-11-29 01:59:30,033] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_13-model_00-model_states.pt... 0: [2022-11-29 01:59:30,057] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_13-model_00-model_states.pt. 0: [2022-11-29 01:59:30,057] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_14-model_00-model_states.pt... 0: [2022-11-29 01:59:30,081] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_14-model_00-model_states.pt. 0: [2022-11-29 01:59:30,081] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_15-model_00-model_states.pt... 0: [2022-11-29 01:59:30,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_15-model_00-model_states.pt. 0: [2022-11-29 01:59:30,105] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_16-model_00-model_states.pt... 0: [2022-11-29 01:59:30,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_16-model_00-model_states.pt. 0: [2022-11-29 01:59:30,129] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_17-model_00-model_states.pt... 0: [2022-11-29 01:59:30,152] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_17-model_00-model_states.pt. 0: [2022-11-29 01:59:30,153] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_18-model_00-model_states.pt... 
0: [2022-11-29 01:59:30,177] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_18-model_00-model_states.pt. 0: [2022-11-29 01:59:30,177] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_19-model_00-model_states.pt... 0: [2022-11-29 01:59:30,200] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_19-model_00-model_states.pt. 0: [2022-11-29 01:59:30,201] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_20-model_00-model_states.pt... 0: [2022-11-29 01:59:30,225] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_20-model_00-model_states.pt. 0: [2022-11-29 01:59:30,225] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/layer_22-model_00-model_states.pt... 0: [2022-11-29 01:59:30,229] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/layer_22-model_00-model_states.pt. 0: [2022-11-29 01:59:30,230] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step108000/mp_rank_00_model_states.pt 0: [2022-11-29 01:59:30,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/mp_rank_00_model_states.pt... 0: [2022-11-29 01:59:30,233] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/mp_rank_00_model_states.pt. 0: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 
1: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 
3: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
1: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
1: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 4: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-29 01:59:30,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step108000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-29 01:59:30,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:59:30,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 
0: [2022-11-29 01:59:30,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 01:59:30,308] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 01:59:30,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2022-11-29 01:59:30,309] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2022-11-29 01:59:30,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:59:30,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 01:59:30,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2022-11-29 01:59:30,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:59:30,312] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 01:59:30,312] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2022-11-29 01:59:30,312] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:59:30,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 01:59:30,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2022-11-29 01:59:30,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:59:30,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:59:30,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:59:30,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:59:30,313] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:59:30,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 01:59:30,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 01:59:30,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 01:59:30,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 01:59:30,313] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 01:59:30,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2022-11-29 01:59:30,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2022-11-29 01:59:30,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2022-11-29 01:59:30,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2022-11-29 01:59:30,313] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2022-11-29 01:59:30,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 
1: [2022-11-29 01:59:30,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 01:59:30,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 1: [2022-11-29 01:59:30,314] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 01:59:30,314] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 01:59:30,314] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2022-11-29 01:59:30,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:59:30,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:59:30,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 01:59:30,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 01:59:30,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2022-11-29 01:59:30,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2022-11-29 01:59:30,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:59:30,323] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 01:59:30,323] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2022-11-29 01:59:30,324] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:59:30,324] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 01:59:30,324] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2022-11-29 01:59:30,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:59:30,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 01:59:30,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:59:30,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2022-11-29 01:59:30,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 01:59:30,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2022-11-29 01:59:30,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
2: [2022-11-29 01:59:30,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:59:30,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2022-11-29 01:59:30,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 4: [2022-11-29 01:59:30,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2022-11-29 01:59:30,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 2: [2022-11-29 01:59:30,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 01:59:30,325] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 01:59:30,325] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2022-11-29 01:59:30,325] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:59:30,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 01:59:30,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2022-11-29 01:59:30,326] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:59:30,326] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 01:59:30,326] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2022-11-29 01:59:30,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:59:30,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 01:59:30,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2022-11-29 01:59:30,327] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:59:30,327] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 01:59:30,327] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2022-11-29 01:59:30,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:59:30,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:59:30,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:59:30,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 
3: [2022-11-29 01:59:30,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:59:30,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-29 01:59:30,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-29 01:59:30,310] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:59:30,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-29 01:59:30,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2022-11-29 01:59:30,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2022-11-29 01:59:30,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:59:30,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 01:59:30,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-29 01:59:30,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 
3: [2022-11-29 01:59:30,311] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-29 01:59:30,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-29 01:59:30,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2022-11-29 01:59:30,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2022-11-29 01:59:30,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2022-11-29 01:59:30,311] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2022-11-29 01:59:30,316] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:59:30,318] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:59:30,316] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-29 01:59:30,318] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-29 01:59:30,316] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2022-11-29 01:59:30,318] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 
5: [2022-11-29 01:59:30,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:59:30,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:59:30,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:59:30,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 01:59:30,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:59:30,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-29 01:59:30,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-29 01:59:30,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
5: [2022-11-29 01:59:30,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 01:59:30,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 01:59:30,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-29 01:59:30,319] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2022-11-29 01:59:30,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2022-11-29 01:59:30,319] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-29 01:59:30,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2022-11-29 01:59:30,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2022-11-29 01:59:30,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 5: [2022-11-29 01:59:30,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 3: [2022-11-29 01:59:30,319] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 01:59:30,320] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 01:59:30,320] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 
0: [2022-11-29 01:59:30,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:59:30,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:59:30,329] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 01:59:30,329] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2022-11-29 01:59:30,329] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:59:30,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 01:59:30,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: [2022-11-29 01:59:30,330] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 01:59:30,330] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 01:59:30,330] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2022-11-29 01:59:30,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 
4: [2022-11-29 01:59:30,336] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 01:59:30,336] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2022-11-29 01:59:30,336] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:59:30,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 01:59:30,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 4: [2022-11-29 01:59:30,337] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 01:59:30,337] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 01:59:30,337] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:59:30,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 01:59:30,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 01:59:30,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 01:59:30,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 
6: [2022-11-29 01:59:30,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 01:59:30,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 01:59:30,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 6: [2022-11-29 01:59:30,361] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 01:59:30,361] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 
0: [2022-11-29 01:59:30,368] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 01:59:30,368] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 
7: [2022-11-29 01:59:30,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 01:59:30,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 01:59:30,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 01:59:30,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 01:59:30,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 01:59:30,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 01:59:30,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 01:59:30,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step108000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 
7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 7: [2022-11-29 01:59:30,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step108000 is ready now! 0: successfully saved checkpoint at iteration 108000 to checkpoints_221m 7: time (ms) | save-checkpoint: 846.94 7: iteration 108010/ 115203 | consumed samples: 27650560 | consumed tokens: 56628346880 | elapsed time per iteration (s): 0.54 | learning rate: 2.176E-05 | global batch size: 256 | lm loss: 2.225624E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 477.760 | TFLOPs: 25.07 | 7: iteration 108020/ 115203 | consumed samples: 27653120 | consumed tokens: 56633589760 | elapsed time per iteration (s): 0.43 | learning rate: 2.176E-05 | global batch size: 256 | lm loss: 2.227249E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.315 | TFLOPs: 31.18 | 7: iteration 108030/ 115203 | consumed samples: 27655680 | consumed tokens: 56638832640 | elapsed time per iteration (s): 0.46 | learning rate: 2.175E-05 | global batch size: 256 | lm loss: 2.184927E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.688 | TFLOPs: 29.37 | 7: iteration 108040/ 115203 | consumed samples: 27658240 | consumed tokens: 56644075520 | elapsed time per iteration (s): 0.44 | learning rate: 2.175E-05 | global batch size: 256 | lm loss: 2.212359E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
576.896 | TFLOPs: 30.27 | 7: iteration 108050/ 115203 | consumed samples: 27660800 | consumed tokens: 56649318400 | elapsed time per iteration (s): 0.42 | learning rate: 2.174E-05 | global batch size: 256 | lm loss: 2.228406E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.423 | TFLOPs: 31.61 | 7: iteration 108060/ 115203 | consumed samples: 27663360 | consumed tokens: 56654561280 | elapsed time per iteration (s): 0.43 | learning rate: 2.174E-05 | global batch size: 256 | lm loss: 2.190787E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.653 | TFLOPs: 31.46 | 7: iteration 108070/ 115203 | consumed samples: 27665920 | consumed tokens: 56659804160 | elapsed time per iteration (s): 0.43 | learning rate: 2.173E-05 | global batch size: 256 | lm loss: 2.220939E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.054 | TFLOPs: 31.43 | 7: iteration 108080/ 115203 | consumed samples: 27668480 | consumed tokens: 56665047040 | elapsed time per iteration (s): 0.45 | learning rate: 2.173E-05 | global batch size: 256 | lm loss: 2.209277E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.989 | TFLOPs: 29.59 | 7: iteration 108090/ 115203 | consumed samples: 27671040 | consumed tokens: 56670289920 | elapsed time per iteration (s): 0.42 | learning rate: 2.172E-05 | global batch size: 256 | lm loss: 2.227644E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.486 | TFLOPs: 31.93 | 7: iteration 108100/ 115203 | consumed samples: 27673600 | consumed tokens: 56675532800 | elapsed time per iteration (s): 0.43 | learning rate: 2.172E-05 | global batch size: 256 | lm loss: 2.234519E+00 | grad norm: 
0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.832 | TFLOPs: 30.90 | 7: iteration 108110/ 115203 | consumed samples: 27676160 | consumed tokens: 56680775680 | elapsed time per iteration (s): 0.43 | learning rate: 2.171E-05 | global batch size: 256 | lm loss: 2.210846E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.445 | TFLOPs: 31.29 | 7: iteration 108120/ 115203 | consumed samples: 27678720 | consumed tokens: 56686018560 | elapsed time per iteration (s): 0.43 | learning rate: 2.171E-05 | global batch size: 256 | lm loss: 2.210690E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.724 | TFLOPs: 31.36 | 7: iteration 108130/ 115203 | consumed samples: 27681280 | consumed tokens: 56691261440 | elapsed time per iteration (s): 0.43 | learning rate: 2.170E-05 | global batch size: 256 | lm loss: 2.208573E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.172 | TFLOPs: 31.12 | 7: iteration 108140/ 115203 | consumed samples: 27683840 | consumed tokens: 56696504320 | elapsed time per iteration (s): 0.43 | learning rate: 2.170E-05 | global batch size: 256 | lm loss: 2.212558E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.866 | TFLOPs: 30.95 | 7: iteration 108150/ 115203 | consumed samples: 27686400 | consumed tokens: 56701747200 | elapsed time per iteration (s): 0.45 | learning rate: 2.169E-05 | global batch size: 256 | lm loss: 2.214509E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.481 | TFLOPs: 30.14 | 7: iteration 108160/ 115203 | consumed samples: 27688960 | consumed tokens: 56706990080 | elapsed time per 
iteration (s): 0.43 | learning rate: 2.169E-05 | global batch size: 256 | lm loss: 2.223395E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.591 | TFLOPs: 31.41 | 7: iteration 108170/ 115203 | consumed samples: 27691520 | consumed tokens: 56712232960 | elapsed time per iteration (s): 0.43 | learning rate: 2.168E-05 | global batch size: 256 | lm loss: 2.208923E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.502 | TFLOPs: 31.19 | 7: iteration 108180/ 115203 | consumed samples: 27694080 | consumed tokens: 56717475840 | elapsed time per iteration (s): 0.43 | learning rate: 2.168E-05 | global batch size: 256 | lm loss: 2.212021E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.667 | TFLOPs: 31.20 | 7: iteration 108190/ 115203 | consumed samples: 27696640 | consumed tokens: 56722718720 | elapsed time per iteration (s): 0.43 | learning rate: 2.167E-05 | global batch size: 256 | lm loss: 2.207157E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.463 | TFLOPs: 31.45 | 7: iteration 108200/ 115203 | consumed samples: 27699200 | consumed tokens: 56727961600 | elapsed time per iteration (s): 0.43 | learning rate: 2.167E-05 | global batch size: 256 | lm loss: 2.238728E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.691 | TFLOPs: 31.31 | 7: iteration 108210/ 115203 | consumed samples: 27701760 | consumed tokens: 56733204480 | elapsed time per iteration (s): 0.43 | learning rate: 2.166E-05 | global batch size: 256 | lm loss: 2.214579E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.960 | TFLOPs: 31.01 | 
7: iteration 108220/ 115203 | consumed samples: 27704320 | consumed tokens: 56738447360 | elapsed time per iteration (s): 0.43 | learning rate: 2.166E-05 | global batch size: 256 | lm loss: 2.223343E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.472 | TFLOPs: 31.14 | 7: iteration 108230/ 115203 | consumed samples: 27706880 | consumed tokens: 56743690240 | elapsed time per iteration (s): 0.43 | learning rate: 2.166E-05 | global batch size: 256 | lm loss: 2.206375E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.238 | TFLOPs: 30.97 | 7: iteration 108240/ 115203 | consumed samples: 27709440 | consumed tokens: 56748933120 | elapsed time per iteration (s): 0.44 | learning rate: 2.165E-05 | global batch size: 256 | lm loss: 2.198316E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.854 | TFLOPs: 30.79 | 7: iteration 108250/ 115203 | consumed samples: 27712000 | consumed tokens: 56754176000 | elapsed time per iteration (s): 0.44 | learning rate: 2.165E-05 | global batch size: 256 | lm loss: 2.234227E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.574 | TFLOPs: 30.46 | 7: iteration 108260/ 115203 | consumed samples: 27714560 | consumed tokens: 56759418880 | elapsed time per iteration (s): 0.44 | learning rate: 2.164E-05 | global batch size: 256 | lm loss: 2.200191E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.062 | TFLOPs: 30.75 | 7: iteration 108270/ 115203 | consumed samples: 27717120 | consumed tokens: 56764661760 | elapsed time per iteration (s): 0.45 | learning rate: 2.164E-05 | global batch size: 256 | lm loss: 2.215630E+00 | grad norm: 0.258 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.591 | TFLOPs: 29.99 | 7: iteration 108280/ 115203 | consumed samples: 27719680 | consumed tokens: 56769904640 | elapsed time per iteration (s): 0.43 | learning rate: 2.163E-05 | global batch size: 256 | lm loss: 2.242206E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.142 | TFLOPs: 30.96 | 7: iteration 108290/ 115203 | consumed samples: 27722240 | consumed tokens: 56775147520 | elapsed time per iteration (s): 0.43 | learning rate: 2.163E-05 | global batch size: 256 | lm loss: 2.221037E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.709 | TFLOPs: 31.26 | 7: iteration 108300/ 115203 | consumed samples: 27724800 | consumed tokens: 56780390400 | elapsed time per iteration (s): 0.44 | learning rate: 2.162E-05 | global batch size: 256 | lm loss: 2.224481E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.166 | TFLOPs: 30.60 | 7: iteration 108310/ 115203 | consumed samples: 27727360 | consumed tokens: 56785633280 | elapsed time per iteration (s): 0.44 | learning rate: 2.162E-05 | global batch size: 256 | lm loss: 2.204174E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.275 | TFLOPs: 30.87 | 7: iteration 108320/ 115203 | consumed samples: 27729920 | consumed tokens: 56790876160 | elapsed time per iteration (s): 0.44 | learning rate: 2.161E-05 | global batch size: 256 | lm loss: 2.224285E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.820 | TFLOPs: 30.84 | 7: iteration 108330/ 115203 | consumed samples: 27732480 | consumed tokens: 56796119040 | elapsed time per iteration (s): 0.43 | 
learning rate: 2.161E-05 | global batch size: 256 | lm loss: 2.190813E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.713 | TFLOPs: 30.94 | 7: iteration 108340/ 115203 | consumed samples: 27735040 | consumed tokens: 56801361920 | elapsed time per iteration (s): 0.44 | learning rate: 2.160E-05 | global batch size: 256 | lm loss: 2.217130E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.076 | TFLOPs: 30.49 | 7: iteration 108350/ 115203 | consumed samples: 27737600 | consumed tokens: 56806604800 | elapsed time per iteration (s): 0.44 | learning rate: 2.160E-05 | global batch size: 256 | lm loss: 2.225568E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.812 | TFLOPs: 30.74 | 7: iteration 108360/ 115203 | consumed samples: 27740160 | consumed tokens: 56811847680 | elapsed time per iteration (s): 0.43 | learning rate: 2.159E-05 | global batch size: 256 | lm loss: 2.240829E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.923 | TFLOPs: 31.42 | 7: iteration 108370/ 115203 | consumed samples: 27742720 | consumed tokens: 56817090560 | elapsed time per iteration (s): 0.43 | learning rate: 2.159E-05 | global batch size: 256 | lm loss: 2.239205E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.652 | TFLOPs: 31.15 | 7: iteration 108380/ 115203 | consumed samples: 27745280 | consumed tokens: 56822333440 | elapsed time per iteration (s): 0.43 | learning rate: 2.159E-05 | global batch size: 256 | lm loss: 2.235674E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.808 | TFLOPs: 30.89 | 7: iteration 108390/ 
115203 | consumed samples: 27747840 | consumed tokens: 56827576320 | elapsed time per iteration (s): 0.44 | learning rate: 2.158E-05 | global batch size: 256 | lm loss: 2.225418E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.467 | TFLOPs: 30.40 | 7: iteration 108400/ 115203 | consumed samples: 27750400 | consumed tokens: 56832819200 | elapsed time per iteration (s): 0.43 | learning rate: 2.158E-05 | global batch size: 256 | lm loss: 2.217679E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.950 | TFLOPs: 31.16 | 7: iteration 108410/ 115203 | consumed samples: 27752960 | consumed tokens: 56838062080 | elapsed time per iteration (s): 0.43 | learning rate: 2.157E-05 | global batch size: 256 | lm loss: 2.226080E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.435 | TFLOPs: 31.35 | 7: iteration 108420/ 115203 | consumed samples: 27755520 | consumed tokens: 56843304960 | elapsed time per iteration (s): 0.43 | learning rate: 2.157E-05 | global batch size: 256 | lm loss: 2.216657E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.660 | TFLOPs: 31.20 | 7: iteration 108430/ 115203 | consumed samples: 27758080 | consumed tokens: 56848547840 | elapsed time per iteration (s): 0.43 | learning rate: 2.156E-05 | global batch size: 256 | lm loss: 2.223858E+00 | grad norm: 0.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.156 | TFLOPs: 31.23 | 7: iteration 108440/ 115203 | consumed samples: 27760640 | consumed tokens: 56853790720 | elapsed time per iteration (s): 0.44 | learning rate: 2.156E-05 | global batch size: 256 | lm loss: 2.214389E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 585.383 | TFLOPs: 30.71 | 7: iteration 108450/ 115203 | consumed samples: 27763200 | consumed tokens: 56859033600 | elapsed time per iteration (s): 0.44 | learning rate: 2.155E-05 | global batch size: 256 | lm loss: 2.208276E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.459 | TFLOPs: 30.88 | 7: iteration 108460/ 115203 | consumed samples: 27765760 | consumed tokens: 56864276480 | elapsed time per iteration (s): 0.43 | learning rate: 2.155E-05 | global batch size: 256 | lm loss: 2.214916E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.707 | TFLOPs: 30.89 | 7: iteration 108470/ 115203 | consumed samples: 27768320 | consumed tokens: 56869519360 | elapsed time per iteration (s): 0.43 | learning rate: 2.154E-05 | global batch size: 256 | lm loss: 2.229026E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.330 | TFLOPs: 31.34 | 7: iteration 108480/ 115203 | consumed samples: 27770880 | consumed tokens: 56874762240 | elapsed time per iteration (s): 0.43 | learning rate: 2.154E-05 | global batch size: 256 | lm loss: 2.240117E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.131 | TFLOPs: 31.44 | 7: iteration 108490/ 115203 | consumed samples: 27773440 | consumed tokens: 56880005120 | elapsed time per iteration (s): 0.44 | learning rate: 2.153E-05 | global batch size: 256 | lm loss: 2.214036E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.871 | TFLOPs: 30.69 | 7: iteration 108500/ 115203 | consumed samples: 27776000 | consumed tokens: 56885248000 | elapsed time per iteration (s): 0.44 | learning rate: 
2.153E-05 | global batch size: 256 | lm loss: 2.235442E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.223 | TFLOPs: 30.60 | 7: iteration 108510/ 115203 | consumed samples: 27778560 | consumed tokens: 56890490880 | elapsed time per iteration (s): 0.43 | learning rate: 2.153E-05 | global batch size: 256 | lm loss: 2.248856E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.190 | TFLOPs: 31.18 | 7: iteration 108520/ 115203 | consumed samples: 27781120 | consumed tokens: 56895733760 | elapsed time per iteration (s): 0.42 | learning rate: 2.152E-05 | global batch size: 256 | lm loss: 2.195005E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.056 | TFLOPs: 31.64 | 7: iteration 108530/ 115203 | consumed samples: 27783680 | consumed tokens: 56900976640 | elapsed time per iteration (s): 0.43 | learning rate: 2.152E-05 | global batch size: 256 | lm loss: 2.215481E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.873 | TFLOPs: 31.11 | 7: iteration 108540/ 115203 | consumed samples: 27786240 | consumed tokens: 56906219520 | elapsed time per iteration (s): 0.43 | learning rate: 2.151E-05 | global batch size: 256 | lm loss: 2.194112E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.291 | TFLOPs: 31.13 | 7: iteration 108550/ 115203 | consumed samples: 27788800 | consumed tokens: 56911462400 | elapsed time per iteration (s): 0.44 | learning rate: 2.151E-05 | global batch size: 256 | lm loss: 2.252700E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.309 | TFLOPs: 30.55 | 7: iteration 108560/ 115203 | 
consumed samples: 27791360 | consumed tokens: 56916705280 | elapsed time per iteration (s): 0.42 | learning rate: 2.150E-05 | global batch size: 256 | lm loss: 2.233237E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.675 | TFLOPs: 31.62 | 7: iteration 108570/ 115203 | consumed samples: 27793920 | consumed tokens: 56921948160 | elapsed time per iteration (s): 0.44 | learning rate: 2.150E-05 | global batch size: 256 | lm loss: 2.200662E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.002 | TFLOPs: 30.48 | 7: iteration 108580/ 115203 | consumed samples: 27796480 | consumed tokens: 56927191040 | elapsed time per iteration (s): 0.44 | learning rate: 2.149E-05 | global batch size: 256 | lm loss: 2.207429E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.916 | TFLOPs: 30.69 | 7: iteration 108590/ 115203 | consumed samples: 27799040 | consumed tokens: 56932433920 | elapsed time per iteration (s): 0.43 | learning rate: 2.149E-05 | global batch size: 256 | lm loss: 2.253370E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.146 | TFLOPs: 31.38 | 7: iteration 108600/ 115203 | consumed samples: 27801600 | consumed tokens: 56937676800 | elapsed time per iteration (s): 0.43 | learning rate: 2.148E-05 | global batch size: 256 | lm loss: 2.250204E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.483 | TFLOPs: 31.19 | 7: iteration 108610/ 115203 | consumed samples: 27804160 | consumed tokens: 56942919680 | elapsed time per iteration (s): 0.43 | learning rate: 2.148E-05 | global batch size: 256 | lm loss: 2.218081E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 600.563 | TFLOPs: 31.51 | 7: iteration 108620/ 115203 | consumed samples: 27806720 | consumed tokens: 56948162560 | elapsed time per iteration (s): 0.42 | learning rate: 2.148E-05 | global batch size: 256 | lm loss: 2.235368E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.577 | TFLOPs: 31.62 | 7: iteration 108630/ 115203 | consumed samples: 27809280 | consumed tokens: 56953405440 | elapsed time per iteration (s): 0.43 | learning rate: 2.147E-05 | global batch size: 256 | lm loss: 2.241870E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.763 | TFLOPs: 31.00 | 7: iteration 108640/ 115203 | consumed samples: 27811840 | consumed tokens: 56958648320 | elapsed time per iteration (s): 0.44 | learning rate: 2.147E-05 | global batch size: 256 | lm loss: 2.209249E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.169 | TFLOPs: 30.76 | 7: iteration 108650/ 115203 | consumed samples: 27814400 | consumed tokens: 56963891200 | elapsed time per iteration (s): 0.44 | learning rate: 2.146E-05 | global batch size: 256 | lm loss: 2.239536E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.356 | TFLOPs: 30.82 | 7: iteration 108660/ 115203 | consumed samples: 27816960 | consumed tokens: 56969134080 | elapsed time per iteration (s): 0.43 | learning rate: 2.146E-05 | global batch size: 256 | lm loss: 2.216971E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.215 | TFLOPs: 31.44 | 7: iteration 108670/ 115203 | consumed samples: 27819520 | consumed tokens: 56974376960 | elapsed time per iteration (s): 0.43 | learning rate: 2.145E-05 | global batch 
size: 256 | lm loss: 2.214954E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.870 | TFLOPs: 31.26 | 7: iteration 108680/ 115203 | consumed samples: 27822080 | consumed tokens: 56979619840 | elapsed time per iteration (s): 0.43 | learning rate: 2.145E-05 | global batch size: 256 | lm loss: 2.205315E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.783 | TFLOPs: 31.00 | 7: iteration 108690/ 115203 | consumed samples: 27824640 | consumed tokens: 56984862720 | elapsed time per iteration (s): 0.43 | learning rate: 2.144E-05 | global batch size: 256 | lm loss: 2.232538E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.578 | TFLOPs: 31.25 | 7: iteration 108700/ 115203 | consumed samples: 27827200 | consumed tokens: 56990105600 | elapsed time per iteration (s): 0.44 | learning rate: 2.144E-05 | global batch size: 256 | lm loss: 2.213025E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.338 | TFLOPs: 30.61 | 7: iteration 108710/ 115203 | consumed samples: 27829760 | consumed tokens: 56995348480 | elapsed time per iteration (s): 0.43 | learning rate: 2.144E-05 | global batch size: 256 | lm loss: 2.232800E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.805 | TFLOPs: 31.52 | 7: iteration 108720/ 115203 | consumed samples: 27832320 | consumed tokens: 57000591360 | elapsed time per iteration (s): 0.43 | learning rate: 2.143E-05 | global batch size: 256 | lm loss: 2.210757E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.756 | TFLOPs: 31.42 | 7: iteration 108730/ 115203 | consumed samples: 27834880 | 
consumed tokens: 57005834240 | elapsed time per iteration (s): 0.44 | learning rate: 2.143E-05 | global batch size: 256 | lm loss: 2.207462E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.512 | TFLOPs: 30.46 | 7: iteration 108740/ 115203 | consumed samples: 27837440 | consumed tokens: 57011077120 | elapsed time per iteration (s): 0.44 | learning rate: 2.142E-05 | global batch size: 256 | lm loss: 2.207137E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.365 | TFLOPs: 30.71 | 7: iteration 108750/ 115203 | consumed samples: 27840000 | consumed tokens: 57016320000 | elapsed time per iteration (s): 0.43 | learning rate: 2.142E-05 | global batch size: 256 | lm loss: 2.242084E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.944 | TFLOPs: 31.16 | 7: iteration 108760/ 115203 | consumed samples: 27842560 | consumed tokens: 57021562880 | elapsed time per iteration (s): 0.43 | learning rate: 2.141E-05 | global batch size: 256 | lm loss: 2.241475E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.905 | TFLOPs: 31.27 | 7: iteration 108770/ 115203 | consumed samples: 27845120 | consumed tokens: 57026805760 | elapsed time per iteration (s): 0.44 | learning rate: 2.141E-05 | global batch size: 256 | lm loss: 2.218114E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.958 | TFLOPs: 30.64 | 7: iteration 108780/ 115203 | consumed samples: 27847680 | consumed tokens: 57032048640 | elapsed time per iteration (s): 0.43 | learning rate: 2.141E-05 | global batch size: 256 | lm loss: 2.263924E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 592.104 | TFLOPs: 31.07 | 7: iteration 108790/ 115203 | consumed samples: 27850240 | consumed tokens: 57037291520 | elapsed time per iteration (s): 0.44 | learning rate: 2.140E-05 | global batch size: 256 | lm loss: 2.229745E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.079 | TFLOPs: 30.70 | 7: iteration 108800/ 115203 | consumed samples: 27852800 | consumed tokens: 57042534400 | elapsed time per iteration (s): 0.43 | learning rate: 2.140E-05 | global batch size: 256 | lm loss: 2.221785E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.440 | TFLOPs: 31.29 | 7: iteration 108810/ 115203 | consumed samples: 27855360 | consumed tokens: 57047777280 | elapsed time per iteration (s): 0.43 | learning rate: 2.139E-05 | global batch size: 256 | lm loss: 2.232253E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.202 | TFLOPs: 31.07 | 7: iteration 108820/ 115203 | consumed samples: 27857920 | consumed tokens: 57053020160 | elapsed time per iteration (s): 0.43 | learning rate: 2.139E-05 | global batch size: 256 | lm loss: 2.284182E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.163 | TFLOPs: 31.28 | 7: iteration 108830/ 115203 | consumed samples: 27860480 | consumed tokens: 57058263040 | elapsed time per iteration (s): 0.43 | learning rate: 2.138E-05 | global batch size: 256 | lm loss: 2.199810E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.105 | TFLOPs: 31.01 | 7: iteration 108840/ 115203 | consumed samples: 27863040 | consumed tokens: 57063505920 | elapsed time per iteration (s): 0.43 | learning rate: 2.138E-05 | global batch size: 256 | lm loss: 
2.208855E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.901 | TFLOPs: 31.11 | 7: iteration 108850/ 115203 | consumed samples: 27865600 | consumed tokens: 57068748800 | elapsed time per iteration (s): 0.44 | learning rate: 2.137E-05 | global batch size: 256 | lm loss: 2.200721E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.239 | TFLOPs: 30.50 | 7: iteration 108860/ 115203 | consumed samples: 27868160 | consumed tokens: 57073991680 | elapsed time per iteration (s): 0.43 | learning rate: 2.137E-05 | global batch size: 256 | lm loss: 2.201202E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.963 | TFLOPs: 31.01 | 7: iteration 108870/ 115203 | consumed samples: 27870720 | consumed tokens: 57079234560 | elapsed time per iteration (s): 0.43 | learning rate: 2.137E-05 | global batch size: 256 | lm loss: 2.242861E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.508 | TFLOPs: 31.09 | 7: iteration 108880/ 115203 | consumed samples: 27873280 | consumed tokens: 57084477440 | elapsed time per iteration (s): 0.44 | learning rate: 2.136E-05 | global batch size: 256 | lm loss: 2.236041E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.914 | TFLOPs: 30.43 | 7: iteration 108890/ 115203 | consumed samples: 27875840 | consumed tokens: 57089720320 | elapsed time per iteration (s): 0.43 | learning rate: 2.136E-05 | global batch size: 256 | lm loss: 2.216979E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.673 | TFLOPs: 31.25 | 7: iteration 108900/ 115203 | consumed samples: 27878400 | consumed tokens: 
57094963200 | elapsed time per iteration (s): 0.43 | learning rate: 2.135E-05 | global batch size: 256 | lm loss: 2.196268E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.859 | TFLOPs: 31.05 | 7: iteration 108910/ 115203 | consumed samples: 27880960 | consumed tokens: 57100206080 | elapsed time per iteration (s): 0.43 | learning rate: 2.135E-05 | global batch size: 256 | lm loss: 2.229667E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.896 | TFLOPs: 31.16 | 7: iteration 108920/ 115203 | consumed samples: 27883520 | consumed tokens: 57105448960 | elapsed time per iteration (s): 0.43 | learning rate: 2.134E-05 | global batch size: 256 | lm loss: 2.223534E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.578 | TFLOPs: 31.09 | 7: iteration 108930/ 115203 | consumed samples: 27886080 | consumed tokens: 57110691840 | elapsed time per iteration (s): 0.46 | learning rate: 2.134E-05 | global batch size: 256 | lm loss: 2.214600E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.842 | TFLOPs: 29.37 | 7: iteration 108940/ 115203 | consumed samples: 27888640 | consumed tokens: 57115934720 | elapsed time per iteration (s): 0.43 | learning rate: 2.134E-05 | global batch size: 256 | lm loss: 2.182719E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.645 | TFLOPs: 31.25 | 7: iteration 108950/ 115203 | consumed samples: 27891200 | consumed tokens: 57121177600 | elapsed time per iteration (s): 0.44 | learning rate: 2.133E-05 | global batch size: 256 | lm loss: 2.250694E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 584.608 | TFLOPs: 30.67 | 7: iteration 108960/ 115203 | consumed samples: 27893760 | consumed tokens: 57126420480 | elapsed time per iteration (s): 0.43 | learning rate: 2.133E-05 | global batch size: 256 | lm loss: 2.221901E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.035 | TFLOPs: 31.43 | 7: iteration 108970/ 115203 | consumed samples: 27896320 | consumed tokens: 57131663360 | elapsed time per iteration (s): 0.44 | learning rate: 2.132E-05 | global batch size: 256 | lm loss: 2.221343E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.973 | TFLOPs: 30.38 | 7: iteration 108980/ 115203 | consumed samples: 27898880 | consumed tokens: 57136906240 | elapsed time per iteration (s): 0.43 | learning rate: 2.132E-05 | global batch size: 256 | lm loss: 2.208945E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.661 | TFLOPs: 31.04 | 7: iteration 108990/ 115203 | consumed samples: 27901440 | consumed tokens: 57142149120 | elapsed time per iteration (s): 0.45 | learning rate: 2.132E-05 | global batch size: 256 | lm loss: 2.237633E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.611 | TFLOPs: 30.10 | 7: iteration 109000/ 115203 | consumed samples: 27904000 | consumed tokens: 57147392000 | elapsed time per iteration (s): 0.44 | learning rate: 2.131E-05 | global batch size: 256 | lm loss: 2.219132E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.785 | TFLOPs: 30.32 | 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 109000 | lm loss value: 2.207992E+00 | lm loss PPL: 9.097431E+00 | 7: 
-------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 109000 to checkpoints_221m 0: [2022-11-29 02:06:44,527] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step109000 is begin to save! 0: [2022-11-29 02:06:44,532] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_01-model_00-model_states.pt... 0: [2022-11-29 02:06:44,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_01-model_00-model_states.pt. 0: [2022-11-29 02:06:44,643] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_03-model_00-model_states.pt... 0: [2022-11-29 02:06:44,666] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_03-model_00-model_states.pt. 0: [2022-11-29 02:06:44,666] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_04-model_00-model_states.pt... 0: [2022-11-29 02:06:44,691] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_04-model_00-model_states.pt. 0: [2022-11-29 02:06:44,691] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_05-model_00-model_states.pt... 0: [2022-11-29 02:06:44,716] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_05-model_00-model_states.pt. 0: [2022-11-29 02:06:44,717] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_06-model_00-model_states.pt... 0: [2022-11-29 02:06:44,741] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_06-model_00-model_states.pt. 
0: [2022-11-29 02:06:44,741] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_07-model_00-model_states.pt... 0: [2022-11-29 02:06:44,765] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_07-model_00-model_states.pt. 0: [2022-11-29 02:06:44,765] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_08-model_00-model_states.pt... 0: [2022-11-29 02:06:44,788] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_08-model_00-model_states.pt. 0: [2022-11-29 02:06:44,789] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_09-model_00-model_states.pt... 0: [2022-11-29 02:06:44,812] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_09-model_00-model_states.pt. 0: [2022-11-29 02:06:44,812] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_10-model_00-model_states.pt... 0: [2022-11-29 02:06:44,837] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_10-model_00-model_states.pt. 0: [2022-11-29 02:06:44,837] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_11-model_00-model_states.pt... 0: [2022-11-29 02:06:44,860] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_11-model_00-model_states.pt. 0: [2022-11-29 02:06:44,860] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_12-model_00-model_states.pt... 0: [2022-11-29 02:06:44,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_12-model_00-model_states.pt. 
0: [2022-11-29 02:06:44,885] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_13-model_00-model_states.pt... 0: [2022-11-29 02:06:44,910] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_13-model_00-model_states.pt. 0: [2022-11-29 02:06:44,910] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_14-model_00-model_states.pt... 0: [2022-11-29 02:06:44,935] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_14-model_00-model_states.pt. 0: [2022-11-29 02:06:44,935] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_15-model_00-model_states.pt... 0: [2022-11-29 02:06:44,959] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_15-model_00-model_states.pt. 0: [2022-11-29 02:06:44,959] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_16-model_00-model_states.pt... 0: [2022-11-29 02:06:44,983] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_16-model_00-model_states.pt. 0: [2022-11-29 02:06:44,983] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_17-model_00-model_states.pt... 0: [2022-11-29 02:06:45,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_17-model_00-model_states.pt. 0: [2022-11-29 02:06:45,008] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_18-model_00-model_states.pt... 0: [2022-11-29 02:06:45,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_18-model_00-model_states.pt. 
0: [2022-11-29 02:06:45,031] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_19-model_00-model_states.pt... 0: [2022-11-29 02:06:45,055] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_19-model_00-model_states.pt. 0: [2022-11-29 02:06:45,056] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_20-model_00-model_states.pt... 0: [2022-11-29 02:06:45,081] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_20-model_00-model_states.pt. 0: [2022-11-29 02:06:45,081] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/layer_22-model_00-model_states.pt... 0: [2022-11-29 02:06:45,084] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/layer_22-model_00-model_states.pt. 0: [2022-11-29 02:06:45,085] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step109000/mp_rank_00_model_states.pt 0: [2022-11-29 02:06:45,085] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/mp_rank_00_model_states.pt... 0: [2022-11-29 02:06:45,088] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/mp_rank_00_model_states.pt. 0: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 
0: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
5: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 
2: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 
6: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 
3: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:06:45,107] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step109000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:06:45,155] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:06:45,155] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 02:06:45,155] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2022-11-29 02:06:45,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:06:45,157] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:06:45,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 02:06:45,158] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 02:06:45,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2022-11-29 02:06:45,158] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2022-11-29 02:06:45,161] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:06:45,161] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 02:06:45,161] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2022-11-29 02:06:45,163] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:06:45,163] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 02:06:45,163] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2022-11-29 02:06:45,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:06:45,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 02:06:45,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:06:45,164] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:06:45,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2022-11-29 02:06:45,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 02:06:45,164] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 02:06:45,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2022-11-29 02:06:45,164] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2022-11-29 02:06:45,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:06:45,165] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 02:06:45,165] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2022-11-29 02:06:45,165] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 
1: [2022-11-29 02:06:45,166] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 02:06:45,166] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2022-11-29 02:06:45,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:06:45,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 02:06:45,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:06:45,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:06:45,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2022-11-29 02:06:45,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 02:06:45,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2022-11-29 02:06:45,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 02:06:45,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2022-11-29 02:06:45,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 
4: [2022-11-29 02:06:45,167] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 02:06:45,167] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2022-11-29 02:06:45,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:06:45,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 02:06:45,167] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:06:45,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2022-11-29 02:06:45,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 02:06:45,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2022-11-29 02:06:45,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:06:45,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 02:06:45,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2022-11-29 02:06:45,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:06:45,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:06:45,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 7: [2022-11-29 02:06:45,168] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 2: [2022-11-29 02:06:45,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 7: [2022-11-29 02:06:45,168] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2022-11-29 02:06:45,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:06:45,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 02:06:45,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2022-11-29 02:06:45,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:06:45,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 0: [2022-11-29 02:06:45,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:06:45,169] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 
0: [2022-11-29 02:06:45,169] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 2: [2022-11-29 02:06:45,169] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:06:45,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2022-11-29 02:06:45,170] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 02:06:45,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2022-11-29 02:06:45,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:06:45,170] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 02:06:45,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2022-11-29 02:06:45,170] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:06:45,170] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 02:06:45,170] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2022-11-29 02:06:45,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-29 02:06:45,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 02:06:45,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2022-11-29 02:06:45,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:06:45,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:06:45,173] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:06:45,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 02:06:45,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 02:06:45,173] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 02:06:45,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2022-11-29 02:06:45,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 2: [2022-11-29 02:06:45,173] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2022-11-29 02:06:45,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-29 02:06:45,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:06:45,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:06:45,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 02:06:45,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 02:06:45,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2022-11-29 02:06:45,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 4: [2022-11-29 02:06:45,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 02:06:45,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2022-11-29 02:06:45,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:06:45,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-29 02:06:45,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 02:06:45,174] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 02:06:45,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 1: [2022-11-29 02:06:45,174] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2022-11-29 02:06:45,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:06:45,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 02:06:45,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2022-11-29 02:06:45,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:06:45,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:06:45,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-29 02:06:45,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 02:06:45,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 02:06:45,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2022-11-29 02:06:45,183] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2022-11-29 02:06:45,183] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:06:45,183] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 02:06:45,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2022-11-29 02:06:45,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:06:45,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 02:06:45,184] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2022-11-29 02:06:45,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:06:45,184] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 02:06:45,185] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2022-11-29 02:06:45,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:06:45,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:06:45,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 02:06:45,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 02:06:45,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2022-11-29 02:06:45,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:06:45,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2022-11-29 02:06:45,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 02:06:45,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 6: [2022-11-29 02:06:45,188] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:06:45,188] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 02:06:45,188] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:06:45,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 02:06:45,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 02:06:45,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 02:06:45,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 02:06:45,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 02:06:45,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 02:06:45,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 02:06:45,207] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 
5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 5: [2022-11-29 02:06:45,207] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2022-11-29 02:06:45,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:06:45,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:06:45,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 02:06:45,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:06:45,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 02:06:45,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2022-11-29 02:06:45,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: [2022-11-29 02:06:45,216] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 02:06:45,216] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 
0: [2022-11-29 02:06:45,222] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 02:06:45,222] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:06:45,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 02:06:45,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 02:06:45,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 02:06:45,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 02:06:45,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 02:06:45,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 02:06:45,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 02:06:45,235] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step109000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 
3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 3: [2022-11-29 02:06:45,235] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step109000 is ready now! 0: successfully saved checkpoint at iteration 109000 to checkpoints_221m 7: time (ms) | save-checkpoint: 713.51 7: iteration 109010/ 115203 | consumed samples: 27906560 | consumed tokens: 57152634880 | elapsed time per iteration (s): 0.52 | learning rate: 2.131E-05 | global batch size: 256 | lm loss: 2.222211E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 488.070 | TFLOPs: 25.61 | 7: iteration 109020/ 115203 | consumed samples: 27909120 | consumed tokens: 57157877760 | elapsed time per iteration (s): 0.44 | learning rate: 2.130E-05 | global batch size: 256 | lm loss: 2.227577E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.222 | TFLOPs: 30.81 | 7: iteration 109030/ 115203 | consumed samples: 27911680 | consumed tokens: 57163120640 | elapsed time per iteration (s): 0.45 | learning rate: 2.130E-05 | global batch size: 256 | lm loss: 2.206756E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.645 | TFLOPs: 29.99 | 7: iteration 109040/ 115203 | consumed samples: 27914240 | consumed tokens: 57168363520 | elapsed time per iteration (s): 0.43 | learning rate: 2.129E-05 | global batch size: 256 | lm loss: 2.229420E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 
592.774 | TFLOPs: 31.10 | 7: iteration 109050/ 115203 | consumed samples: 27916800 | consumed tokens: 57173606400 | elapsed time per iteration (s): 0.43 | learning rate: 2.129E-05 | global batch size: 256 | lm loss: 2.216206E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.989 | TFLOPs: 31.06 | 7: iteration 109060/ 115203 | consumed samples: 27919360 | consumed tokens: 57178849280 | elapsed time per iteration (s): 0.43 | learning rate: 2.129E-05 | global batch size: 256 | lm loss: 2.239190E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.076 | TFLOPs: 31.01 | 7: iteration 109070/ 115203 | consumed samples: 27921920 | consumed tokens: 57184092160 | elapsed time per iteration (s): 0.44 | learning rate: 2.128E-05 | global batch size: 256 | lm loss: 2.229514E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.204 | TFLOPs: 30.76 | 7: iteration 109080/ 115203 | consumed samples: 27924480 | consumed tokens: 57189335040 | elapsed time per iteration (s): 0.44 | learning rate: 2.128E-05 | global batch size: 256 | lm loss: 2.206890E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.199 | TFLOPs: 30.70 | 7: iteration 109090/ 115203 | consumed samples: 27927040 | consumed tokens: 57194577920 | elapsed time per iteration (s): 0.43 | learning rate: 2.127E-05 | global batch size: 256 | lm loss: 2.237999E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.680 | TFLOPs: 31.15 | 7: iteration 109100/ 115203 | consumed samples: 27929600 | consumed tokens: 57199820800 | elapsed time per iteration (s): 0.43 | learning rate: 2.127E-05 | global batch size: 256 | lm loss: 2.241924E+00 | grad norm: 
0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.542 | TFLOPs: 30.93 | 7: iteration 109110/ 115203 | consumed samples: 27932160 | consumed tokens: 57205063680 | elapsed time per iteration (s): 0.43 | learning rate: 2.126E-05 | global batch size: 256 | lm loss: 2.181821E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.899 | TFLOPs: 31.00 | 7: iteration 109120/ 115203 | consumed samples: 27934720 | consumed tokens: 57210306560 | elapsed time per iteration (s): 0.43 | learning rate: 2.126E-05 | global batch size: 256 | lm loss: 2.221477E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.785 | TFLOPs: 31.26 | 7: iteration 109130/ 115203 | consumed samples: 27937280 | consumed tokens: 57215549440 | elapsed time per iteration (s): 0.43 | learning rate: 2.126E-05 | global batch size: 256 | lm loss: 2.200164E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.498 | TFLOPs: 31.35 | 7: iteration 109140/ 115203 | consumed samples: 27939840 | consumed tokens: 57220792320 | elapsed time per iteration (s): 0.43 | learning rate: 2.125E-05 | global batch size: 256 | lm loss: 2.243775E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.069 | TFLOPs: 30.91 | 7: iteration 109150/ 115203 | consumed samples: 27942400 | consumed tokens: 57226035200 | elapsed time per iteration (s): 0.44 | learning rate: 2.125E-05 | global batch size: 256 | lm loss: 2.190520E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.886 | TFLOPs: 30.69 | 7: iteration 109160/ 115203 | consumed samples: 27944960 | consumed tokens: 57231278080 | elapsed time per 
iteration (s): 0.43 | learning rate: 2.124E-05 | global batch size: 256 | lm loss: 2.232650E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.333 | TFLOPs: 31.45 | 7: iteration 109170/ 115203 | consumed samples: 27947520 | consumed tokens: 57236520960 | elapsed time per iteration (s): 0.44 | learning rate: 2.124E-05 | global batch size: 256 | lm loss: 2.226291E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.614 | TFLOPs: 30.67 | 7: iteration 109180/ 115203 | consumed samples: 27950080 | consumed tokens: 57241763840 | elapsed time per iteration (s): 0.44 | learning rate: 2.124E-05 | global batch size: 256 | lm loss: 2.197233E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.256 | TFLOPs: 30.81 | 7: iteration 109190/ 115203 | consumed samples: 27952640 | consumed tokens: 57247006720 | elapsed time per iteration (s): 0.43 | learning rate: 2.123E-05 | global batch size: 256 | lm loss: 2.222391E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.796 | TFLOPs: 31.16 | 7: iteration 109200/ 115203 | consumed samples: 27955200 | consumed tokens: 57252249600 | elapsed time per iteration (s): 0.43 | learning rate: 2.123E-05 | global batch size: 256 | lm loss: 2.198023E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.679 | TFLOPs: 31.04 | 7: iteration 109210/ 115203 | consumed samples: 27957760 | consumed tokens: 57257492480 | elapsed time per iteration (s): 0.44 | learning rate: 2.122E-05 | global batch size: 256 | lm loss: 2.218421E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.693 | TFLOPs: 30.78 | 
7: iteration 109220/ 115203 | consumed samples: 27960320 | consumed tokens: 57262735360 | elapsed time per iteration (s): 0.44 | learning rate: 2.122E-05 | global batch size: 256 | lm loss: 2.235742E+00 | grad norm: 0.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.505 | TFLOPs: 30.67 | 7: iteration 109230/ 115203 | consumed samples: 27962880 | consumed tokens: 57267978240 | elapsed time per iteration (s): 0.43 | learning rate: 2.122E-05 | global batch size: 256 | lm loss: 2.230926E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.707 | TFLOPs: 31.05 | 7: iteration 109240/ 115203 | consumed samples: 27965440 | consumed tokens: 57273221120 | elapsed time per iteration (s): 0.43 | learning rate: 2.121E-05 | global batch size: 256 | lm loss: 2.217803E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.225 | TFLOPs: 31.28 | 7: iteration 109250/ 115203 | consumed samples: 27968000 | consumed tokens: 57278464000 | elapsed time per iteration (s): 0.44 | learning rate: 2.121E-05 | global batch size: 256 | lm loss: 2.239152E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.876 | TFLOPs: 30.32 | 7: iteration 109260/ 115203 | consumed samples: 27970560 | consumed tokens: 57283706880 | elapsed time per iteration (s): 0.44 | learning rate: 2.120E-05 | global batch size: 256 | lm loss: 2.216904E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.605 | TFLOPs: 30.67 | 7: iteration 109270/ 115203 | consumed samples: 27973120 | consumed tokens: 57288949760 | elapsed time per iteration (s): 0.43 | learning rate: 2.120E-05 | global batch size: 256 | lm loss: 2.238370E+00 | grad norm: 0.253 | num zeros: 0.0 | 
number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.839 | TFLOPs: 31.26 | 7: iteration 109280/ 115203 | consumed samples: 27975680 | consumed tokens: 57294192640 | elapsed time per iteration (s): 0.43 | learning rate: 2.120E-05 | global batch size: 256 | lm loss: 2.226579E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.616 | TFLOPs: 31.30 | 7: iteration 109290/ 115203 | consumed samples: 27978240 | consumed tokens: 57299435520 | elapsed time per iteration (s): 0.44 | learning rate: 2.119E-05 | global batch size: 256 | lm loss: 2.222697E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.105 | TFLOPs: 30.70 | 7: iteration 109300/ 115203 | consumed samples: 27980800 | consumed tokens: 57304678400 | elapsed time per iteration (s): 0.43 | learning rate: 2.119E-05 | global batch size: 256 | lm loss: 2.261348E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.668 | TFLOPs: 31.25 | 7: iteration 109310/ 115203 | consumed samples: 27983360 | consumed tokens: 57309921280 | elapsed time per iteration (s): 0.43 | learning rate: 2.118E-05 | global batch size: 256 | lm loss: 2.238802E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.784 | TFLOPs: 31.36 | 7: iteration 109320/ 115203 | consumed samples: 27985920 | consumed tokens: 57315164160 | elapsed time per iteration (s): 0.44 | learning rate: 2.118E-05 | global batch size: 256 | lm loss: 2.228230E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.821 | TFLOPs: 30.63 | 7: iteration 109330/ 115203 | consumed samples: 27988480 | consumed tokens: 57320407040 | elapsed time per iteration (s): 0.43 | 
learning rate: 2.118E-05 | global batch size: 256 | lm loss: 2.220031E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.918 | TFLOPs: 31.32 | 7: iteration 109340/ 115203 | consumed samples: 27991040 | consumed tokens: 57325649920 | elapsed time per iteration (s): 0.46 | learning rate: 2.117E-05 | global batch size: 256 | lm loss: 2.207333E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 554.752 | TFLOPs: 29.11 | 7: iteration 109350/ 115203 | consumed samples: 27993600 | consumed tokens: 57330892800 | elapsed time per iteration (s): 0.43 | learning rate: 2.117E-05 | global batch size: 256 | lm loss: 2.201209E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.681 | TFLOPs: 31.36 | 7: iteration 109360/ 115203 | consumed samples: 27996160 | consumed tokens: 57336135680 | elapsed time per iteration (s): 0.43 | learning rate: 2.116E-05 | global batch size: 256 | lm loss: 2.211462E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.485 | TFLOPs: 30.93 | 7: iteration 109370/ 115203 | consumed samples: 27998720 | consumed tokens: 57341378560 | elapsed time per iteration (s): 0.44 | learning rate: 2.116E-05 | global batch size: 256 | lm loss: 2.236043E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.042 | TFLOPs: 30.49 | 7: iteration 109380/ 115203 | consumed samples: 28001280 | consumed tokens: 57346621440 | elapsed time per iteration (s): 0.43 | learning rate: 2.116E-05 | global batch size: 256 | lm loss: 2.215942E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.043 | TFLOPs: 31.01 | 7: iteration 109390/ 
115203 | consumed samples: 28003840 | consumed tokens: 57351864320 | elapsed time per iteration (s): 0.43 | learning rate: 2.115E-05 | global batch size: 256 | lm loss: 2.237202E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.681 | TFLOPs: 31.25 | 7: iteration 109400/ 115203 | consumed samples: 28006400 | consumed tokens: 57357107200 | elapsed time per iteration (s): 0.46 | learning rate: 2.115E-05 | global batch size: 256 | lm loss: 2.198630E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 555.378 | TFLOPs: 29.14 | 7: iteration 109410/ 115203 | consumed samples: 28008960 | consumed tokens: 57362350080 | elapsed time per iteration (s): 0.43 | learning rate: 2.114E-05 | global batch size: 256 | lm loss: 2.199689E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.346 | TFLOPs: 30.92 | 7: iteration 109420/ 115203 | consumed samples: 28011520 | consumed tokens: 57367592960 | elapsed time per iteration (s): 0.47 | learning rate: 2.114E-05 | global batch size: 256 | lm loss: 2.230222E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 549.927 | TFLOPs: 28.85 | 7: iteration 109430/ 115203 | consumed samples: 28014080 | consumed tokens: 57372835840 | elapsed time per iteration (s): 0.44 | learning rate: 2.114E-05 | global batch size: 256 | lm loss: 2.200416E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.746 | TFLOPs: 30.52 | 7: iteration 109440/ 115203 | consumed samples: 28016640 | consumed tokens: 57378078720 | elapsed time per iteration (s): 0.45 | learning rate: 2.113E-05 | global batch size: 256 | lm loss: 2.190040E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 573.148 | TFLOPs: 30.07 | 7: iteration 109450/ 115203 | consumed samples: 28019200 | consumed tokens: 57383321600 | elapsed time per iteration (s): 0.45 | learning rate: 2.113E-05 | global batch size: 256 | lm loss: 2.217157E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.830 | TFLOPs: 29.79 | 7: iteration 109460/ 115203 | consumed samples: 28021760 | consumed tokens: 57388564480 | elapsed time per iteration (s): 0.43 | learning rate: 2.112E-05 | global batch size: 256 | lm loss: 2.222106E+00 | grad norm: 0.297 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.690 | TFLOPs: 30.94 | 7: iteration 109470/ 115203 | consumed samples: 28024320 | consumed tokens: 57393807360 | elapsed time per iteration (s): 0.43 | learning rate: 2.112E-05 | global batch size: 256 | lm loss: 2.246301E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.811 | TFLOPs: 31.26 | 7: iteration 109480/ 115203 | consumed samples: 28026880 | consumed tokens: 57399050240 | elapsed time per iteration (s): 0.44 | learning rate: 2.112E-05 | global batch size: 256 | lm loss: 2.216385E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.044 | TFLOPs: 30.54 | 7: iteration 109490/ 115203 | consumed samples: 28029440 | consumed tokens: 57404293120 | elapsed time per iteration (s): 0.43 | learning rate: 2.111E-05 | global batch size: 256 | lm loss: 2.191784E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.465 | TFLOPs: 30.98 | 7: iteration 109500/ 115203 | consumed samples: 28032000 | consumed tokens: 57409536000 | elapsed time per iteration (s): 0.44 | learning rate: 
2.111E-05 | global batch size: 256 | lm loss: 2.221673E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.493 | TFLOPs: 30.82 | 7: iteration 109510/ 115203 | consumed samples: 28034560 | consumed tokens: 57414778880 | elapsed time per iteration (s): 0.44 | learning rate: 2.110E-05 | global batch size: 256 | lm loss: 2.215417E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.309 | TFLOPs: 30.71 | 7: iteration 109520/ 115203 | consumed samples: 28037120 | consumed tokens: 57420021760 | elapsed time per iteration (s): 0.43 | learning rate: 2.110E-05 | global batch size: 256 | lm loss: 2.238054E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.616 | TFLOPs: 31.30 | 7: iteration 109530/ 115203 | consumed samples: 28039680 | consumed tokens: 57425264640 | elapsed time per iteration (s): 0.43 | learning rate: 2.110E-05 | global batch size: 256 | lm loss: 2.217316E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.291 | TFLOPs: 30.92 | 7: iteration 109540/ 115203 | consumed samples: 28042240 | consumed tokens: 57430507520 | elapsed time per iteration (s): 0.45 | learning rate: 2.109E-05 | global batch size: 256 | lm loss: 2.223650E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.295 | TFLOPs: 29.61 | 7: iteration 109550/ 115203 | consumed samples: 28044800 | consumed tokens: 57435750400 | elapsed time per iteration (s): 0.45 | learning rate: 2.109E-05 | global batch size: 256 | lm loss: 2.207747E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.814 | TFLOPs: 29.63 | 7: iteration 109560/ 115203 | 
consumed samples: 28047360 | consumed tokens: 57440993280 | elapsed time per iteration (s): 0.44 | learning rate: 2.109E-05 | global batch size: 256 | lm loss: 2.240128E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.358 | TFLOPs: 30.40 | 7: iteration 109570/ 115203 | consumed samples: 28049920 | consumed tokens: 57446236160 | elapsed time per iteration (s): 0.45 | learning rate: 2.108E-05 | global batch size: 256 | lm loss: 2.218555E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.766 | TFLOPs: 29.79 | 7: iteration 109580/ 115203 | consumed samples: 28052480 | consumed tokens: 57451479040 | elapsed time per iteration (s): 0.44 | learning rate: 2.108E-05 | global batch size: 256 | lm loss: 2.237191E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.777 | TFLOPs: 30.73 | 7: iteration 109590/ 115203 | consumed samples: 28055040 | consumed tokens: 57456721920 | elapsed time per iteration (s): 0.43 | learning rate: 2.107E-05 | global batch size: 256 | lm loss: 2.215848E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.140 | TFLOPs: 30.91 | 7: iteration 109600/ 115203 | consumed samples: 28057600 | consumed tokens: 57461964800 | elapsed time per iteration (s): 0.43 | learning rate: 2.107E-05 | global batch size: 256 | lm loss: 2.218548E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.364 | TFLOPs: 31.13 | 7: iteration 109610/ 115203 | consumed samples: 28060160 | consumed tokens: 57467207680 | elapsed time per iteration (s): 0.43 | learning rate: 2.107E-05 | global batch size: 256 | lm loss: 2.209562E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 588.589 | TFLOPs: 30.88 | 7: iteration 109620/ 115203 | consumed samples: 28062720 | consumed tokens: 57472450560 | elapsed time per iteration (s): 0.43 | learning rate: 2.106E-05 | global batch size: 256 | lm loss: 2.231401E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.043 | TFLOPs: 31.38 | 7: iteration 109630/ 115203 | consumed samples: 28065280 | consumed tokens: 57477693440 | elapsed time per iteration (s): 0.44 | learning rate: 2.106E-05 | global batch size: 256 | lm loss: 2.213916E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.605 | TFLOPs: 30.67 | 7: iteration 109640/ 115203 | consumed samples: 28067840 | consumed tokens: 57482936320 | elapsed time per iteration (s): 0.46 | learning rate: 2.105E-05 | global batch size: 256 | lm loss: 2.224449E+00 | grad norm: 0.239 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.120 | TFLOPs: 29.39 | 7: iteration 109650/ 115203 | consumed samples: 28070400 | consumed tokens: 57488179200 | elapsed time per iteration (s): 0.43 | learning rate: 2.105E-05 | global batch size: 256 | lm loss: 2.231388E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.390 | TFLOPs: 31.13 | 7: iteration 109660/ 115203 | consumed samples: 28072960 | consumed tokens: 57493422080 | elapsed time per iteration (s): 0.45 | learning rate: 2.105E-05 | global batch size: 256 | lm loss: 2.239403E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.993 | TFLOPs: 29.64 | 7: iteration 109670/ 115203 | consumed samples: 28075520 | consumed tokens: 57498664960 | elapsed time per iteration (s): 0.42 | learning rate: 2.104E-05 | global batch 
size: 256 | lm loss: 2.199086E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.980 | TFLOPs: 31.64 | 7: iteration 109680/ 115203 | consumed samples: 28078080 | consumed tokens: 57503907840 | elapsed time per iteration (s): 0.44 | learning rate: 2.104E-05 | global batch size: 256 | lm loss: 2.208434E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.080 | TFLOPs: 30.38 | 7: iteration 109690/ 115203 | consumed samples: 28080640 | consumed tokens: 57509150720 | elapsed time per iteration (s): 0.43 | learning rate: 2.104E-05 | global batch size: 256 | lm loss: 2.246581E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.403 | TFLOPs: 31.13 | 7: iteration 109700/ 115203 | consumed samples: 28083200 | consumed tokens: 57514393600 | elapsed time per iteration (s): 0.42 | learning rate: 2.103E-05 | global batch size: 256 | lm loss: 2.222799E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.389 | TFLOPs: 31.71 | 7: iteration 109710/ 115203 | consumed samples: 28085760 | consumed tokens: 57519636480 | elapsed time per iteration (s): 0.42 | learning rate: 2.103E-05 | global batch size: 256 | lm loss: 2.234425E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.646 | TFLOPs: 31.83 | 7: iteration 109720/ 115203 | consumed samples: 28088320 | consumed tokens: 57524879360 | elapsed time per iteration (s): 0.43 | learning rate: 2.102E-05 | global batch size: 256 | lm loss: 2.237012E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.692 | TFLOPs: 30.89 | 7: iteration 109730/ 115203 | consumed samples: 28090880 | 
consumed tokens: 57530122240 | elapsed time per iteration (s): 0.42 | learning rate: 2.102E-05 | global batch size: 256 | lm loss: 2.200380E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.695 | TFLOPs: 31.88 | 7: iteration 109740/ 115203 | consumed samples: 28093440 | consumed tokens: 57535365120 | elapsed time per iteration (s): 0.43 | learning rate: 2.102E-05 | global batch size: 256 | lm loss: 2.222353E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.521 | TFLOPs: 30.98 | 7: iteration 109750/ 115203 | consumed samples: 28096000 | consumed tokens: 57540608000 | elapsed time per iteration (s): 0.43 | learning rate: 2.101E-05 | global batch size: 256 | lm loss: 2.232418E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.229 | TFLOPs: 31.55 | 7: iteration 109760/ 115203 | consumed samples: 28098560 | consumed tokens: 57545850880 | elapsed time per iteration (s): 0.47 | learning rate: 2.101E-05 | global batch size: 256 | lm loss: 2.199222E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.987 | TFLOPs: 28.49 | 7: iteration 109770/ 115203 | consumed samples: 28101120 | consumed tokens: 57551093760 | elapsed time per iteration (s): 0.44 | learning rate: 2.101E-05 | global batch size: 256 | lm loss: 2.186737E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.752 | TFLOPs: 30.73 | 7: iteration 109780/ 115203 | consumed samples: 28103680 | consumed tokens: 57556336640 | elapsed time per iteration (s): 0.44 | learning rate: 2.100E-05 | global batch size: 256 | lm loss: 2.186896E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 583.974 | TFLOPs: 30.64 | 7: iteration 109790/ 115203 | consumed samples: 28106240 | consumed tokens: 57561579520 | elapsed time per iteration (s): 0.43 | learning rate: 2.100E-05 | global batch size: 256 | lm loss: 2.237616E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.408 | TFLOPs: 31.24 | 7: iteration 109800/ 115203 | consumed samples: 28108800 | consumed tokens: 57566822400 | elapsed time per iteration (s): 0.42 | learning rate: 2.100E-05 | global batch size: 256 | lm loss: 2.243512E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.297 | TFLOPs: 31.71 | 7: iteration 109810/ 115203 | consumed samples: 28111360 | consumed tokens: 57572065280 | elapsed time per iteration (s): 0.44 | learning rate: 2.099E-05 | global batch size: 256 | lm loss: 2.211938E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.131 | TFLOPs: 30.86 | 7: iteration 109820/ 115203 | consumed samples: 28113920 | consumed tokens: 57577308160 | elapsed time per iteration (s): 0.45 | learning rate: 2.099E-05 | global batch size: 256 | lm loss: 2.232407E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.424 | TFLOPs: 29.98 | 7: iteration 109830/ 115203 | consumed samples: 28116480 | consumed tokens: 57582551040 | elapsed time per iteration (s): 0.66 | learning rate: 2.098E-05 | global batch size: 256 | lm loss: 2.207669E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 387.416 | TFLOPs: 20.33 | 7: iteration 109840/ 115203 | consumed samples: 28119040 | consumed tokens: 57587793920 | elapsed time per iteration (s): 0.43 | learning rate: 2.098E-05 | global batch size: 256 | lm loss: 
2.212260E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.393 | TFLOPs: 30.92 | 7: iteration 109850/ 115203 | consumed samples: 28121600 | consumed tokens: 57593036800 | elapsed time per iteration (s): 0.43 | learning rate: 2.098E-05 | global batch size: 256 | lm loss: 2.194003E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.738 | TFLOPs: 31.00 | 7: iteration 109860/ 115203 | consumed samples: 28124160 | consumed tokens: 57598279680 | elapsed time per iteration (s): 0.44 | learning rate: 2.097E-05 | global batch size: 256 | lm loss: 2.245532E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.365 | TFLOPs: 30.50 | 7: iteration 109870/ 115203 | consumed samples: 28126720 | consumed tokens: 57603522560 | elapsed time per iteration (s): 0.43 | learning rate: 2.097E-05 | global batch size: 256 | lm loss: 2.192455E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.466 | TFLOPs: 31.35 | 7: iteration 109880/ 115203 | consumed samples: 28129280 | consumed tokens: 57608765440 | elapsed time per iteration (s): 0.43 | learning rate: 2.097E-05 | global batch size: 256 | lm loss: 2.200582E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.769 | TFLOPs: 31.15 | 7: iteration 109890/ 115203 | consumed samples: 28131840 | consumed tokens: 57614008320 | elapsed time per iteration (s): 0.42 | learning rate: 2.096E-05 | global batch size: 256 | lm loss: 2.210747E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.862 | TFLOPs: 31.63 | 7: iteration 109900/ 115203 | consumed samples: 28134400 | consumed tokens: 
57619251200 | elapsed time per iteration (s): 0.43 | learning rate: 2.096E-05 | global batch size: 256 | lm loss: 2.189621E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.362 | TFLOPs: 31.45 | 7: iteration 109910/ 115203 | consumed samples: 28136960 | consumed tokens: 57624494080 | elapsed time per iteration (s): 0.44 | learning rate: 2.096E-05 | global batch size: 256 | lm loss: 2.246249E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.560 | TFLOPs: 30.30 | 7: iteration 109920/ 115203 | consumed samples: 28139520 | consumed tokens: 57629736960 | elapsed time per iteration (s): 0.46 | learning rate: 2.095E-05 | global batch size: 256 | lm loss: 2.217822E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 555.222 | TFLOPs: 29.13 | 7: iteration 109930/ 115203 | consumed samples: 28142080 | consumed tokens: 57634979840 | elapsed time per iteration (s): 0.43 | learning rate: 2.095E-05 | global batch size: 256 | lm loss: 2.219882E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.701 | TFLOPs: 31.20 | 7: iteration 109940/ 115203 | consumed samples: 28144640 | consumed tokens: 57640222720 | elapsed time per iteration (s): 0.44 | learning rate: 2.094E-05 | global batch size: 256 | lm loss: 2.221311E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.100 | TFLOPs: 30.86 | 7: iteration 109950/ 115203 | consumed samples: 28147200 | consumed tokens: 57645465600 | elapsed time per iteration (s): 0.43 | learning rate: 2.094E-05 | global batch size: 256 | lm loss: 2.217872E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 593.847 | TFLOPs: 31.16 | 7: iteration 109960/ 115203 | consumed samples: 28149760 | consumed tokens: 57650708480 | elapsed time per iteration (s): 0.45 | learning rate: 2.094E-05 | global batch size: 256 | lm loss: 2.216956E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.628 | TFLOPs: 29.57 | 7: iteration 109970/ 115203 | consumed samples: 28152320 | consumed tokens: 57655951360 | elapsed time per iteration (s): 0.43 | learning rate: 2.093E-05 | global batch size: 256 | lm loss: 2.216307E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.675 | TFLOPs: 31.36 | 7: iteration 109980/ 115203 | consumed samples: 28154880 | consumed tokens: 57661194240 | elapsed time per iteration (s): 0.43 | learning rate: 2.093E-05 | global batch size: 256 | lm loss: 2.217789E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.592 | TFLOPs: 31.20 | 7: iteration 109990/ 115203 | consumed samples: 28157440 | consumed tokens: 57666437120 | elapsed time per iteration (s): 0.44 | learning rate: 2.093E-05 | global batch size: 256 | lm loss: 2.192637E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.712 | TFLOPs: 30.68 | 0: [2022-11-29 02:14:04,145] [INFO] [logging.py:68:log_dist] [Rank 0] step=110000, skipped=0, lr=[2.092302863901853e-05, 2.092302863901853e-05, 2.092302863901853e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 110000/ 115203 | consumed samples: 28160000 | consumed tokens: 57671680000 | elapsed time per iteration (s): 0.44 | learning rate: 2.092E-05 | global batch size: 256 | lm loss: 2.214527E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.266 | TFLOPs: 
30.24 | 0: steps: 110000 loss: 2.1683 iter time (s): 0.435 samples/sec: 588.959 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 110000 | lm loss value: 2.280979E+00 | lm loss PPL: 9.786256E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 110000 to checkpoints_221m 0: [2022-11-29 02:14:04,352] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step110000 is begin to save! 0: [2022-11-29 02:14:04,383] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_01-model_00-model_states.pt... 0: [2022-11-29 02:14:04,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_01-model_00-model_states.pt. 0: [2022-11-29 02:14:04,497] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_03-model_00-model_states.pt... 0: [2022-11-29 02:14:04,519] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_03-model_00-model_states.pt. 0: [2022-11-29 02:14:04,520] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_04-model_00-model_states.pt... 0: [2022-11-29 02:14:04,545] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_04-model_00-model_states.pt. 0: [2022-11-29 02:14:04,545] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_05-model_00-model_states.pt... 0: [2022-11-29 02:14:04,569] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_05-model_00-model_states.pt. 0: [2022-11-29 02:14:04,570] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_06-model_00-model_states.pt... 
0: [2022-11-29 02:14:04,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_06-model_00-model_states.pt. 0: [2022-11-29 02:14:04,595] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_07-model_00-model_states.pt... 0: [2022-11-29 02:14:04,619] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_07-model_00-model_states.pt. 0: [2022-11-29 02:14:04,619] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_08-model_00-model_states.pt... 0: [2022-11-29 02:14:04,643] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_08-model_00-model_states.pt. 0: [2022-11-29 02:14:04,644] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_09-model_00-model_states.pt... 0: [2022-11-29 02:14:04,668] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_09-model_00-model_states.pt. 0: [2022-11-29 02:14:04,668] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_10-model_00-model_states.pt... 0: [2022-11-29 02:14:04,693] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_10-model_00-model_states.pt. 0: [2022-11-29 02:14:04,693] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_11-model_00-model_states.pt... 0: [2022-11-29 02:14:04,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_11-model_00-model_states.pt. 0: [2022-11-29 02:14:04,718] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_12-model_00-model_states.pt... 
0: [2022-11-29 02:14:04,742] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_12-model_00-model_states.pt. 0: [2022-11-29 02:14:04,742] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_13-model_00-model_states.pt... 0: [2022-11-29 02:14:04,767] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_13-model_00-model_states.pt. 0: [2022-11-29 02:14:04,767] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_14-model_00-model_states.pt... 0: [2022-11-29 02:14:04,792] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_14-model_00-model_states.pt. 0: [2022-11-29 02:14:04,792] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_15-model_00-model_states.pt... 0: [2022-11-29 02:14:04,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_15-model_00-model_states.pt. 0: [2022-11-29 02:14:04,817] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_16-model_00-model_states.pt... 0: [2022-11-29 02:14:04,842] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_16-model_00-model_states.pt. 0: [2022-11-29 02:14:04,842] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_17-model_00-model_states.pt... 0: [2022-11-29 02:14:04,868] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_17-model_00-model_states.pt. 0: [2022-11-29 02:14:04,868] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_18-model_00-model_states.pt... 
0: [2022-11-29 02:14:04,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_18-model_00-model_states.pt. 0: [2022-11-29 02:14:04,892] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_19-model_00-model_states.pt... 0: [2022-11-29 02:14:04,917] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_19-model_00-model_states.pt. 0: [2022-11-29 02:14:04,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_20-model_00-model_states.pt... 0: [2022-11-29 02:14:04,941] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_20-model_00-model_states.pt. 0: [2022-11-29 02:14:04,941] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/layer_22-model_00-model_states.pt... 0: [2022-11-29 02:14:04,945] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/layer_22-model_00-model_states.pt. 0: [2022-11-29 02:14:04,946] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step110000/mp_rank_00_model_states.pt 0: [2022-11-29 02:14:04,946] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/mp_rank_00_model_states.pt... 0: [2022-11-29 02:14:04,948] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/mp_rank_00_model_states.pt. 0: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 
4: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 
1: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 
6: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
3: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:14:04,967] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:14:05,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:14:05,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 02:14:05,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2022-11-29 02:14:05,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-29 02:14:05,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 02:14:05,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2022-11-29 02:14:05,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:14:05,020] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 02:14:05,020] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2022-11-29 02:14:05,020] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:14:05,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 02:14:05,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2022-11-29 02:14:05,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:14:05,021] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 02:14:05,021] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2022-11-29 02:14:05,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:14:05,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 02:14:05,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2022-11-29 02:14:05,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:14:05,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:14:05,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 02:14:05,023] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 02:14:05,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2022-11-29 02:14:05,023] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2022-11-29 02:14:05,016] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:14:05,016] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 02:14:05,016] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2022-11-29 02:14:05,018] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:14:05,018] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 02:14:05,018] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2022-11-29 02:14:05,019] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:14:05,019] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 02:14:05,019] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2022-11-29 02:14:05,022] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:14:05,022] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 02:14:05,022] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2022-11-29 02:14:05,024] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:14:05,024] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 02:14:05,024] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2022-11-29 02:14:05,025] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:14:05,025] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 02:14:05,025] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2022-11-29 02:14:05,026] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:14:05,026] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 02:14:05,026] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2022-11-29 02:14:05,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:14:05,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 02:14:05,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2022-11-29 02:14:05,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:14:05,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:14:05,029] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 
1: [2022-11-29 02:14:05,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 02:14:05,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 02:14:05,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2022-11-29 02:14:05,029] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 02:14:05,029] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2022-11-29 02:14:05,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2022-11-29 02:14:05,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:14:05,030] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 02:14:05,030] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2022-11-29 02:14:05,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:14:05,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:14:05,030] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-29 02:14:05,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 02:14:05,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 02:14:05,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 02:14:05,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2022-11-29 02:14:05,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 4: [2022-11-29 02:14:05,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2022-11-29 02:14:05,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:14:05,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 02:14:05,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2022-11-29 02:14:05,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:14:05,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-29 02:14:05,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 02:14:05,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2022-11-29 02:14:05,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:14:05,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 02:14:05,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2022-11-29 02:14:05,032] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:14:05,032] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 02:14:05,032] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2022-11-29 02:14:05,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:14:05,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 02:14:05,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2022-11-29 02:14:05,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:14:05,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:14:05,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:14:05,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:14:05,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 02:14:05,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 02:14:05,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 02:14:05,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2022-11-29 02:14:05,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 02:14:05,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2022-11-29 02:14:05,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2022-11-29 02:14:05,033] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2022-11-29 02:14:05,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:14:05,033] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 02:14:05,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2022-11-29 02:14:05,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:14:05,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 02:14:05,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2022-11-29 02:14:05,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:14:05,034] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:14:05,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 02:14:05,034] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 02:14:05,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 7: [2022-11-29 02:14:05,034] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 3: [2022-11-29 02:14:05,031] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:14:05,031] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 02:14:05,031] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2022-11-29 02:14:05,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:14:05,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:14:05,037] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:14:05,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 02:14:05,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 02:14:05,038] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 02:14:05,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2022-11-29 02:14:05,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2022-11-29 02:14:05,038] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2022-11-29 02:14:05,042] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:14:05,042] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 02:14:05,042] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2022-11-29 02:14:05,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:14:05,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 02:14:05,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2022-11-29 02:14:05,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:14:05,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 02:14:05,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:14:05,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:14:05,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 02:14:05,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
6: [2022-11-29 02:14:05,043] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 02:14:05,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 6: [2022-11-29 02:14:05,043] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2022-11-29 02:14:05,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:14:05,052] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 02:14:05,052] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 1: [2022-11-29 02:14:05,052] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:14:05,053] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 02:14:05,053] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:14:05,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 02:14:05,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 02:14:05,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 02:14:05,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 02:14:05,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 02:14:05,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 02:14:05,067] 
[INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2022-11-29 02:14:05,067] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 2: [2022-11-29 02:14:05,067] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 0: [2022-11-29 02:14:05,098] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 02:14:05,098] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2022-11-29 02:14:05,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:14:05,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 02:14:05,115] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2022-11-29 02:14:05,115] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:14:05,115] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 02:14:05,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2022-11-29 02:14:05,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:14:05,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 02:14:05,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2022-11-29 02:14:05,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:14:05,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 02:14:05,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2022-11-29 02:14:05,116] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:14:05,116] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 02:14:05,116] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2022-11-29 02:14:05,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:14:05,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:14:05,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:14:05,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 02:14:05,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 02:14:05,120] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step110000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 02:14:05,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2022-11-29 02:14:05,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 5: [2022-11-29 02:14:05,120] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step110000 is ready now! 
0: successfully saved checkpoint at iteration 110000 to checkpoints_221m 7: time (ms) | save-checkpoint: 813.86 7: iteration 110010/ 115203 | consumed samples: 28162560 | consumed tokens: 57676922880 | elapsed time per iteration (s): 0.54 | learning rate: 2.092E-05 | global batch size: 256 | lm loss: 2.228278E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 476.731 | TFLOPs: 25.01 | 7: iteration 110020/ 115203 | consumed samples: 28165120 | consumed tokens: 57682165760 | elapsed time per iteration (s): 0.43 | learning rate: 2.092E-05 | global batch size: 256 | lm loss: 2.236099E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.735 | TFLOPs: 30.99 | 7: iteration 110030/ 115203 | consumed samples: 28167680 | consumed tokens: 57687408640 | elapsed time per iteration (s): 0.44 | learning rate: 2.091E-05 | global batch size: 256 | lm loss: 2.218991E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.310 | TFLOPs: 30.71 | 7: iteration 110040/ 115203 | consumed samples: 28170240 | consumed tokens: 57692651520 | elapsed time per iteration (s): 0.43 | learning rate: 2.091E-05 | global batch size: 256 | lm loss: 2.239596E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.944 | TFLOPs: 31.22 | 7: iteration 110050/ 115203 | consumed samples: 28172800 | consumed tokens: 57697894400 | elapsed time per iteration (s): 0.43 | learning rate: 2.091E-05 | global batch size: 256 | lm loss: 2.229490E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.555 | TFLOPs: 31.14 | 7: iteration 110060/ 115203 | consumed samples: 28175360 | consumed tokens: 57703137280 | elapsed time per iteration (s): 0.43 | 
learning rate: 2.090E-05 | global batch size: 256 | lm loss: 2.208173E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.741 | TFLOPs: 31.31 | 7: iteration 110070/ 115203 | consumed samples: 28177920 | consumed tokens: 57708380160 | elapsed time per iteration (s): 0.43 | learning rate: 2.090E-05 | global batch size: 256 | lm loss: 2.206052E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.476 | TFLOPs: 31.30 | 7: iteration 110080/ 115203 | consumed samples: 28180480 | consumed tokens: 57713623040 | elapsed time per iteration (s): 0.43 | learning rate: 2.089E-05 | global batch size: 256 | lm loss: 2.207624E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.068 | TFLOPs: 31.33 | 7: iteration 110090/ 115203 | consumed samples: 28183040 | consumed tokens: 57718865920 | elapsed time per iteration (s): 0.43 | learning rate: 2.089E-05 | global batch size: 256 | lm loss: 2.219445E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.082 | TFLOPs: 31.12 | 7: iteration 110100/ 115203 | consumed samples: 28185600 | consumed tokens: 57724108800 | elapsed time per iteration (s): 0.45 | learning rate: 2.089E-05 | global batch size: 256 | lm loss: 2.200755E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.126 | TFLOPs: 29.60 | 7: iteration 110110/ 115203 | consumed samples: 28188160 | consumed tokens: 57729351680 | elapsed time per iteration (s): 0.43 | learning rate: 2.088E-05 | global batch size: 256 | lm loss: 2.196690E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.699 | TFLOPs: 31.15 | 7: iteration 110120/ 
115203 | consumed samples: 28190720 | consumed tokens: 57734594560 | elapsed time per iteration (s): 0.44 | learning rate: 2.088E-05 | global batch size: 256 | lm loss: 2.192345E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.015 | TFLOPs: 30.75 | 7: iteration 110130/ 115203 | consumed samples: 28193280 | consumed tokens: 57739837440 | elapsed time per iteration (s): 0.44 | learning rate: 2.088E-05 | global batch size: 256 | lm loss: 2.209072E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.945 | TFLOPs: 30.64 | 7: iteration 110140/ 115203 | consumed samples: 28195840 | consumed tokens: 57745080320 | elapsed time per iteration (s): 0.43 | learning rate: 2.087E-05 | global batch size: 256 | lm loss: 2.237202E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.677 | TFLOPs: 30.99 | 7: iteration 110150/ 115203 | consumed samples: 28198400 | consumed tokens: 57750323200 | elapsed time per iteration (s): 0.44 | learning rate: 2.087E-05 | global batch size: 256 | lm loss: 2.168775E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.249 | TFLOPs: 30.23 | 7: iteration 110160/ 115203 | consumed samples: 28200960 | consumed tokens: 57755566080 | elapsed time per iteration (s): 0.44 | learning rate: 2.087E-05 | global batch size: 256 | lm loss: 2.213054E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.560 | TFLOPs: 30.78 | 7: iteration 110170/ 115203 | consumed samples: 28203520 | consumed tokens: 57760808960 | elapsed time per iteration (s): 0.44 | learning rate: 2.086E-05 | global batch size: 256 | lm loss: 2.210197E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 578.217 | TFLOPs: 30.34 | 7: iteration 110180/ 115203 | consumed samples: 28206080 | consumed tokens: 57766051840 | elapsed time per iteration (s): 0.44 | learning rate: 2.086E-05 | global batch size: 256 | lm loss: 2.231462E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.165 | TFLOPs: 30.70 | 7: iteration 110190/ 115203 | consumed samples: 28208640 | consumed tokens: 57771294720 | elapsed time per iteration (s): 0.44 | learning rate: 2.086E-05 | global batch size: 256 | lm loss: 2.220215E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.510 | TFLOPs: 30.46 | 7: iteration 110200/ 115203 | consumed samples: 28211200 | consumed tokens: 57776537600 | elapsed time per iteration (s): 0.45 | learning rate: 2.085E-05 | global batch size: 256 | lm loss: 2.217883E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.237 | TFLOPs: 29.60 | 7: iteration 110210/ 115203 | consumed samples: 28213760 | consumed tokens: 57781780480 | elapsed time per iteration (s): 0.47 | learning rate: 2.085E-05 | global batch size: 256 | lm loss: 2.203719E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 549.083 | TFLOPs: 28.81 | 7: iteration 110220/ 115203 | consumed samples: 28216320 | consumed tokens: 57787023360 | elapsed time per iteration (s): 0.45 | learning rate: 2.085E-05 | global batch size: 256 | lm loss: 2.226511E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 564.939 | TFLOPs: 29.64 | 7: iteration 110230/ 115203 | consumed samples: 28218880 | consumed tokens: 57792266240 | elapsed time per iteration (s): 0.43 | learning rate: 
2.084E-05 | global batch size: 256 | lm loss: 2.224874E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.831 | TFLOPs: 31.00 | 7: iteration 110240/ 115203 | consumed samples: 28221440 | consumed tokens: 57797509120 | elapsed time per iteration (s): 0.43 | learning rate: 2.084E-05 | global batch size: 256 | lm loss: 2.213900E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.915 | TFLOPs: 30.90 | 7: iteration 110250/ 115203 | consumed samples: 28224000 | consumed tokens: 57802752000 | elapsed time per iteration (s): 0.43 | learning rate: 2.084E-05 | global batch size: 256 | lm loss: 2.217601E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.974 | TFLOPs: 31.27 | 7: iteration 110260/ 115203 | consumed samples: 28226560 | consumed tokens: 57807994880 | elapsed time per iteration (s): 0.43 | learning rate: 2.083E-05 | global batch size: 256 | lm loss: 2.204394E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.762 | TFLOPs: 31.10 | 7: iteration 110270/ 115203 | consumed samples: 28229120 | consumed tokens: 57813237760 | elapsed time per iteration (s): 0.44 | learning rate: 2.083E-05 | global batch size: 256 | lm loss: 2.212484E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.181 | TFLOPs: 30.86 | 7: iteration 110280/ 115203 | consumed samples: 28231680 | consumed tokens: 57818480640 | elapsed time per iteration (s): 0.44 | learning rate: 2.083E-05 | global batch size: 256 | lm loss: 2.213657E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.942 | TFLOPs: 30.48 | 7: iteration 110290/ 115203 | 
consumed samples: 28234240 | consumed tokens: 57823723520 | elapsed time per iteration (s): 0.43 | learning rate: 2.082E-05 | global batch size: 256 | lm loss: 2.210022E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.580 | TFLOPs: 31.25 | 7: iteration 110300/ 115203 | consumed samples: 28236800 | consumed tokens: 57828966400 | elapsed time per iteration (s): 0.43 | learning rate: 2.082E-05 | global batch size: 256 | lm loss: 2.224669E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.553 | TFLOPs: 31.25 | 7: iteration 110310/ 115203 | consumed samples: 28239360 | consumed tokens: 57834209280 | elapsed time per iteration (s): 0.43 | learning rate: 2.082E-05 | global batch size: 256 | lm loss: 2.214147E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.257 | TFLOPs: 31.07 | 7: iteration 110320/ 115203 | consumed samples: 28241920 | consumed tokens: 57839452160 | elapsed time per iteration (s): 0.43 | learning rate: 2.081E-05 | global batch size: 256 | lm loss: 2.178844E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.756 | TFLOPs: 30.94 | 7: iteration 110330/ 115203 | consumed samples: 28244480 | consumed tokens: 57844695040 | elapsed time per iteration (s): 0.44 | learning rate: 2.081E-05 | global batch size: 256 | lm loss: 2.214240E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.932 | TFLOPs: 30.64 | 7: iteration 110340/ 115203 | consumed samples: 28247040 | consumed tokens: 57849937920 | elapsed time per iteration (s): 0.43 | learning rate: 2.081E-05 | global batch size: 256 | lm loss: 2.224569E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 588.720 | TFLOPs: 30.89 | 7: iteration 110350/ 115203 | consumed samples: 28249600 | consumed tokens: 57855180800 | elapsed time per iteration (s): 0.44 | learning rate: 2.080E-05 | global batch size: 256 | lm loss: 2.213797E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.057 | TFLOPs: 30.70 | 7: iteration 110360/ 115203 | consumed samples: 28252160 | consumed tokens: 57860423680 | elapsed time per iteration (s): 0.45 | learning rate: 2.080E-05 | global batch size: 256 | lm loss: 2.239745E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.192 | TFLOPs: 29.86 | 7: iteration 110370/ 115203 | consumed samples: 28254720 | consumed tokens: 57865666560 | elapsed time per iteration (s): 0.43 | learning rate: 2.080E-05 | global batch size: 256 | lm loss: 2.226975E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.418 | TFLOPs: 31.40 | 7: iteration 110380/ 115203 | consumed samples: 28257280 | consumed tokens: 57870909440 | elapsed time per iteration (s): 0.43 | learning rate: 2.079E-05 | global batch size: 256 | lm loss: 2.237011E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.225 | TFLOPs: 31.34 | 7: iteration 110390/ 115203 | consumed samples: 28259840 | consumed tokens: 57876152320 | elapsed time per iteration (s): 0.44 | learning rate: 2.079E-05 | global batch size: 256 | lm loss: 2.220537E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.922 | TFLOPs: 30.74 | 7: iteration 110400/ 115203 | consumed samples: 28262400 | consumed tokens: 57881395200 | elapsed time per iteration (s): 0.43 | learning rate: 2.079E-05 | global batch 
size: 256 | lm loss: 2.209862E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.900 | TFLOPs: 30.90 | 7: iteration 110410/ 115203 | consumed samples: 28264960 | consumed tokens: 57886638080 | elapsed time per iteration (s): 0.43 | learning rate: 2.078E-05 | global batch size: 256 | lm loss: 2.239527E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.899 | TFLOPs: 31.11 | 7: iteration 110420/ 115203 | consumed samples: 28267520 | consumed tokens: 57891880960 | elapsed time per iteration (s): 0.43 | learning rate: 2.078E-05 | global batch size: 256 | lm loss: 2.218374E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.610 | TFLOPs: 31.20 | 7: iteration 110430/ 115203 | consumed samples: 28270080 | consumed tokens: 57897123840 | elapsed time per iteration (s): 0.43 | learning rate: 2.078E-05 | global batch size: 256 | lm loss: 2.200476E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.496 | TFLOPs: 31.03 | 7: iteration 110440/ 115203 | consumed samples: 28272640 | consumed tokens: 57902366720 | elapsed time per iteration (s): 0.45 | learning rate: 2.077E-05 | global batch size: 256 | lm loss: 2.199507E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.151 | TFLOPs: 29.91 | 7: iteration 110450/ 115203 | consumed samples: 28275200 | consumed tokens: 57907609600 | elapsed time per iteration (s): 0.43 | learning rate: 2.077E-05 | global batch size: 256 | lm loss: 2.200912E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.273 | TFLOPs: 31.02 | 7: iteration 110460/ 115203 | consumed samples: 28277760 | 
consumed tokens: 57912852480 | elapsed time per iteration (s): 0.43 | learning rate: 2.077E-05 | global batch size: 256 | lm loss: 2.231480E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.160 | TFLOPs: 30.91 | 7: iteration 110470/ 115203 | consumed samples: 28280320 | consumed tokens: 57918095360 | elapsed time per iteration (s): 0.43 | learning rate: 2.076E-05 | global batch size: 256 | lm loss: 2.200154E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.500 | TFLOPs: 31.19 | 7: iteration 110480/ 115203 | consumed samples: 28282880 | consumed tokens: 57923338240 | elapsed time per iteration (s): 0.43 | learning rate: 2.076E-05 | global batch size: 256 | lm loss: 2.217513E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.547 | TFLOPs: 31.40 | 7: iteration 110490/ 115203 | consumed samples: 28285440 | consumed tokens: 57928581120 | elapsed time per iteration (s): 0.44 | learning rate: 2.076E-05 | global batch size: 256 | lm loss: 2.194860E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.397 | TFLOPs: 30.50 | 7: iteration 110500/ 115203 | consumed samples: 28288000 | consumed tokens: 57933824000 | elapsed time per iteration (s): 0.45 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 2.214030E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.094 | TFLOPs: 29.81 | 7: iteration 110510/ 115203 | consumed samples: 28290560 | consumed tokens: 57939066880 | elapsed time per iteration (s): 0.44 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 2.195888E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 585.831 | TFLOPs: 30.74 | 7: iteration 110520/ 115203 | consumed samples: 28293120 | consumed tokens: 57944309760 | elapsed time per iteration (s): 0.43 | learning rate: 2.075E-05 | global batch size: 256 | lm loss: 2.181795E+00 | grad norm: 0.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.195 | TFLOPs: 31.23 | 7: iteration 110530/ 115203 | consumed samples: 28295680 | consumed tokens: 57949552640 | elapsed time per iteration (s): 0.42 | learning rate: 2.074E-05 | global batch size: 256 | lm loss: 2.205186E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.086 | TFLOPs: 31.75 | 7: iteration 110540/ 115203 | consumed samples: 28298240 | consumed tokens: 57954795520 | elapsed time per iteration (s): 0.44 | learning rate: 2.074E-05 | global batch size: 256 | lm loss: 2.211295E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.798 | TFLOPs: 30.53 | 7: iteration 110550/ 115203 | consumed samples: 28300800 | consumed tokens: 57960038400 | elapsed time per iteration (s): 0.44 | learning rate: 2.074E-05 | global batch size: 256 | lm loss: 2.197775E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.383 | TFLOPs: 30.61 | 7: iteration 110560/ 115203 | consumed samples: 28303360 | consumed tokens: 57965281280 | elapsed time per iteration (s): 0.44 | learning rate: 2.074E-05 | global batch size: 256 | lm loss: 2.232028E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.869 | TFLOPs: 30.79 | 7: iteration 110570/ 115203 | consumed samples: 28305920 | consumed tokens: 57970524160 | elapsed time per iteration (s): 0.44 | learning rate: 2.073E-05 | global batch size: 256 | lm loss: 
2.222150E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.927 | TFLOPs: 30.64 | 7: iteration 110580/ 115203 | consumed samples: 28308480 | consumed tokens: 57975767040 | elapsed time per iteration (s): 0.43 | learning rate: 2.073E-05 | global batch size: 256 | lm loss: 2.210106E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.768 | TFLOPs: 30.89 | 7: iteration 110590/ 115203 | consumed samples: 28311040 | consumed tokens: 57981009920 | elapsed time per iteration (s): 0.43 | learning rate: 2.073E-05 | global batch size: 256 | lm loss: 2.202930E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.235 | TFLOPs: 31.02 | 7: iteration 110600/ 115203 | consumed samples: 28313600 | consumed tokens: 57986252800 | elapsed time per iteration (s): 0.43 | learning rate: 2.072E-05 | global batch size: 256 | lm loss: 2.214102E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.988 | TFLOPs: 31.11 | 7: iteration 110610/ 115203 | consumed samples: 28316160 | consumed tokens: 57991495680 | elapsed time per iteration (s): 0.43 | learning rate: 2.072E-05 | global batch size: 256 | lm loss: 2.214199E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.465 | TFLOPs: 31.19 | 7: iteration 110620/ 115203 | consumed samples: 28318720 | consumed tokens: 57996738560 | elapsed time per iteration (s): 0.43 | learning rate: 2.072E-05 | global batch size: 256 | lm loss: 2.226184E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.367 | TFLOPs: 31.03 | 7: iteration 110630/ 115203 | consumed samples: 28321280 | consumed tokens: 
58001981440 | elapsed time per iteration (s): 0.44 | learning rate: 2.071E-05 | global batch size: 256 | lm loss: 2.201113E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.992 | TFLOPs: 30.80 | 7: iteration 110640/ 115203 | consumed samples: 28323840 | consumed tokens: 58007224320 | elapsed time per iteration (s): 0.43 | learning rate: 2.071E-05 | global batch size: 256 | lm loss: 2.221060E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.410 | TFLOPs: 31.35 | 7: iteration 110650/ 115203 | consumed samples: 28326400 | consumed tokens: 58012467200 | elapsed time per iteration (s): 0.43 | learning rate: 2.071E-05 | global batch size: 256 | lm loss: 2.235513E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.990 | TFLOPs: 31.06 | 7: iteration 110660/ 115203 | consumed samples: 28328960 | consumed tokens: 58017710080 | elapsed time per iteration (s): 0.45 | learning rate: 2.070E-05 | global batch size: 256 | lm loss: 2.220352E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.465 | TFLOPs: 29.98 | 7: iteration 110670/ 115203 | consumed samples: 28331520 | consumed tokens: 58022952960 | elapsed time per iteration (s): 0.43 | learning rate: 2.070E-05 | global batch size: 256 | lm loss: 2.218434E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.951 | TFLOPs: 31.27 | 7: iteration 110680/ 115203 | consumed samples: 28334080 | consumed tokens: 58028195840 | elapsed time per iteration (s): 0.43 | learning rate: 2.070E-05 | global batch size: 256 | lm loss: 2.208820E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 595.899 | TFLOPs: 31.27 | 7: iteration 110690/ 115203 | consumed samples: 28336640 | consumed tokens: 58033438720 | elapsed time per iteration (s): 0.43 | learning rate: 2.069E-05 | global batch size: 256 | lm loss: 2.241544E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.370 | TFLOPs: 31.08 | 7: iteration 110700/ 115203 | consumed samples: 28339200 | consumed tokens: 58038681600 | elapsed time per iteration (s): 0.42 | learning rate: 2.069E-05 | global batch size: 256 | lm loss: 2.208343E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.032 | TFLOPs: 31.64 | 7: iteration 110710/ 115203 | consumed samples: 28341760 | consumed tokens: 58043924480 | elapsed time per iteration (s): 0.44 | learning rate: 2.069E-05 | global batch size: 256 | lm loss: 2.219788E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.444 | TFLOPs: 30.87 | 7: iteration 110720/ 115203 | consumed samples: 28344320 | consumed tokens: 58049167360 | elapsed time per iteration (s): 0.44 | learning rate: 2.069E-05 | global batch size: 256 | lm loss: 2.236467E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.334 | TFLOPs: 30.40 | 7: iteration 110730/ 115203 | consumed samples: 28346880 | consumed tokens: 58054410240 | elapsed time per iteration (s): 0.44 | learning rate: 2.068E-05 | global batch size: 256 | lm loss: 2.217315E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.653 | TFLOPs: 30.47 | 7: iteration 110740/ 115203 | consumed samples: 28349440 | consumed tokens: 58059653120 | elapsed time per iteration (s): 0.44 | learning rate: 2.068E-05 | global batch size: 256 | lm loss: 2.228700E+00 | grad 
norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.821 | TFLOPs: 30.53 | 7: iteration 110750/ 115203 | consumed samples: 28352000 | consumed tokens: 58064896000 | elapsed time per iteration (s): 0.43 | learning rate: 2.068E-05 | global batch size: 256 | lm loss: 2.195649E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.056 | TFLOPs: 31.01 | 7: iteration 110760/ 115203 | consumed samples: 28354560 | consumed tokens: 58070138880 | elapsed time per iteration (s): 0.44 | learning rate: 2.067E-05 | global batch size: 256 | lm loss: 2.217260E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.278 | TFLOPs: 30.66 | 7: iteration 110770/ 115203 | consumed samples: 28357120 | consumed tokens: 58075381760 | elapsed time per iteration (s): 0.43 | learning rate: 2.067E-05 | global batch size: 256 | lm loss: 2.201970E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.300 | TFLOPs: 31.02 | 7: iteration 110780/ 115203 | consumed samples: 28359680 | consumed tokens: 58080624640 | elapsed time per iteration (s): 0.44 | learning rate: 2.067E-05 | global batch size: 256 | lm loss: 2.206712E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.230 | TFLOPs: 30.65 | 7: iteration 110790/ 115203 | consumed samples: 28362240 | consumed tokens: 58085867520 | elapsed time per iteration (s): 0.44 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 2.229170E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.096 | TFLOPs: 30.44 | 7: iteration 110800/ 115203 | consumed samples: 28364800 | consumed tokens: 58091110400 | elapsed time 
per iteration (s): 0.43 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 2.191513E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.958 | TFLOPs: 31.11 | 7: iteration 110810/ 115203 | consumed samples: 28367360 | consumed tokens: 58096353280 | elapsed time per iteration (s): 0.43 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 2.216348E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.181 | TFLOPs: 31.39 | 7: iteration 110820/ 115203 | consumed samples: 28369920 | consumed tokens: 58101596160 | elapsed time per iteration (s): 0.43 | learning rate: 2.066E-05 | global batch size: 256 | lm loss: 2.191159E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.870 | TFLOPs: 31.11 | 7: iteration 110830/ 115203 | consumed samples: 28372480 | consumed tokens: 58106839040 | elapsed time per iteration (s): 0.43 | learning rate: 2.065E-05 | global batch size: 256 | lm loss: 2.223197E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.137 | TFLOPs: 31.12 | 7: iteration 110840/ 115203 | consumed samples: 28375040 | consumed tokens: 58112081920 | elapsed time per iteration (s): 0.43 | learning rate: 2.065E-05 | global batch size: 256 | lm loss: 2.226732E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.044 | TFLOPs: 31.22 | 7: iteration 110850/ 115203 | consumed samples: 28377600 | consumed tokens: 58117324800 | elapsed time per iteration (s): 0.44 | learning rate: 2.065E-05 | global batch size: 256 | lm loss: 2.230685E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.775 | TFLOPs: 
30.47 | 7: iteration 110860/ 115203 | consumed samples: 28380160 | consumed tokens: 58122567680 | elapsed time per iteration (s): 0.43 | learning rate: 2.064E-05 | global batch size: 256 | lm loss: 2.226149E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.140 | TFLOPs: 30.96 | 7: iteration 110870/ 115203 | consumed samples: 28382720 | consumed tokens: 58127810560 | elapsed time per iteration (s): 0.43 | learning rate: 2.064E-05 | global batch size: 256 | lm loss: 2.197704E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.766 | TFLOPs: 31.05 | 7: iteration 110880/ 115203 | consumed samples: 28385280 | consumed tokens: 58133053440 | elapsed time per iteration (s): 0.43 | learning rate: 2.064E-05 | global batch size: 256 | lm loss: 2.222908E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.370 | TFLOPs: 31.13 | 7: iteration 110890/ 115203 | consumed samples: 28387840 | consumed tokens: 58138296320 | elapsed time per iteration (s): 0.43 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 2.214145E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.665 | TFLOPs: 31.31 | 7: iteration 110900/ 115203 | consumed samples: 28390400 | consumed tokens: 58143539200 | elapsed time per iteration (s): 0.43 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 2.201213E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.419 | TFLOPs: 31.14 | 7: iteration 110910/ 115203 | consumed samples: 28392960 | consumed tokens: 58148782080 | elapsed time per iteration (s): 0.43 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 2.218557E+00 | grad norm: 0.261 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.122 | TFLOPs: 31.17 | 7: iteration 110920/ 115203 | consumed samples: 28395520 | consumed tokens: 58154024960 | elapsed time per iteration (s): 0.44 | learning rate: 2.063E-05 | global batch size: 256 | lm loss: 2.223539E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.659 | TFLOPs: 30.57 | 7: iteration 110930/ 115203 | consumed samples: 28398080 | consumed tokens: 58159267840 | elapsed time per iteration (s): 0.44 | learning rate: 2.062E-05 | global batch size: 256 | lm loss: 2.184376E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.360 | TFLOPs: 30.66 | 7: iteration 110940/ 115203 | consumed samples: 28400640 | consumed tokens: 58164510720 | elapsed time per iteration (s): 0.45 | learning rate: 2.062E-05 | global batch size: 256 | lm loss: 2.207596E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.498 | TFLOPs: 30.04 | 7: iteration 110950/ 115203 | consumed samples: 28403200 | consumed tokens: 58169753600 | elapsed time per iteration (s): 0.46 | learning rate: 2.062E-05 | global batch size: 256 | lm loss: 2.182471E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 560.218 | TFLOPs: 29.39 | 7: iteration 110960/ 115203 | consumed samples: 28405760 | consumed tokens: 58174996480 | elapsed time per iteration (s): 0.43 | learning rate: 2.061E-05 | global batch size: 256 | lm loss: 2.216357E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.420 | TFLOPs: 30.98 | 7: iteration 110970/ 115203 | consumed samples: 28408320 | consumed tokens: 58180239360 | elapsed time per iteration (s): 0.44 | 
learning rate: 2.061E-05 | global batch size: 256 | lm loss: 2.274304E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.741 | TFLOPs: 30.84 | 7: iteration 110980/ 115203 | consumed samples: 28410880 | consumed tokens: 58185482240 | elapsed time per iteration (s): 0.44 | learning rate: 2.061E-05 | global batch size: 256 | lm loss: 2.227418E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.963 | TFLOPs: 30.69 | 7: iteration 110990/ 115203 | consumed samples: 28413440 | consumed tokens: 58190725120 | elapsed time per iteration (s): 0.44 | learning rate: 2.061E-05 | global batch size: 256 | lm loss: 2.226295E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.554 | TFLOPs: 30.51 | 7: iteration 111000/ 115203 | consumed samples: 28416000 | consumed tokens: 58195968000 | elapsed time per iteration (s): 0.54 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 2.217712E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 472.724 | TFLOPs: 24.80 | 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 111000 | lm loss value: 2.132934E+00 | lm loss PPL: 8.439589E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 111000 to checkpoints_221m 0: [2022-11-29 02:21:22,592] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step111000 is begin to save! 0: [2022-11-29 02:21:22,598] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_01-model_00-model_states.pt... 
0: [2022-11-29 02:21:22,847] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_01-model_00-model_states.pt. 0: [2022-11-29 02:21:22,848] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_03-model_00-model_states.pt... 0: [2022-11-29 02:21:22,878] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_03-model_00-model_states.pt. 0: [2022-11-29 02:21:22,878] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_04-model_00-model_states.pt... 0: [2022-11-29 02:21:22,911] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_04-model_00-model_states.pt. 0: [2022-11-29 02:21:22,911] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_05-model_00-model_states.pt... 0: [2022-11-29 02:21:22,944] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_05-model_00-model_states.pt. 0: [2022-11-29 02:21:22,944] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_06-model_00-model_states.pt... 0: [2022-11-29 02:21:22,976] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_06-model_00-model_states.pt. 0: [2022-11-29 02:21:22,977] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_07-model_00-model_states.pt... 0: [2022-11-29 02:21:23,010] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_07-model_00-model_states.pt. 0: [2022-11-29 02:21:23,010] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_08-model_00-model_states.pt... 
0: [2022-11-29 02:21:23,043] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_08-model_00-model_states.pt. 0: [2022-11-29 02:21:23,043] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_09-model_00-model_states.pt... 0: [2022-11-29 02:21:23,076] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_09-model_00-model_states.pt. 0: [2022-11-29 02:21:23,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_10-model_00-model_states.pt... 0: [2022-11-29 02:21:23,108] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_10-model_00-model_states.pt. 0: [2022-11-29 02:21:23,109] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_11-model_00-model_states.pt... 0: [2022-11-29 02:21:23,132] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_11-model_00-model_states.pt. 0: [2022-11-29 02:21:23,132] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_12-model_00-model_states.pt... 0: [2022-11-29 02:21:23,156] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_12-model_00-model_states.pt. 0: [2022-11-29 02:21:23,156] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_13-model_00-model_states.pt... 0: [2022-11-29 02:21:23,180] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_13-model_00-model_states.pt. 0: [2022-11-29 02:21:23,181] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_14-model_00-model_states.pt... 
0: [2022-11-29 02:21:23,204] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_14-model_00-model_states.pt. 0: [2022-11-29 02:21:23,204] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_15-model_00-model_states.pt... 0: [2022-11-29 02:21:23,227] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_15-model_00-model_states.pt. 0: [2022-11-29 02:21:23,227] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_16-model_00-model_states.pt... 0: [2022-11-29 02:21:23,251] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_16-model_00-model_states.pt. 0: [2022-11-29 02:21:23,251] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_17-model_00-model_states.pt... 0: [2022-11-29 02:21:23,275] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_17-model_00-model_states.pt. 0: [2022-11-29 02:21:23,275] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_18-model_00-model_states.pt... 0: [2022-11-29 02:21:23,298] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_18-model_00-model_states.pt. 0: [2022-11-29 02:21:23,299] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_19-model_00-model_states.pt... 0: [2022-11-29 02:21:23,323] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_19-model_00-model_states.pt. 0: [2022-11-29 02:21:23,323] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_20-model_00-model_states.pt... 
0: [2022-11-29 02:21:23,346] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_20-model_00-model_states.pt. 0: [2022-11-29 02:21:23,346] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/layer_22-model_00-model_states.pt... 0: [2022-11-29 02:21:23,350] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/layer_22-model_00-model_states.pt. 0: [2022-11-29 02:21:23,351] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step111000/mp_rank_00_model_states.pt 0: [2022-11-29 02:21:23,351] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/mp_rank_00_model_states.pt... 0: [2022-11-29 02:21:23,353] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/mp_rank_00_model_states.pt. 0: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 
7: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 
3: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 
1: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 
2: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
1: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 
5: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:21:23,373] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step111000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:21:23,420] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:21:23,420] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 02:21:23,420] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2022-11-29 02:21:23,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:21:23,423] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 02:21:23,423] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2022-11-29 02:21:23,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:21:23,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 02:21:23,426] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 
2: [2022-11-29 02:21:23,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:21:23,427] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 02:21:23,427] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2022-11-29 02:21:23,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:21:23,424] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 02:21:23,424] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2022-11-29 02:21:23,424] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:21:23,424] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 02:21:23,424] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2022-11-29 02:21:23,426] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:21:23,426] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 02:21:23,427] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 
2: [2022-11-29 02:21:23,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:21:23,427] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 0: [2022-11-29 02:21:23,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:21:23,427] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2022-11-29 02:21:23,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:21:23,427] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2022-11-29 02:21:23,427] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:21:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 
4: [2022-11-29 02:21:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:21:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:21:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:21:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:21:23,429] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2022-11-29 02:21:23,429] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:21:23,429] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 2: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2022-11-29 02:21:23,429] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2022-11-29 02:21:23,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 
4: [2022-11-29 02:21:23,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:21:23,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2022-11-29 02:21:23,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 2: [2022-11-29 02:21:23,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2022-11-29 02:21:23,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 2: [2022-11-29 02:21:23,430] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:21:23,430] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 02:21:23,430] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:21:23,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:21:23,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:21:23,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:21:23,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2022-11-29 02:21:23,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 
1: [2022-11-29 02:21:23,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2022-11-29 02:21:23,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 1: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:21:23,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 02:21:23,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 6: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:21:23,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 
1: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2022-11-29 02:21:23,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:21:23,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 6: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:21:23,432] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2022-11-29 02:21:23,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 
4: [2022-11-29 02:21:23,434] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:21:23,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 02:21:23,434] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 02:21:23,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 4: [2022-11-29 02:21:23,434] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2022-11-29 02:21:23,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:21:23,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:21:23,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:21:23,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 02:21:23,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:21:23,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 02:21:23,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 02:21:23,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2022-11-29 02:21:23,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2022-11-29 02:21:23,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 02:21:23,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2022-11-29 02:21:23,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2022-11-29 02:21:23,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:21:23,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 02:21:23,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2022-11-29 02:21:23,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:21:23,438] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:21:23,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 02:21:23,438] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 02:21:23,438] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 7: [2022-11-29 02:21:23,438] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:21:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-29 02:21:23,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 5: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:21:23,428] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 3: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2022-11-29 02:21:23,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-29 02:21:23,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:21:23,429] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2022-11-29 02:21:23,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:21:23,431] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 5: [2022-11-29 02:21:23,429] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2022-11-29 02:21:23,431] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:21:23,435] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:21:23,429] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 3: [2022-11-29 02:21:23,435] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-29 02:21:23,429] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 3: [2022-11-29 02:21:23,435] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2022-11-29 02:21:23,432] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:21:23,433] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 02:21:23,433] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2022-11-29 02:21:23,436] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:21:23,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 02:21:23,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 5: [2022-11-29 02:21:23,437] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:21:23,437] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 02:21:23,437] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2022-11-29 02:21:23,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:21:23,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:21:23,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 
0: [2022-11-29 02:21:23,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 02:21:23,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 02:21:23,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:21:23,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 02:21:23,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:21:23,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2022-11-29 02:21:23,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2022-11-29 02:21:23,443] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:21:23,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 02:21:23,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2022-11-29 02:21:23,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 02:21:23,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 
0: [2022-11-29 02:21:23,443] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 02:21:23,443] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2022-11-29 02:21:23,444] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2022-11-29 02:21:23,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:21:23,446] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 02:21:23,446] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 0: [2022-11-29 02:21:23,466] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step111000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 02:21:23,466] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step111000 is ready now! 
0: successfully saved checkpoint at iteration 111000 to checkpoints_221m 7: time (ms) | save-checkpoint: 917.93 7: iteration 111010/ 115203 | consumed samples: 28418560 | consumed tokens: 58201210880 | elapsed time per iteration (s): 0.57 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 2.245628E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 451.395 | TFLOPs: 23.68 | 7: iteration 111020/ 115203 | consumed samples: 28421120 | consumed tokens: 58206453760 | elapsed time per iteration (s): 0.43 | learning rate: 2.060E-05 | global batch size: 256 | lm loss: 2.235314E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.469 | TFLOPs: 31.35 | 7: iteration 111030/ 115203 | consumed samples: 28423680 | consumed tokens: 58211696640 | elapsed time per iteration (s): 0.43 | learning rate: 2.059E-05 | global batch size: 256 | lm loss: 2.229262E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.642 | TFLOPs: 31.15 | 7: iteration 111040/ 115203 | consumed samples: 28426240 | consumed tokens: 58216939520 | elapsed time per iteration (s): 0.43 | learning rate: 2.059E-05 | global batch size: 256 | lm loss: 2.214381E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.744 | TFLOPs: 31.21 | 7: iteration 111050/ 115203 | consumed samples: 28428800 | consumed tokens: 58222182400 | elapsed time per iteration (s): 0.45 | learning rate: 2.059E-05 | global batch size: 256 | lm loss: 2.186134E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 569.472 | TFLOPs: 29.88 | 7: iteration 111060/ 115203 | consumed samples: 28431360 | consumed tokens: 58227425280 | elapsed time per iteration (s): 0.43 | 
learning rate: 2.059E-05 | global batch size: 256 | lm loss: 2.187250E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.419 | TFLOPs: 31.50 | 7: iteration 111070/ 115203 | consumed samples: 28433920 | consumed tokens: 58232668160 | elapsed time per iteration (s): 0.44 | learning rate: 2.058E-05 | global batch size: 256 | lm loss: 2.206621E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.995 | TFLOPs: 30.33 | 7: iteration 111080/ 115203 | consumed samples: 28436480 | consumed tokens: 58237911040 | elapsed time per iteration (s): 0.44 | learning rate: 2.058E-05 | global batch size: 256 | lm loss: 2.206862E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.996 | TFLOPs: 30.38 | 7: iteration 111090/ 115203 | consumed samples: 28439040 | consumed tokens: 58243153920 | elapsed time per iteration (s): 0.43 | learning rate: 2.058E-05 | global batch size: 256 | lm loss: 2.222848E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.268 | TFLOPs: 31.55 | 7: iteration 111100/ 115203 | consumed samples: 28441600 | consumed tokens: 58248396800 | elapsed time per iteration (s): 0.43 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 2.245312E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.521 | TFLOPs: 30.88 | 7: iteration 111110/ 115203 | consumed samples: 28444160 | consumed tokens: 58253639680 | elapsed time per iteration (s): 0.44 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 2.232483E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.169 | TFLOPs: 30.70 | 7: iteration 111120/ 
115203 | consumed samples: 28446720 | consumed tokens: 58258882560 | elapsed time per iteration (s): 0.44 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 2.169700E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.776 | TFLOPs: 30.52 | 7: iteration 111130/ 115203 | consumed samples: 28449280 | consumed tokens: 58264125440 | elapsed time per iteration (s): 0.44 | learning rate: 2.057E-05 | global batch size: 256 | lm loss: 2.225859E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.457 | TFLOPs: 30.35 | 7: iteration 111140/ 115203 | consumed samples: 28451840 | consumed tokens: 58269368320 | elapsed time per iteration (s): 0.43 | learning rate: 2.056E-05 | global batch size: 256 | lm loss: 2.226395E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.335 | TFLOPs: 31.45 | 7: iteration 111150/ 115203 | consumed samples: 28454400 | consumed tokens: 58274611200 | elapsed time per iteration (s): 0.43 | learning rate: 2.056E-05 | global batch size: 256 | lm loss: 2.210899E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.299 | TFLOPs: 31.39 | 7: iteration 111160/ 115203 | consumed samples: 28456960 | consumed tokens: 58279854080 | elapsed time per iteration (s): 0.44 | learning rate: 2.056E-05 | global batch size: 256 | lm loss: 2.192320E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.696 | TFLOPs: 30.63 | 7: iteration 111170/ 115203 | consumed samples: 28459520 | consumed tokens: 58285096960 | elapsed time per iteration (s): 0.42 | learning rate: 2.056E-05 | global batch size: 256 | lm loss: 2.245675E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 611.548 | TFLOPs: 32.09 | 7: iteration 111180/ 115203 | consumed samples: 28462080 | consumed tokens: 58290339840 | elapsed time per iteration (s): 0.44 | learning rate: 2.055E-05 | global batch size: 256 | lm loss: 2.205190E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.847 | TFLOPs: 30.58 | 7: iteration 111190/ 115203 | consumed samples: 28464640 | consumed tokens: 58295582720 | elapsed time per iteration (s): 0.43 | learning rate: 2.055E-05 | global batch size: 256 | lm loss: 2.192839E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.570 | TFLOPs: 31.04 | 7: iteration 111200/ 115203 | consumed samples: 28467200 | consumed tokens: 58300825600 | elapsed time per iteration (s): 0.44 | learning rate: 2.055E-05 | global batch size: 256 | lm loss: 2.233318E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.151 | TFLOPs: 30.81 | 7: iteration 111210/ 115203 | consumed samples: 28469760 | consumed tokens: 58306068480 | elapsed time per iteration (s): 0.44 | learning rate: 2.054E-05 | global batch size: 256 | lm loss: 2.179037E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.975 | TFLOPs: 30.22 | 7: iteration 111220/ 115203 | consumed samples: 28472320 | consumed tokens: 58311311360 | elapsed time per iteration (s): 0.44 | learning rate: 2.054E-05 | global batch size: 256 | lm loss: 2.233845E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.499 | TFLOPs: 30.88 | 7: iteration 111230/ 115203 | consumed samples: 28474880 | consumed tokens: 58316554240 | elapsed time per iteration (s): 0.43 | learning rate: 
2.054E-05 | global batch size: 256 | lm loss: 2.208348E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.661 | TFLOPs: 31.25 | 7: iteration 111240/ 115203 | consumed samples: 28477440 | consumed tokens: 58321797120 | elapsed time per iteration (s): 0.43 | learning rate: 2.054E-05 | global batch size: 256 | lm loss: 2.248446E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.838 | TFLOPs: 31.47 | 7: iteration 111250/ 115203 | consumed samples: 28480000 | consumed tokens: 58327040000 | elapsed time per iteration (s): 0.43 | learning rate: 2.053E-05 | global batch size: 256 | lm loss: 2.204895E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.444 | TFLOPs: 30.93 | 7: iteration 111260/ 115203 | consumed samples: 28482560 | consumed tokens: 58332282880 | elapsed time per iteration (s): 0.43 | learning rate: 2.053E-05 | global batch size: 256 | lm loss: 2.208852E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.745 | TFLOPs: 31.26 | 7: iteration 111270/ 115203 | consumed samples: 28485120 | consumed tokens: 58337525760 | elapsed time per iteration (s): 0.43 | learning rate: 2.053E-05 | global batch size: 256 | lm loss: 2.203388E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.039 | TFLOPs: 30.96 | 7: iteration 111280/ 115203 | consumed samples: 28487680 | consumed tokens: 58342768640 | elapsed time per iteration (s): 0.43 | learning rate: 2.053E-05 | global batch size: 256 | lm loss: 2.204567E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.085 | TFLOPs: 31.38 | 7: iteration 111290/ 115203 | 
consumed samples: 28490240 | consumed tokens: 58348011520 | elapsed time per iteration (s): 0.44 | learning rate: 2.052E-05 | global batch size: 256 | lm loss: 2.188039E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.081 | TFLOPs: 30.28 | 7: iteration 111300/ 115203 | consumed samples: 28492800 | consumed tokens: 58353254400 | elapsed time per iteration (s): 0.43 | learning rate: 2.052E-05 | global batch size: 256 | lm loss: 2.223781E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.674 | TFLOPs: 31.04 | 7: iteration 111310/ 115203 | consumed samples: 28495360 | consumed tokens: 58358497280 | elapsed time per iteration (s): 0.43 | learning rate: 2.052E-05 | global batch size: 256 | lm loss: 2.196484E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.677 | TFLOPs: 31.10 | 7: iteration 111320/ 115203 | consumed samples: 28497920 | consumed tokens: 58363740160 | elapsed time per iteration (s): 0.43 | learning rate: 2.051E-05 | global batch size: 256 | lm loss: 2.202843E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.739 | TFLOPs: 31.52 | 7: iteration 111330/ 115203 | consumed samples: 28500480 | consumed tokens: 58368983040 | elapsed time per iteration (s): 0.43 | learning rate: 2.051E-05 | global batch size: 256 | lm loss: 2.211665E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.929 | TFLOPs: 31.11 | 7: iteration 111340/ 115203 | consumed samples: 28503040 | consumed tokens: 58374225920 | elapsed time per iteration (s): 0.44 | learning rate: 2.051E-05 | global batch size: 256 | lm loss: 2.213802E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 586.715 | TFLOPs: 30.78 | 7: iteration 111350/ 115203 | consumed samples: 28505600 | consumed tokens: 58379468800 | elapsed time per iteration (s): 0.44 | learning rate: 2.051E-05 | global batch size: 256 | lm loss: 2.210993E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.072 | TFLOPs: 30.70 | 7: iteration 111360/ 115203 | consumed samples: 28508160 | consumed tokens: 58384711680 | elapsed time per iteration (s): 0.43 | learning rate: 2.050E-05 | global batch size: 256 | lm loss: 2.229361E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.340 | TFLOPs: 31.08 | 7: iteration 111370/ 115203 | consumed samples: 28510720 | consumed tokens: 58389954560 | elapsed time per iteration (s): 0.43 | learning rate: 2.050E-05 | global batch size: 256 | lm loss: 2.212924E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.014 | TFLOPs: 31.43 | 7: iteration 111380/ 115203 | consumed samples: 28513280 | consumed tokens: 58395197440 | elapsed time per iteration (s): 0.44 | learning rate: 2.050E-05 | global batch size: 256 | lm loss: 2.237638E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.437 | TFLOPs: 30.72 | 7: iteration 111390/ 115203 | consumed samples: 28515840 | consumed tokens: 58400440320 | elapsed time per iteration (s): 0.43 | learning rate: 2.050E-05 | global batch size: 256 | lm loss: 2.226673E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.358 | TFLOPs: 31.45 | 7: iteration 111400/ 115203 | consumed samples: 28518400 | consumed tokens: 58405683200 | elapsed time per iteration (s): 0.43 | learning rate: 2.049E-05 | global batch 
size: 256 | lm loss: 2.191109E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.216 | TFLOPs: 31.49 | 7: iteration 111410/ 115203 | consumed samples: 28520960 | consumed tokens: 58410926080 | elapsed time per iteration (s): 0.43 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 2.210998E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.674 | TFLOPs: 31.10 | 7: iteration 111420/ 115203 | consumed samples: 28523520 | consumed tokens: 58416168960 | elapsed time per iteration (s): 0.44 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 2.201537E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.462 | TFLOPs: 30.77 | 7: iteration 111430/ 115203 | consumed samples: 28526080 | consumed tokens: 58421411840 | elapsed time per iteration (s): 0.43 | learning rate: 2.049E-05 | global batch size: 256 | lm loss: 2.250828E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.570 | TFLOPs: 31.30 | 7: iteration 111440/ 115203 | consumed samples: 28528640 | consumed tokens: 58426654720 | elapsed time per iteration (s): 0.44 | learning rate: 2.048E-05 | global batch size: 256 | lm loss: 2.197074E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.066 | TFLOPs: 30.85 | 7: iteration 111450/ 115203 | consumed samples: 28531200 | consumed tokens: 58431897600 | elapsed time per iteration (s): 0.43 | learning rate: 2.048E-05 | global batch size: 256 | lm loss: 2.201858E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.224 | TFLOPs: 31.44 | 7: iteration 111460/ 115203 | consumed samples: 28533760 | 
consumed tokens: 58437140480 | elapsed time per iteration (s): 0.43 | learning rate: 2.048E-05 | global batch size: 256 | lm loss: 2.202661E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.698 | TFLOPs: 30.94 | 7: iteration 111470/ 115203 | consumed samples: 28536320 | consumed tokens: 58442383360 | elapsed time per iteration (s): 0.43 | learning rate: 2.048E-05 | global batch size: 256 | lm loss: 2.230423E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.392 | TFLOPs: 31.08 | 7: iteration 111480/ 115203 | consumed samples: 28538880 | consumed tokens: 58447626240 | elapsed time per iteration (s): 0.45 | learning rate: 2.047E-05 | global batch size: 256 | lm loss: 2.246485E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.849 | TFLOPs: 29.95 | 7: iteration 111490/ 115203 | consumed samples: 28541440 | consumed tokens: 58452869120 | elapsed time per iteration (s): 0.43 | learning rate: 2.047E-05 | global batch size: 256 | lm loss: 2.209392E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.913 | TFLOPs: 30.90 | 7: iteration 111500/ 115203 | consumed samples: 28544000 | consumed tokens: 58458112000 | elapsed time per iteration (s): 0.43 | learning rate: 2.047E-05 | global batch size: 256 | lm loss: 2.254469E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.998 | TFLOPs: 31.01 | 7: iteration 111510/ 115203 | consumed samples: 28546560 | consumed tokens: 58463354880 | elapsed time per iteration (s): 0.43 | learning rate: 2.047E-05 | global batch size: 256 | lm loss: 2.240639E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 597.670 | TFLOPs: 31.36 | 7: iteration 111520/ 115203 | consumed samples: 28549120 | consumed tokens: 58468597760 | elapsed time per iteration (s): 0.44 | learning rate: 2.046E-05 | global batch size: 256 | lm loss: 2.204365E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.987 | TFLOPs: 30.59 | 7: iteration 111530/ 115203 | consumed samples: 28551680 | consumed tokens: 58473840640 | elapsed time per iteration (s): 0.43 | learning rate: 2.046E-05 | global batch size: 256 | lm loss: 2.201917E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.188 | TFLOPs: 30.91 | 7: iteration 111540/ 115203 | consumed samples: 28554240 | consumed tokens: 58479083520 | elapsed time per iteration (s): 0.44 | learning rate: 2.046E-05 | global batch size: 256 | lm loss: 2.206785E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.417 | TFLOPs: 30.51 | 7: iteration 111550/ 115203 | consumed samples: 28556800 | consumed tokens: 58484326400 | elapsed time per iteration (s): 0.43 | learning rate: 2.046E-05 | global batch size: 256 | lm loss: 2.215418E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.435 | TFLOPs: 31.40 | 7: iteration 111560/ 115203 | consumed samples: 28559360 | consumed tokens: 58489569280 | elapsed time per iteration (s): 0.44 | learning rate: 2.045E-05 | global batch size: 256 | lm loss: 2.210784E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.335 | TFLOPs: 30.87 | 7: iteration 111570/ 115203 | consumed samples: 28561920 | consumed tokens: 58494812160 | elapsed time per iteration (s): 0.43 | learning rate: 2.045E-05 | global batch size: 256 | lm loss: 
2.241011E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.480 | TFLOPs: 31.30 | 7: iteration 111580/ 115203 | consumed samples: 28564480 | consumed tokens: 58500055040 | elapsed time per iteration (s): 0.43 | learning rate: 2.045E-05 | global batch size: 256 | lm loss: 2.197586E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.161 | TFLOPs: 31.54 | 7: iteration 111590/ 115203 | consumed samples: 28567040 | consumed tokens: 58505297920 | elapsed time per iteration (s): 0.43 | learning rate: 2.045E-05 | global batch size: 256 | lm loss: 2.209966E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.317 | TFLOPs: 31.13 | 7: iteration 111600/ 115203 | consumed samples: 28569600 | consumed tokens: 58510540800 | elapsed time per iteration (s): 0.43 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 2.223981E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.814 | TFLOPs: 31.31 | 7: iteration 111610/ 115203 | consumed samples: 28572160 | consumed tokens: 58515783680 | elapsed time per iteration (s): 0.43 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 2.223339E+00 | grad norm: 0.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.341 | TFLOPs: 31.45 | 7: iteration 111620/ 115203 | consumed samples: 28574720 | consumed tokens: 58521026560 | elapsed time per iteration (s): 0.45 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 2.213971E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.022 | TFLOPs: 30.07 | 7: iteration 111630/ 115203 | consumed samples: 28577280 | consumed tokens: 
58526269440 | elapsed time per iteration (s): 0.43 | learning rate: 2.044E-05 | global batch size: 256 | lm loss: 2.194530E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.980 | TFLOPs: 31.22 | 7: iteration 111640/ 115203 | consumed samples: 28579840 | consumed tokens: 58531512320 | elapsed time per iteration (s): 0.44 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 2.228586E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.107 | TFLOPs: 30.86 | 7: iteration 111650/ 115203 | consumed samples: 28582400 | consumed tokens: 58536755200 | elapsed time per iteration (s): 0.43 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 2.210886E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.131 | TFLOPs: 31.07 | 7: iteration 111660/ 115203 | consumed samples: 28584960 | consumed tokens: 58541998080 | elapsed time per iteration (s): 0.42 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 2.167741E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.895 | TFLOPs: 31.84 | 7: iteration 111670/ 115203 | consumed samples: 28587520 | consumed tokens: 58547240960 | elapsed time per iteration (s): 0.43 | learning rate: 2.043E-05 | global batch size: 256 | lm loss: 2.211571E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.699 | TFLOPs: 30.89 | 7: iteration 111680/ 115203 | consumed samples: 28590080 | consumed tokens: 58552483840 | elapsed time per iteration (s): 0.42 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 2.236457E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 609.329 | TFLOPs: 31.97 | 7: iteration 111690/ 115203 | consumed samples: 28592640 | consumed tokens: 58557726720 | elapsed time per iteration (s): 0.43 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 2.208073E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.602 | TFLOPs: 31.41 | 7: iteration 111700/ 115203 | consumed samples: 28595200 | consumed tokens: 58562969600 | elapsed time per iteration (s): 0.43 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 2.201353E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.102 | TFLOPs: 30.91 | 7: iteration 111710/ 115203 | consumed samples: 28597760 | consumed tokens: 58568212480 | elapsed time per iteration (s): 0.44 | learning rate: 2.042E-05 | global batch size: 256 | lm loss: 2.204235E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.422 | TFLOPs: 30.82 | 7: iteration 111720/ 115203 | consumed samples: 28600320 | consumed tokens: 58573455360 | elapsed time per iteration (s): 0.43 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 2.214145E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.132 | TFLOPs: 31.33 | 7: iteration 111730/ 115203 | consumed samples: 28602880 | consumed tokens: 58578698240 | elapsed time per iteration (s): 0.43 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 2.226326E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.938 | TFLOPs: 30.90 | 7: iteration 111740/ 115203 | consumed samples: 28605440 | consumed tokens: 58583941120 | elapsed time per iteration (s): 0.43 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 2.220658E+00 | grad 
norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.728 | TFLOPs: 30.89 | 7: iteration 111750/ 115203 | consumed samples: 28608000 | consumed tokens: 58589184000 | elapsed time per iteration (s): 0.43 | learning rate: 2.041E-05 | global batch size: 256 | lm loss: 2.234899E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.594 | TFLOPs: 31.20 | 7: iteration 111760/ 115203 | consumed samples: 28610560 | consumed tokens: 58594426880 | elapsed time per iteration (s): 0.43 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 2.212676E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.066 | TFLOPs: 31.22 | 7: iteration 111770/ 115203 | consumed samples: 28613120 | consumed tokens: 58599669760 | elapsed time per iteration (s): 0.44 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 2.257979E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.734 | TFLOPs: 30.84 | 7: iteration 111780/ 115203 | consumed samples: 28615680 | consumed tokens: 58604912640 | elapsed time per iteration (s): 0.43 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 2.198993E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.500 | TFLOPs: 30.98 | 7: iteration 111790/ 115203 | consumed samples: 28618240 | consumed tokens: 58610155520 | elapsed time per iteration (s): 0.44 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 2.201280E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.785 | TFLOPs: 30.47 | 7: iteration 111800/ 115203 | consumed samples: 28620800 | consumed tokens: 58615398400 | elapsed time 
per iteration (s): 0.44 | learning rate: 2.040E-05 | global batch size: 256 | lm loss: 2.177856E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.903 | TFLOPs: 30.74 | 7: iteration 111810/ 115203 | consumed samples: 28623360 | consumed tokens: 58620641280 | elapsed time per iteration (s): 0.43 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 2.228403E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.272 | TFLOPs: 31.02 | 7: iteration 111820/ 115203 | consumed samples: 28625920 | consumed tokens: 58625884160 | elapsed time per iteration (s): 0.44 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 2.238376E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.285 | TFLOPs: 30.29 | 7: iteration 111830/ 115203 | consumed samples: 28628480 | consumed tokens: 58631127040 | elapsed time per iteration (s): 0.43 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 2.202817E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.892 | TFLOPs: 31.53 | 7: iteration 111840/ 115203 | consumed samples: 28631040 | consumed tokens: 58636369920 | elapsed time per iteration (s): 0.44 | learning rate: 2.039E-05 | global batch size: 256 | lm loss: 2.203490E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.715 | TFLOPs: 30.84 | 7: iteration 111850/ 115203 | consumed samples: 28633600 | consumed tokens: 58641612800 | elapsed time per iteration (s): 0.43 | learning rate: 2.038E-05 | global batch size: 256 | lm loss: 2.171270E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.663 | TFLOPs: 
31.10 | 7: iteration 111860/ 115203 | consumed samples: 28636160 | consumed tokens: 58646855680 | elapsed time per iteration (s): 0.44 | learning rate: 2.038E-05 | global batch size: 256 | lm loss: 2.215427E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.975 | TFLOPs: 30.33 | 7: iteration 111870/ 115203 | consumed samples: 28638720 | consumed tokens: 58652098560 | elapsed time per iteration (s): 0.61 | learning rate: 2.038E-05 | global batch size: 256 | lm loss: 2.215720E+00 | grad norm: 0.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 418.635 | TFLOPs: 21.97 | 7: iteration 111880/ 115203 | consumed samples: 28641280 | consumed tokens: 58657341440 | elapsed time per iteration (s): 0.42 | learning rate: 2.038E-05 | global batch size: 256 | lm loss: 2.183451E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.114 | TFLOPs: 31.70 | 7: iteration 111890/ 115203 | consumed samples: 28643840 | consumed tokens: 58662584320 | elapsed time per iteration (s): 0.42 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 2.226156E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.316 | TFLOPs: 31.66 | 7: iteration 111900/ 115203 | consumed samples: 28646400 | consumed tokens: 58667827200 | elapsed time per iteration (s): 0.44 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 2.195600E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.114 | TFLOPs: 30.70 | 7: iteration 111910/ 115203 | consumed samples: 28648960 | consumed tokens: 58673070080 | elapsed time per iteration (s): 0.43 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 2.223672E+00 | grad norm: 0.259 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.940 | TFLOPs: 31.06 | 7: iteration 111920/ 115203 | consumed samples: 28651520 | consumed tokens: 58678312960 | elapsed time per iteration (s): 0.46 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 2.195492E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 561.409 | TFLOPs: 29.46 | 7: iteration 111930/ 115203 | consumed samples: 28654080 | consumed tokens: 58683555840 | elapsed time per iteration (s): 0.42 | learning rate: 2.037E-05 | global batch size: 256 | lm loss: 2.227565E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.559 | TFLOPs: 31.67 | 7: iteration 111940/ 115203 | consumed samples: 28656640 | consumed tokens: 58688798720 | elapsed time per iteration (s): 0.43 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 2.241042E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.538 | TFLOPs: 31.14 | 7: iteration 111950/ 115203 | consumed samples: 28659200 | consumed tokens: 58694041600 | elapsed time per iteration (s): 0.43 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 2.242867E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.707 | TFLOPs: 31.15 | 7: iteration 111960/ 115203 | consumed samples: 28661760 | consumed tokens: 58699284480 | elapsed time per iteration (s): 0.46 | learning rate: 2.036E-05 | global batch size: 256 | lm loss: 2.215926E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 553.334 | TFLOPs: 29.03 | 7: iteration 111970/ 115203 | consumed samples: 28664320 | consumed tokens: 58704527360 | elapsed time per iteration (s): 0.42 | 
learning rate: 2.036E-05 | global batch size: 256 | lm loss: 2.230602E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.098 | TFLOPs: 31.64 | 7: iteration 111980/ 115203 | consumed samples: 28666880 | consumed tokens: 58709770240 | elapsed time per iteration (s): 0.43 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 2.179359E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.438 | TFLOPs: 31.14 | 7: iteration 111990/ 115203 | consumed samples: 28669440 | consumed tokens: 58715013120 | elapsed time per iteration (s): 0.43 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 2.229719E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.348 | TFLOPs: 31.24 | 0: [2022-11-29 02:28:38,661] [INFO] [logging.py:68:log_dist] [Rank 0] step=112000, skipped=0, lr=[2.0350245708025642e-05, 2.0350245708025642e-05, 2.0350245708025642e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 112000/ 115203 | consumed samples: 28672000 | consumed tokens: 58720256000 | elapsed time per iteration (s): 0.43 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 2.222466E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.264 | TFLOPs: 31.23 | 0: steps: 112000 loss: 2.1991 iter time (s): 0.434 samples/sec: 589.445 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 112000 | lm loss value: 2.144499E+00 | lm loss PPL: 8.537765E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 112000 to checkpoints_221m 0: [2022-11-29 02:28:38,827] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] 
Checkpoint global_step112000 is begin to save! 0: [2022-11-29 02:28:38,838] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_01-model_00-model_states.pt... 0: [2022-11-29 02:28:38,951] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_01-model_00-model_states.pt. 0: [2022-11-29 02:28:38,951] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_03-model_00-model_states.pt... 0: [2022-11-29 02:28:38,973] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_03-model_00-model_states.pt. 0: [2022-11-29 02:28:38,974] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_04-model_00-model_states.pt... 0: [2022-11-29 02:28:38,999] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_04-model_00-model_states.pt. 0: [2022-11-29 02:28:38,999] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_05-model_00-model_states.pt... 0: [2022-11-29 02:28:39,023] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_05-model_00-model_states.pt. 0: [2022-11-29 02:28:39,024] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_06-model_00-model_states.pt... 0: [2022-11-29 02:28:39,047] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_06-model_00-model_states.pt. 0: [2022-11-29 02:28:39,048] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_07-model_00-model_states.pt... 0: [2022-11-29 02:28:39,071] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_07-model_00-model_states.pt. 
0: [2022-11-29 02:28:39,072] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_08-model_00-model_states.pt... 0: [2022-11-29 02:28:39,096] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_08-model_00-model_states.pt. 0: [2022-11-29 02:28:39,096] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_09-model_00-model_states.pt... 0: [2022-11-29 02:28:39,119] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_09-model_00-model_states.pt. 0: [2022-11-29 02:28:39,120] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_10-model_00-model_states.pt... 0: [2022-11-29 02:28:39,144] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_10-model_00-model_states.pt. 0: [2022-11-29 02:28:39,144] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_11-model_00-model_states.pt... 0: [2022-11-29 02:28:39,168] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_11-model_00-model_states.pt. 0: [2022-11-29 02:28:39,168] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_12-model_00-model_states.pt... 0: [2022-11-29 02:28:39,192] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_12-model_00-model_states.pt. 0: [2022-11-29 02:28:39,192] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_13-model_00-model_states.pt... 0: [2022-11-29 02:28:39,216] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_13-model_00-model_states.pt. 
0: [2022-11-29 02:28:39,216] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_14-model_00-model_states.pt... 0: [2022-11-29 02:28:39,240] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_14-model_00-model_states.pt. 0: [2022-11-29 02:28:39,241] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_15-model_00-model_states.pt... 0: [2022-11-29 02:28:39,266] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_15-model_00-model_states.pt. 0: [2022-11-29 02:28:39,266] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_16-model_00-model_states.pt... 0: [2022-11-29 02:28:39,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_16-model_00-model_states.pt. 0: [2022-11-29 02:28:39,291] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_17-model_00-model_states.pt... 0: [2022-11-29 02:28:39,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_17-model_00-model_states.pt. 0: [2022-11-29 02:28:39,316] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_18-model_00-model_states.pt... 0: [2022-11-29 02:28:39,340] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_18-model_00-model_states.pt. 0: [2022-11-29 02:28:39,340] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_19-model_00-model_states.pt... 0: [2022-11-29 02:28:39,364] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_19-model_00-model_states.pt. 
0: [2022-11-29 02:28:39,364] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_20-model_00-model_states.pt... 0: [2022-11-29 02:28:39,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_20-model_00-model_states.pt. 0: [2022-11-29 02:28:39,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/layer_22-model_00-model_states.pt... 0: [2022-11-29 02:28:39,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/layer_22-model_00-model_states.pt. 0: [2022-11-29 02:28:39,393] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step112000/mp_rank_00_model_states.pt 0: [2022-11-29 02:28:39,393] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/mp_rank_00_model_states.pt... 0: [2022-11-29 02:28:39,396] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/mp_rank_00_model_states.pt. 0: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
5: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
3: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 
7: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 
2: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 
3: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:28:39,415] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step112000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:28:39,462] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:28:39,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:28:39,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:28:39,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 2: [2022-11-29 02:28:39,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 4: [2022-11-29 02:28:39,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:28:39,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2022-11-29 02:28:39,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
4: [2022-11-29 02:28:39,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 02:28:39,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2022-11-29 02:28:39,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:28:39,465] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 02:28:39,465] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2022-11-29 02:28:39,465] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:28:39,465] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 02:28:39,465] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2022-11-29 02:28:39,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:28:39,467] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 
1: [2022-11-29 02:28:39,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 7: [2022-11-29 02:28:39,467] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 1: [2022-11-29 02:28:39,468] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2022-11-29 02:28:39,467] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2022-11-29 02:28:39,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:28:39,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 4: [2022-11-29 02:28:39,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:28:39,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2022-11-29 02:28:39,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:28:39,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 2: [2022-11-29 02:28:39,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 4: [2022-11-29 02:28:39,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
2: [2022-11-29 02:28:39,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2022-11-29 02:28:39,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:28:39,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 2: [2022-11-29 02:28:39,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:28:39,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 1: [2022-11-29 02:28:39,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2022-11-29 02:28:39,469] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2022-11-29 02:28:39,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:28:39,469] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 02:28:39,470] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2022-11-29 02:28:39,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:28:39,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:28:39,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:28:39,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 02:28:39,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 02:28:39,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:28:39,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:28:39,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 02:28:39,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2022-11-29 02:28:39,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2022-11-29 02:28:39,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2022-11-29 02:28:39,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 02:28:39,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 02:28:39,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
7: [2022-11-29 02:28:39,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2022-11-29 02:28:39,472] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:28:39,472] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 02:28:39,472] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2022-11-29 02:28:39,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:28:39,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 02:28:39,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2022-11-29 02:28:39,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:28:39,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 02:28:39,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2022-11-29 02:28:39,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:28:39,474] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 02:28:39,474] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2022-11-29 02:28:39,474] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:28:39,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 02:28:39,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2022-11-29 02:28:39,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:28:39,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 02:28:39,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2022-11-29 02:28:39,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:28:39,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 02:28:39,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 1: [2022-11-29 02:28:39,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 
1: [2022-11-29 02:28:39,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 02:28:39,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2022-11-29 02:28:39,475] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:28:39,475] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 02:28:39,475] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2022-11-29 02:28:39,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:28:39,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:28:39,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 02:28:39,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 02:28:39,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2022-11-29 02:28:39,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 6: [2022-11-29 02:28:39,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:28:39,476] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 02:28:39,476] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 4: [2022-11-29 02:28:39,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:28:39,479] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:28:39,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 02:28:39,480] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 02:28:39,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 4: [2022-11-29 02:28:39,480] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2022-11-29 02:28:39,482] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:28:39,482] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 02:28:39,482] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 7: [2022-11-29 02:28:39,483] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:28:39,483] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 02:28:39,483] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2022-11-29 02:28:39,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:28:39,484] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 02:28:39,484] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2022-11-29 02:28:39,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:28:39,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 02:28:39,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 2: [2022-11-29 02:28:39,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:28:39,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 02:28:39,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2022-11-29 02:28:39,464] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:28:39,464] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 02:28:39,464] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2022-11-29 02:28:39,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:28:39,471] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:28:39,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 02:28:39,471] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 02:28:39,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2022-11-29 02:28:39,471] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2022-11-29 02:28:39,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:28:39,473] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:28:39,473] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 0: [2022-11-29 02:28:39,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2022-11-29 02:28:39,473] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2022-11-29 02:28:39,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 4: [2022-11-29 02:28:39,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:28:39,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:28:39,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:28:39,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 02:28:39,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 02:28:39,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 02:28:39,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 4: [2022-11-29 02:28:39,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
4: [2022-11-29 02:28:39,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2022-11-29 02:28:39,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:28:39,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 02:28:39,504] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:28:39,504] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 02:28:39,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2022-11-29 02:28:39,504] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2022-11-29 02:28:39,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:28:39,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 02:28:39,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2022-11-29 02:28:39,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:28:39,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 02:28:39,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2022-11-29 02:28:39,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:28:39,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 02:28:39,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 5: [2022-11-29 02:28:39,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:28:39,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 02:28:39,501] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2022-11-29 02:28:39,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:28:39,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:28:39,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 02:28:39,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
0: [2022-11-29 02:28:39,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 02:28:39,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2022-11-29 02:28:39,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:28:39,532] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:28:39,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 02:28:39,532] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 02:28:39,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2022-11-29 02:28:39,532] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 0: [2022-11-29 02:28:39,535] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 02:28:39,535] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:28:39,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 02:28:39,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 02:28:39,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 3: [2022-11-29 02:28:39,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 02:28:39,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 02:28:39,576] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 02:28:39,576] 
[INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2022-11-29 02:28:39,576] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 3: [2022-11-29 02:28:39,585] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:28:39,585] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step112000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 02:28:39,585] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step112000 is ready now! 
0: successfully saved checkpoint at iteration 112000 to checkpoints_221m 7: time (ms) | save-checkpoint: 763.73 7: iteration 112010/ 115203 | consumed samples: 28674560 | consumed tokens: 58725498880 | elapsed time per iteration (s): 0.53 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 2.221720E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 486.504 | TFLOPs: 25.53 | 7: iteration 112020/ 115203 | consumed samples: 28677120 | consumed tokens: 58730741760 | elapsed time per iteration (s): 0.42 | learning rate: 2.035E-05 | global batch size: 256 | lm loss: 2.224244E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.167 | TFLOPs: 31.65 | 7: iteration 112030/ 115203 | consumed samples: 28679680 | consumed tokens: 58735984640 | elapsed time per iteration (s): 0.43 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 2.240281E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.947 | TFLOPs: 31.27 | 7: iteration 112040/ 115203 | consumed samples: 28682240 | consumed tokens: 58741227520 | elapsed time per iteration (s): 0.42 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 2.211004E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.509 | TFLOPs: 31.82 | 7: iteration 112050/ 115203 | consumed samples: 28684800 | consumed tokens: 58746470400 | elapsed time per iteration (s): 0.43 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 2.214930E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.619 | TFLOPs: 31.57 | 7: iteration 112060/ 115203 | consumed samples: 28687360 | consumed tokens: 58751713280 | elapsed time per iteration (s): 0.42 | 
learning rate: 2.034E-05 | global batch size: 256 | lm loss: 2.245142E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.138 | TFLOPs: 31.70 | 7: iteration 112070/ 115203 | consumed samples: 28689920 | consumed tokens: 58756956160 | elapsed time per iteration (s): 0.43 | learning rate: 2.034E-05 | global batch size: 256 | lm loss: 2.257757E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.006 | TFLOPs: 31.11 | 7: iteration 112080/ 115203 | consumed samples: 28692480 | consumed tokens: 58762199040 | elapsed time per iteration (s): 0.45 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 2.194714E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.638 | TFLOPs: 29.57 | 7: iteration 112090/ 115203 | consumed samples: 28695040 | consumed tokens: 58767441920 | elapsed time per iteration (s): 0.42 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 2.217332E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.580 | TFLOPs: 32.04 | 7: iteration 112100/ 115203 | consumed samples: 28697600 | consumed tokens: 58772684800 | elapsed time per iteration (s): 0.43 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 2.217000E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.202 | TFLOPs: 31.18 | 7: iteration 112110/ 115203 | consumed samples: 28700160 | consumed tokens: 58777927680 | elapsed time per iteration (s): 0.44 | learning rate: 2.033E-05 | global batch size: 256 | lm loss: 2.194691E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.222 | TFLOPs: 30.55 | 7: iteration 112120/ 
115203 | consumed samples: 28702720 | consumed tokens: 58783170560 | elapsed time per iteration (s): 0.43 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 2.189822E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.607 | TFLOPs: 31.09 | 7: iteration 112130/ 115203 | consumed samples: 28705280 | consumed tokens: 58788413440 | elapsed time per iteration (s): 0.44 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 2.228695E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 575.526 | TFLOPs: 30.20 | 7: iteration 112140/ 115203 | consumed samples: 28707840 | consumed tokens: 58793656320 | elapsed time per iteration (s): 0.43 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 2.239667E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.281 | TFLOPs: 31.29 | 7: iteration 112150/ 115203 | consumed samples: 28710400 | consumed tokens: 58798899200 | elapsed time per iteration (s): 0.43 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 2.221677E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.360 | TFLOPs: 31.45 | 7: iteration 112160/ 115203 | consumed samples: 28712960 | consumed tokens: 58804142080 | elapsed time per iteration (s): 0.43 | learning rate: 2.032E-05 | global batch size: 256 | lm loss: 2.236818E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.210 | TFLOPs: 31.07 | 7: iteration 112170/ 115203 | consumed samples: 28715520 | consumed tokens: 58809384960 | elapsed time per iteration (s): 0.43 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 2.221483E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 589.207 | TFLOPs: 30.91 | 7: iteration 112180/ 115203 | consumed samples: 28718080 | consumed tokens: 58814627840 | elapsed time per iteration (s): 0.43 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 2.224624E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.972 | TFLOPs: 31.53 | 7: iteration 112190/ 115203 | consumed samples: 28720640 | consumed tokens: 58819870720 | elapsed time per iteration (s): 0.44 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 2.199647E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.125 | TFLOPs: 30.86 | 7: iteration 112200/ 115203 | consumed samples: 28723200 | consumed tokens: 58825113600 | elapsed time per iteration (s): 0.42 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 2.244275E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.879 | TFLOPs: 31.74 | 7: iteration 112210/ 115203 | consumed samples: 28725760 | consumed tokens: 58830356480 | elapsed time per iteration (s): 0.42 | learning rate: 2.031E-05 | global batch size: 256 | lm loss: 2.195073E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.062 | TFLOPs: 31.64 | 7: iteration 112220/ 115203 | consumed samples: 28728320 | consumed tokens: 58835599360 | elapsed time per iteration (s): 0.43 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 2.222433E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.005 | TFLOPs: 31.53 | 7: iteration 112230/ 115203 | consumed samples: 28730880 | consumed tokens: 58840842240 | elapsed time per iteration (s): 0.43 | learning rate: 
2.030E-05 | global batch size: 256 | lm loss: 2.226789E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.636 | TFLOPs: 30.94 | 7: iteration 112240/ 115203 | consumed samples: 28733440 | consumed tokens: 58846085120 | elapsed time per iteration (s): 0.43 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 2.184931E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.020 | TFLOPs: 31.17 | 7: iteration 112250/ 115203 | consumed samples: 28736000 | consumed tokens: 58851328000 | elapsed time per iteration (s): 0.43 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 2.200687E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.793 | TFLOPs: 31.10 | 7: iteration 112260/ 115203 | consumed samples: 28738560 | consumed tokens: 58856570880 | elapsed time per iteration (s): 0.43 | learning rate: 2.030E-05 | global batch size: 256 | lm loss: 2.251339E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.598 | TFLOPs: 31.20 | 7: iteration 112270/ 115203 | consumed samples: 28741120 | consumed tokens: 58861813760 | elapsed time per iteration (s): 0.43 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 2.205071E+00 | grad norm: 0.246 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.968 | TFLOPs: 31.43 | 7: iteration 112280/ 115203 | consumed samples: 28743680 | consumed tokens: 58867056640 | elapsed time per iteration (s): 0.43 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 2.229901E+00 | grad norm: 0.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.400 | TFLOPs: 30.92 | 7: iteration 112290/ 115203 | 
consumed samples: 28746240 | consumed tokens: 58872299520 | elapsed time per iteration (s): 0.46 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 2.228741E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 559.295 | TFLOPs: 29.35 | 7: iteration 112300/ 115203 | consumed samples: 28748800 | consumed tokens: 58877542400 | elapsed time per iteration (s): 0.46 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 2.210324E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 555.399 | TFLOPs: 29.14 | 7: iteration 112310/ 115203 | consumed samples: 28751360 | consumed tokens: 58882785280 | elapsed time per iteration (s): 0.43 | learning rate: 2.029E-05 | global batch size: 256 | lm loss: 2.228208E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.835 | TFLOPs: 31.58 | 7: iteration 112320/ 115203 | consumed samples: 28753920 | consumed tokens: 58888028160 | elapsed time per iteration (s): 0.43 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 2.198246E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.162 | TFLOPs: 30.91 | 7: iteration 112330/ 115203 | consumed samples: 28756480 | consumed tokens: 58893271040 | elapsed time per iteration (s): 0.43 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 2.202386E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.525 | TFLOPs: 30.93 | 7: iteration 112340/ 115203 | consumed samples: 28759040 | consumed tokens: 58898513920 | elapsed time per iteration (s): 0.44 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 2.234528E+00 | grad norm: 0.291 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 584.616 | TFLOPs: 30.67 | 7: iteration 112350/ 115203 | consumed samples: 28761600 | consumed tokens: 58903756800 | elapsed time per iteration (s): 0.43 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 2.184165E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.262 | TFLOPs: 31.13 | 7: iteration 112360/ 115203 | consumed samples: 28764160 | consumed tokens: 58908999680 | elapsed time per iteration (s): 0.42 | learning rate: 2.028E-05 | global batch size: 256 | lm loss: 2.195193E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.560 | TFLOPs: 31.77 | 7: iteration 112370/ 115203 | consumed samples: 28766720 | consumed tokens: 58914242560 | elapsed time per iteration (s): 0.43 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 2.219512E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.399 | TFLOPs: 30.98 | 7: iteration 112380/ 115203 | consumed samples: 28769280 | consumed tokens: 58919485440 | elapsed time per iteration (s): 0.44 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 2.236328E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.316 | TFLOPs: 30.87 | 7: iteration 112390/ 115203 | consumed samples: 28771840 | consumed tokens: 58924728320 | elapsed time per iteration (s): 0.45 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 2.204834E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 573.613 | TFLOPs: 30.10 | 7: iteration 112400/ 115203 | consumed samples: 28774400 | consumed tokens: 58929971200 | elapsed time per iteration (s): 0.44 | learning rate: 2.027E-05 | global batch 
size: 256 | lm loss: 2.210126E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.755 | TFLOPs: 30.37 | 7: iteration 112410/ 115203 | consumed samples: 28776960 | consumed tokens: 58935214080 | elapsed time per iteration (s): 0.43 | learning rate: 2.027E-05 | global batch size: 256 | lm loss: 2.203341E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.433 | TFLOPs: 31.29 | 7: iteration 112420/ 115203 | consumed samples: 28779520 | consumed tokens: 58940456960 | elapsed time per iteration (s): 0.43 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 2.238952E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.279 | TFLOPs: 30.92 | 7: iteration 112430/ 115203 | consumed samples: 28782080 | consumed tokens: 58945699840 | elapsed time per iteration (s): 0.43 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 2.194327E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.462 | TFLOPs: 31.14 | 7: iteration 112440/ 115203 | consumed samples: 28784640 | consumed tokens: 58950942720 | elapsed time per iteration (s): 0.43 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 2.195022E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.670 | TFLOPs: 30.89 | 7: iteration 112450/ 115203 | consumed samples: 28787200 | consumed tokens: 58956185600 | elapsed time per iteration (s): 0.44 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 2.178119E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.865 | TFLOPs: 30.42 | 7: iteration 112460/ 115203 | consumed samples: 28789760 | 
consumed tokens: 58961428480 | elapsed time per iteration (s): 0.44 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 2.215312E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.850 | TFLOPs: 30.69 | 7: iteration 112470/ 115203 | consumed samples: 28792320 | consumed tokens: 58966671360 | elapsed time per iteration (s): 0.42 | learning rate: 2.026E-05 | global batch size: 256 | lm loss: 2.232804E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.444 | TFLOPs: 31.61 | 7: iteration 112480/ 115203 | consumed samples: 28794880 | consumed tokens: 58971914240 | elapsed time per iteration (s): 0.44 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 2.221624E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.979 | TFLOPs: 30.85 | 7: iteration 112490/ 115203 | consumed samples: 28797440 | consumed tokens: 58977157120 | elapsed time per iteration (s): 0.43 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 2.177222E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.774 | TFLOPs: 31.36 | 7: iteration 112500/ 115203 | consumed samples: 28800000 | consumed tokens: 58982400000 | elapsed time per iteration (s): 0.43 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 2.217280E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.473 | TFLOPs: 31.30 | 7: iteration 112510/ 115203 | consumed samples: 28802560 | consumed tokens: 58987642880 | elapsed time per iteration (s): 0.45 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 2.217515E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 573.617 | TFLOPs: 30.10 | 7: iteration 112520/ 115203 | consumed samples: 28805120 | consumed tokens: 58992885760 | elapsed time per iteration (s): 0.43 | learning rate: 2.025E-05 | global batch size: 256 | lm loss: 2.197092E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.321 | TFLOPs: 31.08 | 7: iteration 112530/ 115203 | consumed samples: 28807680 | consumed tokens: 58998128640 | elapsed time per iteration (s): 0.45 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 2.227951E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.595 | TFLOPs: 29.78 | 7: iteration 112540/ 115203 | consumed samples: 28810240 | consumed tokens: 59003371520 | elapsed time per iteration (s): 0.43 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 2.208221E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.055 | TFLOPs: 31.01 | 7: iteration 112550/ 115203 | consumed samples: 28812800 | consumed tokens: 59008614400 | elapsed time per iteration (s): 0.43 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 2.220079E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.031 | TFLOPs: 31.06 | 7: iteration 112560/ 115203 | consumed samples: 28815360 | consumed tokens: 59013857280 | elapsed time per iteration (s): 0.44 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 2.197881E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.622 | TFLOPs: 30.52 | 7: iteration 112570/ 115203 | consumed samples: 28817920 | consumed tokens: 59019100160 | elapsed time per iteration (s): 0.43 | learning rate: 2.024E-05 | global batch size: 256 | lm loss: 
2.231090E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.727 | TFLOPs: 31.20 | 7: iteration 112580/ 115203 | consumed samples: 28820480 | consumed tokens: 59024343040 | elapsed time per iteration (s): 0.43 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 2.197562E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.495 | TFLOPs: 30.93 | 7: iteration 112590/ 115203 | consumed samples: 28823040 | consumed tokens: 59029585920 | elapsed time per iteration (s): 0.45 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 2.229103E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.510 | TFLOPs: 29.99 | 7: iteration 112600/ 115203 | consumed samples: 28825600 | consumed tokens: 59034828800 | elapsed time per iteration (s): 0.45 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 2.209916E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 566.731 | TFLOPs: 29.74 | 7: iteration 112610/ 115203 | consumed samples: 28828160 | consumed tokens: 59040071680 | elapsed time per iteration (s): 0.43 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 2.203922E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.924 | TFLOPs: 31.16 | 7: iteration 112620/ 115203 | consumed samples: 28830720 | consumed tokens: 59045314560 | elapsed time per iteration (s): 0.43 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 2.227494E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.124 | TFLOPs: 31.12 | 7: iteration 112630/ 115203 | consumed samples: 28833280 | consumed tokens: 
59050557440 | elapsed time per iteration (s): 0.45 | learning rate: 2.023E-05 | global batch size: 256 | lm loss: 2.234224E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.008 | TFLOPs: 29.80 | 7: iteration 112640/ 115203 | consumed samples: 28835840 | consumed tokens: 59055800320 | elapsed time per iteration (s): 0.44 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 2.229950E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.085 | TFLOPs: 30.59 | 7: iteration 112650/ 115203 | consumed samples: 28838400 | consumed tokens: 59061043200 | elapsed time per iteration (s): 0.45 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 2.212033E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.936 | TFLOPs: 29.96 | 7: iteration 112660/ 115203 | consumed samples: 28840960 | consumed tokens: 59066286080 | elapsed time per iteration (s): 0.44 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 2.199454E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.376 | TFLOPs: 30.87 | 7: iteration 112670/ 115203 | consumed samples: 28843520 | consumed tokens: 59071528960 | elapsed time per iteration (s): 0.43 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 2.208293E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.494 | TFLOPs: 31.14 | 7: iteration 112680/ 115203 | consumed samples: 28846080 | consumed tokens: 59076771840 | elapsed time per iteration (s): 0.43 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 2.212265E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 600.349 | TFLOPs: 31.50 | 7: iteration 112690/ 115203 | consumed samples: 28848640 | consumed tokens: 59082014720 | elapsed time per iteration (s): 0.43 | learning rate: 2.022E-05 | global batch size: 256 | lm loss: 2.216522E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.827 | TFLOPs: 30.89 | 7: iteration 112700/ 115203 | consumed samples: 28851200 | consumed tokens: 59087257600 | elapsed time per iteration (s): 0.43 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 2.221188E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.885 | TFLOPs: 31.53 | 7: iteration 112710/ 115203 | consumed samples: 28853760 | consumed tokens: 59092500480 | elapsed time per iteration (s): 0.44 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 2.202932E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 582.667 | TFLOPs: 30.57 | 7: iteration 112720/ 115203 | consumed samples: 28856320 | consumed tokens: 59097743360 | elapsed time per iteration (s): 0.45 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 2.214173E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 572.143 | TFLOPs: 30.02 | 7: iteration 112730/ 115203 | consumed samples: 28858880 | consumed tokens: 59102986240 | elapsed time per iteration (s): 0.42 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 2.211732E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.642 | TFLOPs: 31.99 | 7: iteration 112740/ 115203 | consumed samples: 28861440 | consumed tokens: 59108229120 | elapsed time per iteration (s): 0.44 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 2.232069E+00 | grad 
norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.567 | TFLOPs: 30.36 | 7: iteration 112750/ 115203 | consumed samples: 28864000 | consumed tokens: 59113472000 | elapsed time per iteration (s): 0.42 | learning rate: 2.021E-05 | global batch size: 256 | lm loss: 2.245024E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.891 | TFLOPs: 31.63 | 7: iteration 112760/ 115203 | consumed samples: 28866560 | consumed tokens: 59118714880 | elapsed time per iteration (s): 0.43 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 2.178874E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.423 | TFLOPs: 31.35 | 7: iteration 112770/ 115203 | consumed samples: 28869120 | consumed tokens: 59123957760 | elapsed time per iteration (s): 0.43 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 2.217029E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.579 | TFLOPs: 30.88 | 7: iteration 112780/ 115203 | consumed samples: 28871680 | consumed tokens: 59129200640 | elapsed time per iteration (s): 0.43 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 2.221961E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.258 | TFLOPs: 31.02 | 7: iteration 112790/ 115203 | consumed samples: 28874240 | consumed tokens: 59134443520 | elapsed time per iteration (s): 0.43 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 2.190023E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.788 | TFLOPs: 31.31 | 7: iteration 112800/ 115203 | consumed samples: 28876800 | consumed tokens: 59139686400 | elapsed time 
per iteration (s): 0.44 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 2.244052E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.460 | TFLOPs: 30.30 | 7: iteration 112810/ 115203 | consumed samples: 28879360 | consumed tokens: 59144929280 | elapsed time per iteration (s): 0.45 | learning rate: 2.020E-05 | global batch size: 256 | lm loss: 2.214466E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.468 | TFLOPs: 29.93 | 7: iteration 112820/ 115203 | consumed samples: 28881920 | consumed tokens: 59150172160 | elapsed time per iteration (s): 0.43 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 2.220086E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.578 | TFLOPs: 31.14 | 7: iteration 112830/ 115203 | consumed samples: 28884480 | consumed tokens: 59155415040 | elapsed time per iteration (s): 0.44 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 2.231229E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 584.788 | TFLOPs: 30.68 | 7: iteration 112840/ 115203 | consumed samples: 28887040 | consumed tokens: 59160657920 | elapsed time per iteration (s): 0.43 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 2.206291E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.264 | TFLOPs: 30.92 | 7: iteration 112850/ 115203 | consumed samples: 28889600 | consumed tokens: 59165900800 | elapsed time per iteration (s): 0.43 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 2.219873E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.635 | TFLOPs: 
31.09 | 7: iteration 112860/ 115203 | consumed samples: 28892160 | consumed tokens: 59171143680 | elapsed time per iteration (s): 0.53 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 2.218015E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 479.722 | TFLOPs: 25.17 | 7: iteration 112870/ 115203 | consumed samples: 28894720 | consumed tokens: 59176386560 | elapsed time per iteration (s): 0.47 | learning rate: 2.019E-05 | global batch size: 256 | lm loss: 2.196207E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 542.504 | TFLOPs: 28.46 | 7: iteration 112880/ 115203 | consumed samples: 28897280 | consumed tokens: 59181629440 | elapsed time per iteration (s): 0.44 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 2.233745E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.277 | TFLOPs: 30.45 | 7: iteration 112890/ 115203 | consumed samples: 28899840 | consumed tokens: 59186872320 | elapsed time per iteration (s): 0.43 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 2.216395E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.422 | TFLOPs: 31.40 | 7: iteration 112900/ 115203 | consumed samples: 28902400 | consumed tokens: 59192115200 | elapsed time per iteration (s): 0.43 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 2.207766E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.731 | TFLOPs: 31.41 | 7: iteration 112910/ 115203 | consumed samples: 28904960 | consumed tokens: 59197358080 | elapsed time per iteration (s): 0.43 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 2.205331E+00 | grad norm: 0.262 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.219 | TFLOPs: 31.18 | 7: iteration 112920/ 115203 | consumed samples: 28907520 | consumed tokens: 59202600960 | elapsed time per iteration (s): 0.43 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 2.239291E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.034 | TFLOPs: 30.91 | 7: iteration 112930/ 115203 | consumed samples: 28910080 | consumed tokens: 59207843840 | elapsed time per iteration (s): 0.44 | learning rate: 2.018E-05 | global batch size: 256 | lm loss: 2.200001E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.594 | TFLOPs: 30.83 | 7: iteration 112940/ 115203 | consumed samples: 28912640 | consumed tokens: 59213086720 | elapsed time per iteration (s): 0.44 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 2.239438E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.557 | TFLOPs: 30.83 | 7: iteration 112950/ 115203 | consumed samples: 28915200 | consumed tokens: 59218329600 | elapsed time per iteration (s): 0.44 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 2.207317E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 577.971 | TFLOPs: 30.33 | 7: iteration 112960/ 115203 | consumed samples: 28917760 | consumed tokens: 59223572480 | elapsed time per iteration (s): 0.43 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 2.219407E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.128 | TFLOPs: 31.12 | 7: iteration 112970/ 115203 | consumed samples: 28920320 | consumed tokens: 59228815360 | elapsed time per iteration (s): 0.43 | 
learning rate: 2.017E-05 | global batch size: 256 | lm loss: 2.221894E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.409 | TFLOPs: 31.24 | 7: iteration 112980/ 115203 | consumed samples: 28922880 | consumed tokens: 59234058240 | elapsed time per iteration (s): 0.45 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 2.221411E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.561 | TFLOPs: 29.94 | 7: iteration 112990/ 115203 | consumed samples: 28925440 | consumed tokens: 59239301120 | elapsed time per iteration (s): 0.46 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 2.209602E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 556.656 | TFLOPs: 29.21 | 7: iteration 113000/ 115203 | consumed samples: 28928000 | consumed tokens: 59244544000 | elapsed time per iteration (s): 0.43 | learning rate: 2.017E-05 | global batch size: 256 | lm loss: 2.210572E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.741 | TFLOPs: 31.41 | 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 113000 | lm loss value: 2.068166E+00 | lm loss PPL: 7.910299E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 113000 to checkpoints_221m 0: [2022-11-29 02:35:55,834] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step113000 is begin to save! 0: [2022-11-29 02:35:55,862] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_01-model_00-model_states.pt... 
0: [2022-11-29 02:35:55,985] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_01-model_00-model_states.pt. 0: [2022-11-29 02:35:55,986] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_03-model_00-model_states.pt... 0: [2022-11-29 02:35:56,007] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_03-model_00-model_states.pt. 0: [2022-11-29 02:35:56,007] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_04-model_00-model_states.pt... 0: [2022-11-29 02:35:56,033] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_04-model_00-model_states.pt. 0: [2022-11-29 02:35:56,033] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_05-model_00-model_states.pt... 0: [2022-11-29 02:35:56,056] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_05-model_00-model_states.pt. 0: [2022-11-29 02:35:56,057] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_06-model_00-model_states.pt... 0: [2022-11-29 02:35:56,080] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_06-model_00-model_states.pt. 0: [2022-11-29 02:35:56,080] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_07-model_00-model_states.pt... 0: [2022-11-29 02:35:56,104] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_07-model_00-model_states.pt. 0: [2022-11-29 02:35:56,104] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_08-model_00-model_states.pt... 
0: [2022-11-29 02:35:56,128] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_08-model_00-model_states.pt. 0: [2022-11-29 02:35:56,128] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_09-model_00-model_states.pt... 0: [2022-11-29 02:35:56,151] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_09-model_00-model_states.pt. 0: [2022-11-29 02:35:56,151] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_10-model_00-model_states.pt... 0: [2022-11-29 02:35:56,174] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_10-model_00-model_states.pt. 0: [2022-11-29 02:35:56,174] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_11-model_00-model_states.pt... 0: [2022-11-29 02:35:56,197] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_11-model_00-model_states.pt. 0: [2022-11-29 02:35:56,197] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_12-model_00-model_states.pt... 0: [2022-11-29 02:35:56,220] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_12-model_00-model_states.pt. 0: [2022-11-29 02:35:56,220] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_13-model_00-model_states.pt... 0: [2022-11-29 02:35:56,243] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_13-model_00-model_states.pt. 0: [2022-11-29 02:35:56,244] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_14-model_00-model_states.pt... 
0: [2022-11-29 02:35:56,268] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_14-model_00-model_states.pt. 0: [2022-11-29 02:35:56,268] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_15-model_00-model_states.pt... 0: [2022-11-29 02:35:56,291] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_15-model_00-model_states.pt. 0: [2022-11-29 02:35:56,292] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_16-model_00-model_states.pt... 0: [2022-11-29 02:35:56,315] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_16-model_00-model_states.pt. 0: [2022-11-29 02:35:56,316] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_17-model_00-model_states.pt... 0: [2022-11-29 02:35:56,339] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_17-model_00-model_states.pt. 0: [2022-11-29 02:35:56,339] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_18-model_00-model_states.pt... 0: [2022-11-29 02:35:56,363] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_18-model_00-model_states.pt. 0: [2022-11-29 02:35:56,363] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_19-model_00-model_states.pt... 0: [2022-11-29 02:35:56,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_19-model_00-model_states.pt. 0: [2022-11-29 02:35:56,388] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_20-model_00-model_states.pt... 
0: [2022-11-29 02:35:56,412] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_20-model_00-model_states.pt. 0: [2022-11-29 02:35:56,412] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/layer_22-model_00-model_states.pt... 0: [2022-11-29 02:35:56,416] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/layer_22-model_00-model_states.pt. 0: [2022-11-29 02:35:56,417] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step113000/mp_rank_00_model_states.pt 0: [2022-11-29 02:35:56,417] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/mp_rank_00_model_states.pt... 0: [2022-11-29 02:35:56,419] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/mp_rank_00_model_states.pt. 0: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 
7: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 
4: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
7: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 
2: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 
6: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 
6: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:35:56,437] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step113000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:35:56,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:35:56,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 02:35:56,485] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2022-11-29 02:35:56,485] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:35:56,485] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 02:35:56,486] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2022-11-29 02:35:56,486] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:35:56,487] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 02:35:56,487] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
1: [2022-11-29 02:35:56,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:35:56,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 02:35:56,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2022-11-29 02:35:56,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:35:56,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 02:35:56,489] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2022-11-29 02:35:56,489] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:35:56,489] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 02:35:56,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2022-11-29 02:35:56,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:35:56,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 02:35:56,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
2: [2022-11-29 02:35:56,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:35:56,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:35:56,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 02:35:56,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 02:35:56,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2022-11-29 02:35:56,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2022-11-29 02:35:56,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:35:56,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:35:56,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 02:35:56,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 02:35:56,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2022-11-29 02:35:56,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
1: [2022-11-29 02:35:56,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:35:56,493] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 02:35:56,493] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2022-11-29 02:35:56,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:35:56,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 02:35:56,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2022-11-29 02:35:56,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:35:56,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 4: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:35:56,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
4: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:35:56,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 4: [2022-11-29 02:35:56,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 02:35:56,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 4: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2022-11-29 02:35:56,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:35:56,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:35:56,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 02:35:56,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 02:35:56,497] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:35:56,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2022-11-29 02:35:56,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 7: [2022-11-29 02:35:56,497] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 02:35:56,497] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2022-11-29 02:35:56,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:35:56,498] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:35:56,498] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 02:35:56,499] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 02:35:56,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
0: [2022-11-29 02:35:56,499] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2022-11-29 02:35:56,500] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:35:56,500] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 02:35:56,500] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2022-11-29 02:35:56,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:35:56,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:35:56,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-29 02:35:56,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 5: [2022-11-29 02:35:56,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2022-11-29 02:35:56,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:35:56,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2022-11-29 02:35:56,490] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:35:56,491] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-29 02:35:56,490] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 3: [2022-11-29 02:35:56,491] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2022-11-29 02:35:56,490] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2022-11-29 02:35:56,492] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:35:56,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:35:56,492] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 5: [2022-11-29 02:35:56,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-29 02:35:56,492] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2022-11-29 02:35:56,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2022-11-29 02:35:56,493] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:35:56,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:35:56,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 5: [2022-11-29 02:35:56,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 3: [2022-11-29 02:35:56,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2022-11-29 02:35:56,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2022-11-29 02:35:56,494] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:35:56,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:35:56,494] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 5: [2022-11-29 02:35:56,495] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-29 02:35:56,494] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2022-11-29 02:35:56,495] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2022-11-29 02:35:56,495] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:35:56,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 3: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:35:56,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 3: [2022-11-29 02:35:56,496] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 02:35:56,496] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2022-11-29 02:35:56,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:35:56,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 2: [2022-11-29 02:35:56,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 
2: [2022-11-29 02:35:56,501] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 02:35:56,501] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 4: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:35:56,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 
4: [2022-11-29 02:35:56,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 02:35:56,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 2: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 4: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 4: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2022-11-29 02:35:56,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 4: [2022-11-29 02:35:56,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 02:35:56,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 4: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 
2: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:35:56,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 02:35:56,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 02:35:56,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2022-11-29 02:35:56,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 2: [2022-11-29 02:35:56,503] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:35:56,503] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 02:35:56,503] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:35:56,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 02:35:56,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2022-11-29 02:35:56,502] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 5: [2022-11-29 02:35:56,502] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2022-11-29 02:35:56,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:35:56,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 02:35:56,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 1: [2022-11-29 02:35:56,508] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:35:56,508] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 02:35:56,508] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
0: [2022-11-29 02:35:56,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:35:56,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:35:56,557] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:35:56,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 02:35:56,557] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 02:35:56,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2022-11-29 02:35:56,557] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 0: [2022-11-29 02:35:56,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 02:35:56,565] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2022-11-29 02:35:56,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:35:56,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 02:35:56,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
6: [2022-11-29 02:35:56,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:35:56,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:35:56,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:35:56,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 02:35:56,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 02:35:56,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 02:35:56,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2022-11-29 02:35:56,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2022-11-29 02:35:56,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2022-11-29 02:35:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:35:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:35:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:35:56,600] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:35:56,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 02:35:56,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 02:35:56,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 02:35:56,600] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step113000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 02:35:56,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2022-11-29 02:35:56,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2022-11-29 02:35:56,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 6: [2022-11-29 02:35:56,600] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step113000 is ready now! 
0: successfully saved checkpoint at iteration 113000 to checkpoints_221m 7: time (ms) | save-checkpoint: 772.78 7: iteration 113010/ 115203 | consumed samples: 28930560 | consumed tokens: 59249786880 | elapsed time per iteration (s): 0.52 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 2.234215E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 495.401 | TFLOPs: 25.99 | 7: iteration 113020/ 115203 | consumed samples: 28933120 | consumed tokens: 59255029760 | elapsed time per iteration (s): 0.44 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 2.234512E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 580.739 | TFLOPs: 30.47 | 7: iteration 113030/ 115203 | consumed samples: 28935680 | consumed tokens: 59260272640 | elapsed time per iteration (s): 0.42 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 2.200827E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.870 | TFLOPs: 31.79 | 7: iteration 113040/ 115203 | consumed samples: 28938240 | consumed tokens: 59265515520 | elapsed time per iteration (s): 0.43 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 2.222812E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.029 | TFLOPs: 31.17 | 7: iteration 113050/ 115203 | consumed samples: 28940800 | consumed tokens: 59270758400 | elapsed time per iteration (s): 0.43 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 2.230648E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.812 | TFLOPs: 31.31 | 7: iteration 113060/ 115203 | consumed samples: 28943360 | consumed tokens: 59276001280 | elapsed time per iteration (s): 0.43 | 
learning rate: 2.016E-05 | global batch size: 256 | lm loss: 2.250106E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.516 | TFLOPs: 31.40 | 7: iteration 113070/ 115203 | consumed samples: 28945920 | consumed tokens: 59281244160 | elapsed time per iteration (s): 0.43 | learning rate: 2.016E-05 | global batch size: 256 | lm loss: 2.179185E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 588.795 | TFLOPs: 30.89 | 7: iteration 113080/ 115203 | consumed samples: 28948480 | consumed tokens: 59286487040 | elapsed time per iteration (s): 0.42 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 2.218414E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.799 | TFLOPs: 31.68 | 7: iteration 113090/ 115203 | consumed samples: 28951040 | consumed tokens: 59291729920 | elapsed time per iteration (s): 0.44 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 2.223282E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.479 | TFLOPs: 30.82 | 7: iteration 113100/ 115203 | consumed samples: 28953600 | consumed tokens: 59296972800 | elapsed time per iteration (s): 0.43 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 2.216028E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.433 | TFLOPs: 31.40 | 7: iteration 113110/ 115203 | consumed samples: 28956160 | consumed tokens: 59302215680 | elapsed time per iteration (s): 0.43 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 2.206089E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.162 | TFLOPs: 31.12 | 7: iteration 113120/ 
115203 | consumed samples: 28958720 | consumed tokens: 59307458560 | elapsed time per iteration (s): 0.43 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 2.247403E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.370 | TFLOPs: 31.55 | 7: iteration 113130/ 115203 | consumed samples: 28961280 | consumed tokens: 59312701440 | elapsed time per iteration (s): 0.44 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 2.232397E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.743 | TFLOPs: 30.84 | 7: iteration 113140/ 115203 | consumed samples: 28963840 | consumed tokens: 59317944320 | elapsed time per iteration (s): 0.44 | learning rate: 2.015E-05 | global batch size: 256 | lm loss: 2.193151E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 576.090 | TFLOPs: 30.23 | 7: iteration 113150/ 115203 | consumed samples: 28966400 | consumed tokens: 59323187200 | elapsed time per iteration (s): 0.42 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 2.206563E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 613.043 | TFLOPs: 32.17 | 7: iteration 113160/ 115203 | consumed samples: 28968960 | consumed tokens: 59328430080 | elapsed time per iteration (s): 0.45 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 2.218061E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.844 | TFLOPs: 30.16 | 7: iteration 113170/ 115203 | consumed samples: 28971520 | consumed tokens: 59333672960 | elapsed time per iteration (s): 0.42 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 2.184752E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 604.528 | TFLOPs: 31.72 | 7: iteration 113180/ 115203 | consumed samples: 28974080 | consumed tokens: 59338915840 | elapsed time per iteration (s): 0.44 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 2.233819E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.395 | TFLOPs: 30.77 | 7: iteration 113190/ 115203 | consumed samples: 28976640 | consumed tokens: 59344158720 | elapsed time per iteration (s): 0.42 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 2.226002E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.122 | TFLOPs: 31.64 | 7: iteration 113200/ 115203 | consumed samples: 28979200 | consumed tokens: 59349401600 | elapsed time per iteration (s): 0.44 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 2.247057E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.511 | TFLOPs: 30.77 | 7: iteration 113210/ 115203 | consumed samples: 28981760 | consumed tokens: 59354644480 | elapsed time per iteration (s): 0.43 | learning rate: 2.014E-05 | global batch size: 256 | lm loss: 2.217207E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.074 | TFLOPs: 31.01 | 7: iteration 113220/ 115203 | consumed samples: 28984320 | consumed tokens: 59359887360 | elapsed time per iteration (s): 0.42 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 2.163904E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.426 | TFLOPs: 31.61 | 7: iteration 113230/ 115203 | consumed samples: 28986880 | consumed tokens: 59365130240 | elapsed time per iteration (s): 0.42 | learning rate: 
2.013E-05 | global batch size: 256 | lm loss: 2.209328E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.541 | TFLOPs: 31.72 | 7: iteration 113240/ 115203 | consumed samples: 28989440 | consumed tokens: 59370373120 | elapsed time per iteration (s): 0.44 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 2.215778E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.951 | TFLOPs: 30.80 | 7: iteration 113250/ 115203 | consumed samples: 28992000 | consumed tokens: 59375616000 | elapsed time per iteration (s): 0.43 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 2.203624E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.893 | TFLOPs: 31.48 | 7: iteration 113260/ 115203 | consumed samples: 28994560 | consumed tokens: 59380858880 | elapsed time per iteration (s): 0.45 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 2.214666E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.513 | TFLOPs: 29.93 | 7: iteration 113270/ 115203 | consumed samples: 28997120 | consumed tokens: 59386101760 | elapsed time per iteration (s): 0.43 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 2.214050E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.093 | TFLOPs: 31.17 | 7: iteration 113280/ 115203 | consumed samples: 28999680 | consumed tokens: 59391344640 | elapsed time per iteration (s): 0.43 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 2.237089E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.211 | TFLOPs: 31.02 | 7: iteration 113290/ 115203 | 
consumed samples: 29002240 | consumed tokens: 59396587520 | elapsed time per iteration (s): 0.43 | learning rate: 2.013E-05 | global batch size: 256 | lm loss: 2.215223E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.943 | TFLOPs: 31.27 | 7: iteration 113300/ 115203 | consumed samples: 29004800 | consumed tokens: 59401830400 | elapsed time per iteration (s): 0.43 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 2.192891E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.530 | TFLOPs: 31.46 | 7: iteration 113310/ 115203 | consumed samples: 29007360 | consumed tokens: 59407073280 | elapsed time per iteration (s): 0.42 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 2.207529E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.126 | TFLOPs: 31.75 | 7: iteration 113320/ 115203 | consumed samples: 29009920 | consumed tokens: 59412316160 | elapsed time per iteration (s): 0.45 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 2.193080E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 571.556 | TFLOPs: 29.99 | 7: iteration 113330/ 115203 | consumed samples: 29012480 | consumed tokens: 59417559040 | elapsed time per iteration (s): 0.43 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 2.215843E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.022 | TFLOPs: 30.96 | 7: iteration 113340/ 115203 | consumed samples: 29015040 | consumed tokens: 59422801920 | elapsed time per iteration (s): 0.43 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 2.208683E+00 | grad norm: 0.243 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 | samples per second: 595.419 | TFLOPs: 31.24 | 7: iteration 113350/ 115203 | consumed samples: 29017600 | consumed tokens: 59428044800 | elapsed time per iteration (s): 0.42 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 2.191059E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.981 | TFLOPs: 31.64 | 7: iteration 113360/ 115203 | consumed samples: 29020160 | consumed tokens: 59433287680 | elapsed time per iteration (s): 0.43 | learning rate: 2.012E-05 | global batch size: 256 | lm loss: 2.230267E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.220 | TFLOPs: 31.23 | 7: iteration 113370/ 115203 | consumed samples: 29022720 | consumed tokens: 59438530560 | elapsed time per iteration (s): 0.43 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 2.190094E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.507 | TFLOPs: 31.51 | 7: iteration 113380/ 115203 | consumed samples: 29025280 | consumed tokens: 59443773440 | elapsed time per iteration (s): 0.43 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 2.194551E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.398 | TFLOPs: 31.19 | 7: iteration 113390/ 115203 | consumed samples: 29027840 | consumed tokens: 59449016320 | elapsed time per iteration (s): 0.42 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 2.199482E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.885 | TFLOPs: 31.79 | 7: iteration 113400/ 115203 | consumed samples: 29030400 | consumed tokens: 59454259200 | elapsed time per iteration (s): 0.43 | learning rate: 2.011E-05 | global batch 
size: 256 | lm loss: 2.223901E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.688 | TFLOPs: 31.52 | 7: iteration 113410/ 115203 | consumed samples: 29032960 | consumed tokens: 59459502080 | elapsed time per iteration (s): 0.43 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 2.227721E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.077 | TFLOPs: 31.49 | 7: iteration 113420/ 115203 | consumed samples: 29035520 | consumed tokens: 59464744960 | elapsed time per iteration (s): 0.43 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 2.200017E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.897 | TFLOPs: 31.53 | 7: iteration 113430/ 115203 | consumed samples: 29038080 | consumed tokens: 59469987840 | elapsed time per iteration (s): 0.42 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 2.219265E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.880 | TFLOPs: 31.63 | 7: iteration 113440/ 115203 | consumed samples: 29040640 | consumed tokens: 59475230720 | elapsed time per iteration (s): 0.43 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 2.237832E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.818 | TFLOPs: 31.52 | 7: iteration 113450/ 115203 | consumed samples: 29043200 | consumed tokens: 59480473600 | elapsed time per iteration (s): 0.44 | learning rate: 2.011E-05 | global batch size: 256 | lm loss: 2.193295E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 579.179 | TFLOPs: 30.39 | 7: iteration 113460/ 115203 | consumed samples: 29045760 | 
consumed tokens: 59485716480 | elapsed time per iteration (s): 0.44 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 2.222868E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 583.399 | TFLOPs: 30.61 | 7: iteration 113470/ 115203 | consumed samples: 29048320 | consumed tokens: 59490959360 | elapsed time per iteration (s): 0.43 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 2.198405E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.798 | TFLOPs: 31.52 | 7: iteration 113480/ 115203 | consumed samples: 29050880 | consumed tokens: 59496202240 | elapsed time per iteration (s): 0.43 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 2.224900E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.598 | TFLOPs: 31.56 | 7: iteration 113490/ 115203 | consumed samples: 29053440 | consumed tokens: 59501445120 | elapsed time per iteration (s): 0.43 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 2.188230E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.546 | TFLOPs: 31.30 | 7: iteration 113500/ 115203 | consumed samples: 29056000 | consumed tokens: 59506688000 | elapsed time per iteration (s): 0.42 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 2.197734E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.353 | TFLOPs: 31.71 | 7: iteration 113510/ 115203 | consumed samples: 29058560 | consumed tokens: 59511930880 | elapsed time per iteration (s): 0.44 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 2.216381E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 587.307 | TFLOPs: 30.82 | 7: iteration 113520/ 115203 | consumed samples: 29061120 | consumed tokens: 59517173760 | elapsed time per iteration (s): 0.43 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 2.233620E+00 | grad norm: 0.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.105 | TFLOPs: 31.54 | 7: iteration 113530/ 115203 | consumed samples: 29063680 | consumed tokens: 59522416640 | elapsed time per iteration (s): 0.43 | learning rate: 2.010E-05 | global batch size: 256 | lm loss: 2.222800E+00 | grad norm: 0.281 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.169 | TFLOPs: 31.07 | 7: iteration 113540/ 115203 | consumed samples: 29066240 | consumed tokens: 59527659520 | elapsed time per iteration (s): 0.42 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 2.221441E+00 | grad norm: 0.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.088 | TFLOPs: 31.75 | 7: iteration 113550/ 115203 | consumed samples: 29068800 | consumed tokens: 59532902400 | elapsed time per iteration (s): 0.43 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 2.224537E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.544 | TFLOPs: 31.09 | 7: iteration 113560/ 115203 | consumed samples: 29071360 | consumed tokens: 59538145280 | elapsed time per iteration (s): 0.42 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 2.207382E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.187 | TFLOPs: 32.02 | 7: iteration 113570/ 115203 | consumed samples: 29073920 | consumed tokens: 59543388160 | elapsed time per iteration (s): 0.42 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 
2.229238E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.603 | TFLOPs: 31.83 | 7: iteration 113580/ 115203 | consumed samples: 29076480 | consumed tokens: 59548631040 | elapsed time per iteration (s): 0.42 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 2.259423E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.477 | TFLOPs: 31.72 | 7: iteration 113590/ 115203 | consumed samples: 29079040 | consumed tokens: 59553873920 | elapsed time per iteration (s): 0.43 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 2.186485E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.029 | TFLOPs: 31.17 | 7: iteration 113600/ 115203 | consumed samples: 29081600 | consumed tokens: 59559116800 | elapsed time per iteration (s): 0.43 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 2.212016E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.739 | TFLOPs: 31.31 | 7: iteration 113610/ 115203 | consumed samples: 29084160 | consumed tokens: 59564359680 | elapsed time per iteration (s): 0.42 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 2.209539E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.286 | TFLOPs: 32.02 | 7: iteration 113620/ 115203 | consumed samples: 29086720 | consumed tokens: 59569602560 | elapsed time per iteration (s): 0.43 | learning rate: 2.009E-05 | global batch size: 256 | lm loss: 2.176607E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.860 | TFLOPs: 31.42 | 7: iteration 113630/ 115203 | consumed samples: 29089280 | consumed tokens: 
59574845440 | elapsed time per iteration (s): 0.42 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.233387E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.014 | TFLOPs: 31.74 | 7: iteration 113640/ 115203 | consumed samples: 29091840 | consumed tokens: 59580088320 | elapsed time per iteration (s): 0.43 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.227042E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.742 | TFLOPs: 31.00 | 7: iteration 113650/ 115203 | consumed samples: 29094400 | consumed tokens: 59585331200 | elapsed time per iteration (s): 0.43 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.227906E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.603 | TFLOPs: 31.57 | 7: iteration 113660/ 115203 | consumed samples: 29096960 | consumed tokens: 59590574080 | elapsed time per iteration (s): 0.43 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.216227E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.306 | TFLOPs: 30.92 | 7: iteration 113670/ 115203 | consumed samples: 29099520 | consumed tokens: 59595816960 | elapsed time per iteration (s): 0.42 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.244328E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.434 | TFLOPs: 31.71 | 7: iteration 113680/ 115203 | consumed samples: 29102080 | consumed tokens: 59601059840 | elapsed time per iteration (s): 0.43 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.229791E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 597.744 | TFLOPs: 31.36 | 7: iteration 113690/ 115203 | consumed samples: 29104640 | consumed tokens: 59606302720 | elapsed time per iteration (s): 0.43 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.224339E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.441 | TFLOPs: 31.45 | 7: iteration 113700/ 115203 | consumed samples: 29107200 | consumed tokens: 59611545600 | elapsed time per iteration (s): 0.43 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.229573E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.162 | TFLOPs: 31.38 | 7: iteration 113710/ 115203 | consumed samples: 29109760 | consumed tokens: 59616788480 | elapsed time per iteration (s): 0.43 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.208300E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.939 | TFLOPs: 31.32 | 7: iteration 113720/ 115203 | consumed samples: 29112320 | consumed tokens: 59622031360 | elapsed time per iteration (s): 0.43 | learning rate: 2.008E-05 | global batch size: 256 | lm loss: 2.187170E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.736 | TFLOPs: 30.99 | 7: iteration 113730/ 115203 | consumed samples: 29114880 | consumed tokens: 59627274240 | elapsed time per iteration (s): 0.43 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.209283E+00 | grad norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.211 | TFLOPs: 31.44 | 7: iteration 113740/ 115203 | consumed samples: 29117440 | consumed tokens: 59632517120 | elapsed time per iteration (s): 0.43 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.210442E+00 | grad 
norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.097 | TFLOPs: 31.59 | 7: iteration 113750/ 115203 | consumed samples: 29120000 | consumed tokens: 59637760000 | elapsed time per iteration (s): 0.43 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.184603E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.296 | TFLOPs: 31.55 | 7: iteration 113760/ 115203 | consumed samples: 29122560 | consumed tokens: 59643002880 | elapsed time per iteration (s): 0.42 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.207129E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.972 | TFLOPs: 31.74 | 7: iteration 113770/ 115203 | consumed samples: 29125120 | consumed tokens: 59648245760 | elapsed time per iteration (s): 0.43 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.213587E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.260 | TFLOPs: 31.44 | 7: iteration 113780/ 115203 | consumed samples: 29127680 | consumed tokens: 59653488640 | elapsed time per iteration (s): 0.42 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.219159E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.330 | TFLOPs: 31.71 | 7: iteration 113790/ 115203 | consumed samples: 29130240 | consumed tokens: 59658731520 | elapsed time per iteration (s): 0.43 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.228418E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.379 | TFLOPs: 31.34 | 7: iteration 113800/ 115203 | consumed samples: 29132800 | consumed tokens: 59663974400 | elapsed time 
per iteration (s): 0.43 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.198374E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.021 | TFLOPs: 31.27 | 7: iteration 113810/ 115203 | consumed samples: 29135360 | consumed tokens: 59669217280 | elapsed time per iteration (s): 0.43 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.199155E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.881 | TFLOPs: 31.42 | 7: iteration 113820/ 115203 | consumed samples: 29137920 | consumed tokens: 59674460160 | elapsed time per iteration (s): 0.42 | learning rate: 2.007E-05 | global batch size: 256 | lm loss: 2.193525E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 608.936 | TFLOPs: 31.95 | 7: iteration 113830/ 115203 | consumed samples: 29140480 | consumed tokens: 59679703040 | elapsed time per iteration (s): 0.43 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.196048E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.507 | TFLOPs: 31.35 | 7: iteration 113840/ 115203 | consumed samples: 29143040 | consumed tokens: 59684945920 | elapsed time per iteration (s): 0.43 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.215902E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.086 | TFLOPs: 31.33 | 7: iteration 113850/ 115203 | consumed samples: 29145600 | consumed tokens: 59690188800 | elapsed time per iteration (s): 0.43 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.202840E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.471 | TFLOPs: 
31.51 | 7: iteration 113860/ 115203 | consumed samples: 29148160 | consumed tokens: 59695431680 | elapsed time per iteration (s): 0.43 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.207092E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.036 | TFLOPs: 31.38 | 7: iteration 113870/ 115203 | consumed samples: 29150720 | consumed tokens: 59700674560 | elapsed time per iteration (s): 0.43 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.200601E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.912 | TFLOPs: 31.53 | 7: iteration 113880/ 115203 | consumed samples: 29153280 | consumed tokens: 59705917440 | elapsed time per iteration (s): 0.45 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.198507E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.747 | TFLOPs: 29.68 | 7: iteration 113890/ 115203 | consumed samples: 29155840 | consumed tokens: 59711160320 | elapsed time per iteration (s): 0.42 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.227073E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.769 | TFLOPs: 32.10 | 7: iteration 113900/ 115203 | consumed samples: 29158400 | consumed tokens: 59716403200 | elapsed time per iteration (s): 0.45 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.221867E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.632 | TFLOPs: 29.84 | 7: iteration 113910/ 115203 | consumed samples: 29160960 | consumed tokens: 59721646080 | elapsed time per iteration (s): 0.42 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.227539E+00 | grad norm: 0.251 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.079 | TFLOPs: 31.80 | 7: iteration 113920/ 115203 | consumed samples: 29163520 | consumed tokens: 59726888960 | elapsed time per iteration (s): 0.43 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.221467E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.691 | TFLOPs: 30.94 | 7: iteration 113930/ 115203 | consumed samples: 29166080 | consumed tokens: 59732131840 | elapsed time per iteration (s): 0.43 | learning rate: 2.006E-05 | global batch size: 256 | lm loss: 2.199335E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.676 | TFLOPs: 30.94 | 7: iteration 113940/ 115203 | consumed samples: 29168640 | consumed tokens: 59737374720 | elapsed time per iteration (s): 0.42 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.243179E+00 | grad norm: 0.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.555 | TFLOPs: 31.62 | 7: iteration 113950/ 115203 | consumed samples: 29171200 | consumed tokens: 59742617600 | elapsed time per iteration (s): 0.43 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.235358E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.709 | TFLOPs: 31.52 | 7: iteration 113960/ 115203 | consumed samples: 29173760 | consumed tokens: 59747860480 | elapsed time per iteration (s): 0.42 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.190480E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.164 | TFLOPs: 31.80 | 7: iteration 113970/ 115203 | consumed samples: 29176320 | consumed tokens: 59753103360 | elapsed time per iteration (s): 0.42 | 
learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.222436E+00 | grad norm: 0.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 611.414 | TFLOPs: 32.08 | 7: iteration 113980/ 115203 | consumed samples: 29178880 | consumed tokens: 59758346240 | elapsed time per iteration (s): 0.43 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.171129E+00 | grad norm: 0.293 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.040 | TFLOPs: 31.12 | 7: iteration 113990/ 115203 | consumed samples: 29181440 | consumed tokens: 59763589120 | elapsed time per iteration (s): 0.42 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.239761E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.870 | TFLOPs: 31.74 | 0: [2022-11-29 02:43:05,581] [INFO] [logging.py:68:log_dist] [Rank 0] step=114000, skipped=0, lr=[2.004947884324412e-05, 2.004947884324412e-05, 2.004947884324412e-05], mom=[(0.9, 0.999), (0.9, 0.999), (0.9, 0.999)] 7: iteration 114000/ 115203 | consumed samples: 29184000 | consumed tokens: 59768832000 | elapsed time per iteration (s): 0.42 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.230076E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.367 | TFLOPs: 31.87 | 0: steps: 114000 loss: 2.2384 iter time (s): 0.431 samples/sec: 594.340 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 114000 | lm loss value: 2.176398E+00 | lm loss PPL: 8.814497E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 114000 to checkpoints_221m 0: [2022-11-29 02:43:05,741] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint 
global_step114000 is begin to save! 0: [2022-11-29 02:43:05,744] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_01-model_00-model_states.pt... 0: [2022-11-29 02:43:05,855] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_01-model_00-model_states.pt. 0: [2022-11-29 02:43:05,856] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_03-model_00-model_states.pt... 0: [2022-11-29 02:43:05,877] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_03-model_00-model_states.pt. 0: [2022-11-29 02:43:05,878] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_04-model_00-model_states.pt... 0: [2022-11-29 02:43:05,900] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_04-model_00-model_states.pt. 0: [2022-11-29 02:43:05,900] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_05-model_00-model_states.pt... 0: [2022-11-29 02:43:05,924] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_05-model_00-model_states.pt. 0: [2022-11-29 02:43:05,924] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_06-model_00-model_states.pt... 0: [2022-11-29 02:43:05,949] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_06-model_00-model_states.pt. 0: [2022-11-29 02:43:05,949] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_07-model_00-model_states.pt... 0: [2022-11-29 02:43:05,974] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_07-model_00-model_states.pt. 
0: [2022-11-29 02:43:05,974] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_08-model_00-model_states.pt... 0: [2022-11-29 02:43:05,998] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_08-model_00-model_states.pt. 0: [2022-11-29 02:43:05,998] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_09-model_00-model_states.pt... 0: [2022-11-29 02:43:06,021] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_09-model_00-model_states.pt. 0: [2022-11-29 02:43:06,022] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_10-model_00-model_states.pt... 0: [2022-11-29 02:43:06,044] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_10-model_00-model_states.pt. 0: [2022-11-29 02:43:06,045] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_11-model_00-model_states.pt... 0: [2022-11-29 02:43:06,069] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_11-model_00-model_states.pt. 0: [2022-11-29 02:43:06,069] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_12-model_00-model_states.pt... 0: [2022-11-29 02:43:06,093] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_12-model_00-model_states.pt. 0: [2022-11-29 02:43:06,093] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_13-model_00-model_states.pt... 0: [2022-11-29 02:43:06,121] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_13-model_00-model_states.pt. 
0: [2022-11-29 02:43:06,121] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_14-model_00-model_states.pt... 0: [2022-11-29 02:43:06,160] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_14-model_00-model_states.pt. 0: [2022-11-29 02:43:06,160] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_15-model_00-model_states.pt... 0: [2022-11-29 02:43:06,184] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_15-model_00-model_states.pt. 0: [2022-11-29 02:43:06,184] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_16-model_00-model_states.pt... 0: [2022-11-29 02:43:06,207] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_16-model_00-model_states.pt. 0: [2022-11-29 02:43:06,207] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_17-model_00-model_states.pt... 0: [2022-11-29 02:43:06,231] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_17-model_00-model_states.pt. 0: [2022-11-29 02:43:06,231] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_18-model_00-model_states.pt... 0: [2022-11-29 02:43:06,254] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_18-model_00-model_states.pt. 0: [2022-11-29 02:43:06,254] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_19-model_00-model_states.pt... 0: [2022-11-29 02:43:06,278] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_19-model_00-model_states.pt. 
0: [2022-11-29 02:43:06,278] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_20-model_00-model_states.pt... 0: [2022-11-29 02:43:06,302] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_20-model_00-model_states.pt. 0: [2022-11-29 02:43:06,302] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/layer_22-model_00-model_states.pt... 0: [2022-11-29 02:43:06,305] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/layer_22-model_00-model_states.pt. 0: [2022-11-29 02:43:06,306] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step114000/mp_rank_00_model_states.pt 0: [2022-11-29 02:43:06,306] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/mp_rank_00_model_states.pt... 0: [2022-11-29 02:43:06,308] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/mp_rank_00_model_states.pt. 0: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 
0: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 
1: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 
4: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 
6: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:43:06,326] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 
5: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
1: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:43:06,327] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step114000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:43:06,372] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:43:06,372] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 02:43:06,372] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2022-11-29 02:43:06,373] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:43:06,373] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 02:43:06,373] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2022-11-29 02:43:06,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 
1: [2022-11-29 02:43:06,375] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 02:43:06,375] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2022-11-29 02:43:06,375] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:43:06,376] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 02:43:06,376] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2022-11-29 02:43:06,376] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:43:06,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 02:43:06,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2022-11-29 02:43:06,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:43:06,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 02:43:06,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2022-11-29 02:43:06,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:43:06,377] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:43:06,377] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 02:43:06,377] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2022-11-29 02:43:06,378] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:43:06,378] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 02:43:06,378] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2022-11-29 02:43:06,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:43:06,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:43:06,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 02:43:06,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 4: [2022-11-29 02:43:06,379] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 02:43:06,379] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 
2: [2022-11-29 02:43:06,379] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:43:06,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 2: [2022-11-29 02:43:06,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 4: [2022-11-29 02:43:06,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:43:06,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 02:43:06,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2022-11-29 02:43:06,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:43:06,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 
2: [2022-11-29 02:43:06,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 02:43:06,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 4: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:43:06,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 4: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:43:06,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:43:06,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 7: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2022-11-29 02:43:06,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:43:06,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 4: [2022-11-29 02:43:06,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:43:06,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:43:06,382] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2022-11-29 02:43:06,382] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 02:43:06,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2022-11-29 02:43:06,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:43:06,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:43:06,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 02:43:06,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 02:43:06,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2022-11-29 02:43:06,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2022-11-29 02:43:06,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:43:06,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 02:43:06,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2022-11-29 02:43:06,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:43:06,383] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 02:43:06,383] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2022-11-29 02:43:06,383] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:43:06,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 02:43:06,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2022-11-29 02:43:06,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:43:06,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 1: [2022-11-29 02:43:06,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:43:06,384] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:43:06,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2022-11-29 02:43:06,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 02:43:06,384] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 02:43:06,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2022-11-29 02:43:06,384] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 4: [2022-11-29 02:43:06,385] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:43:06,385] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 02:43:06,385] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2022-11-29 02:43:06,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:43:06,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 02:43:06,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:43:06,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:43:06,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 02:43:06,380] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 3: [2022-11-29 02:43:06,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:43:06,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:43:06,380] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 5: [2022-11-29 02:43:06,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 3: [2022-11-29 02:43:06,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 5: [2022-11-29 02:43:06,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 3: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2022-11-29 02:43:06,381] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2022-11-29 02:43:06,381] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2022-11-29 02:43:06,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:43:06,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 02:43:06,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:43:06,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2022-11-29 02:43:06,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 02:43:06,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2022-11-29 02:43:06,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:43:06,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 02:43:06,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2022-11-29 02:43:06,386] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:43:06,386] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 02:43:06,386] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2022-11-29 02:43:06,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 
0: [2022-11-29 02:43:06,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 02:43:06,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2022-11-29 02:43:06,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:43:06,387] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:43:06,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 0: [2022-11-29 02:43:06,387] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 02:43:06,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2022-11-29 02:43:06,387] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 
1: [2022-11-29 02:43:06,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 0: [2022-11-29 02:43:06,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 7: [2022-11-29 02:43:06,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 0: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:43:06,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 1: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 0: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 
0: [2022-11-29 02:43:06,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 02:43:06,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 1: [2022-11-29 02:43:06,388] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 02:43:06,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 5: [2022-11-29 02:43:06,388] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:43:06,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 02:43:06,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 7: [2022-11-29 02:43:06,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:43:06,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 02:43:06,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 4: [2022-11-29 02:43:06,389] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:43:06,389] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 02:43:06,389] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 6: [2022-11-29 02:43:06,391] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:43:06,391] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 02:43:06,391] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 2: [2022-11-29 02:43:06,394] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:43:06,394] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 02:43:06,394] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 3: [2022-11-29 02:43:06,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:43:06,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 02:43:06,392] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 3: [2022-11-29 02:43:06,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:43:06,392] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 02:43:06,392] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:43:06,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 3: [2022-11-29 02:43:06,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 02:43:06,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now! 3: [2022-11-29 02:43:06,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:43:06,393] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:43:06,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt
3: [2022-11-29 02:43:06,393] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt
3: [2022-11-29 02:43:06,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now!
3: [2022-11-29 02:43:06,393] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now!
0: [2022-11-29 02:43:06,448] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step114000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
0: [2022-11-29 02:43:06,448] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step114000 is ready now!
0: successfully saved checkpoint at iteration 114000 to checkpoints_221m
7: time (ms) | save-checkpoint: 716.47
7: iteration 114010/ 115203 | consumed samples: 29186560 | consumed tokens: 59774074880 | elapsed time per iteration (s): 0.51 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.212502E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 500.978 | TFLOPs: 26.29 |
7: iteration 114020/ 115203 | consumed samples: 29189120 | consumed tokens: 59779317760 | elapsed time per iteration (s): 0.43 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.204557E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.639 | TFLOPs: 31.46 |
7: iteration 114030/ 115203 | consumed samples: 29191680 | consumed tokens: 59784560640 | elapsed time per iteration (s): 0.43 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.225571E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.327 | TFLOPs: 31.50 |
7: iteration 114040/ 115203 | consumed samples: 29194240 | consumed tokens: 59789803520 | elapsed time per iteration (s): 0.42 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.211868E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.592 | TFLOPs: 32.04 |
7: iteration 114050/ 115203 | consumed samples: 29196800 | consumed tokens: 59795046400 | elapsed time per iteration (s): 0.43 | learning rate: 2.005E-05 | global batch size: 256 | lm loss: 2.228139E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.533 | TFLOPs: 31.35 |
7: iteration 114060/ 115203 | consumed samples: 29199360 | consumed tokens: 59800289280 | elapsed time per iteration (s): 0.42 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.198085E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.881 | TFLOPs: 31.79 |
7: iteration 114070/ 115203 | consumed samples: 29201920 | consumed tokens: 59805532160 | elapsed time per iteration (s): 0.43 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.210945E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.718 | TFLOPs: 31.05 |
7: iteration 114080/ 115203 | consumed samples: 29204480 | consumed tokens: 59810775040 | elapsed time per iteration (s): 0.42 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.200256E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.317 | TFLOPs: 31.71 |
7: iteration 114090/ 115203 | consumed samples: 29207040 | consumed tokens: 59816017920 | elapsed time per iteration (s): 0.43 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.172727E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.436 | TFLOPs: 31.24 |
7: iteration 114100/ 115203 | consumed samples: 29209600 | consumed tokens: 59821260800 | elapsed time per iteration (s): 0.43 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.240170E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.864 | TFLOPs: 31.00 |
7: iteration 114110/ 115203 | consumed samples: 29212160 | consumed tokens: 59826503680 | elapsed time per iteration (s): 0.42 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.227759E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 615.434 | TFLOPs: 32.29 |
7: iteration 114120/ 115203 | consumed samples: 29214720 | consumed tokens: 59831746560 | elapsed time per iteration (s): 0.43 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.233566E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.304 | TFLOPs: 31.39 |
7: iteration 114130/ 115203 | consumed samples: 29217280 | consumed tokens: 59836989440 | elapsed time per iteration (s): 0.42 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.211440E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 607.537 | TFLOPs: 31.88 |
7: iteration 114140/ 115203 | consumed samples: 29219840 | consumed tokens: 59842232320 | elapsed time per iteration (s): 0.43 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.223911E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.645 | TFLOPs: 31.30 |
7: iteration 114150/ 115203 | consumed samples: 29222400 | consumed tokens: 59847475200 | elapsed time per iteration (s): 0.43 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.212326E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.478 | TFLOPs: 31.40 |
7: iteration 114160/ 115203 | consumed samples: 29224960 | consumed tokens: 59852718080 | elapsed time per iteration (s): 0.42 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.225460E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.074 | TFLOPs: 31.64 |
7: iteration 114170/ 115203 | consumed samples: 29227520 | consumed tokens: 59857960960 | elapsed time per iteration (s): 0.45 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.198742E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 574.365 | TFLOPs: 30.14 |
7: iteration 114180/ 115203 | consumed samples: 29230080 | consumed tokens: 59863203840 | elapsed time per iteration (s): 0.42 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.228402E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.807 | TFLOPs: 31.63 |
7: iteration 114190/ 115203 | consumed samples: 29232640 | consumed tokens: 59868446720 | elapsed time per iteration (s): 0.43 | learning rate: 2.004E-05 | global batch size: 256 | lm loss: 2.239462E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.592 | TFLOPs: 30.93 |
7: iteration 114200/ 115203 | consumed samples: 29235200 | consumed tokens: 59873689600 | elapsed time per iteration (s): 0.45 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.208494E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.785 | TFLOPs: 29.95 |
7: iteration 114210/ 115203 | consumed samples: 29237760 | consumed tokens: 59878932480 | elapsed time per iteration (s): 0.43 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.200112E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.291 | TFLOPs: 30.97 |
7: iteration 114220/ 115203 | consumed samples: 29240320 | consumed tokens: 59884175360 | elapsed time per iteration (s): 0.43 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.210064E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.738 | TFLOPs: 31.05 |
7: iteration 114230/ 115203 | consumed samples: 29242880 | consumed tokens: 59889418240 | elapsed time per iteration (s): 0.45 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.239477E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.668 | TFLOPs: 29.84 |
7: iteration 114240/ 115203 | consumed samples: 29245440 | consumed tokens: 59894661120 | elapsed time per iteration (s): 0.43 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.225737E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.209 | TFLOPs: 31.49 |
7: iteration 114250/ 115203 | consumed samples: 29248000 | consumed tokens: 59899904000 | elapsed time per iteration (s): 0.44 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.198823E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.515 | TFLOPs: 30.83 |
7: iteration 114260/ 115203 | consumed samples: 29250560 | consumed tokens: 59905146880 | elapsed time per iteration (s): 0.43 | learning rate: 2.003E-05 | global batch
size: 256 | lm loss: 2.186087E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.863 | TFLOPs: 31.11 | 7: iteration 114270/ 115203 | consumed samples: 29253120 | consumed tokens: 59910389760 | elapsed time per iteration (s): 0.44 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.241360E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.596 | TFLOPs: 30.78 | 7: iteration 114280/ 115203 | consumed samples: 29255680 | consumed tokens: 59915632640 | elapsed time per iteration (s): 0.43 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.210329E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.616 | TFLOPs: 31.20 | 7: iteration 114290/ 115203 | consumed samples: 29258240 | consumed tokens: 59920875520 | elapsed time per iteration (s): 0.44 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.215783E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.104 | TFLOPs: 30.33 | 7: iteration 114300/ 115203 | consumed samples: 29260800 | consumed tokens: 59926118400 | elapsed time per iteration (s): 0.43 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.227033E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.748 | TFLOPs: 31.10 | 7: iteration 114310/ 115203 | consumed samples: 29263360 | consumed tokens: 59931361280 | elapsed time per iteration (s): 0.42 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.227419E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.328 | TFLOPs: 31.71 | 7: iteration 114320/ 115203 | consumed samples: 29265920 | 
consumed tokens: 59936604160 | elapsed time per iteration (s): 0.43 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.190870E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.541 | TFLOPs: 31.04 | 7: iteration 114330/ 115203 | consumed samples: 29268480 | consumed tokens: 59941847040 | elapsed time per iteration (s): 0.43 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.210548E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.410 | TFLOPs: 30.98 | 7: iteration 114340/ 115203 | consumed samples: 29271040 | consumed tokens: 59947089920 | elapsed time per iteration (s): 0.43 | learning rate: 2.003E-05 | global batch size: 256 | lm loss: 2.262483E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.445 | TFLOPs: 31.40 | 7: iteration 114350/ 115203 | consumed samples: 29273600 | consumed tokens: 59952332800 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.195696E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.086 | TFLOPs: 31.22 | 7: iteration 114360/ 115203 | consumed samples: 29276160 | consumed tokens: 59957575680 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.247000E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.313 | TFLOPs: 31.24 | 7: iteration 114370/ 115203 | consumed samples: 29278720 | consumed tokens: 59962818560 | elapsed time per iteration (s): 0.44 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.206435E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 
| samples per second: 584.472 | TFLOPs: 30.67 | 7: iteration 114380/ 115203 | consumed samples: 29281280 | consumed tokens: 59968061440 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.196803E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.907 | TFLOPs: 30.95 | 7: iteration 114390/ 115203 | consumed samples: 29283840 | consumed tokens: 59973304320 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.210513E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.819 | TFLOPs: 31.21 | 7: iteration 114400/ 115203 | consumed samples: 29286400 | consumed tokens: 59978547200 | elapsed time per iteration (s): 0.45 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.197202E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 570.302 | TFLOPs: 29.92 | 7: iteration 114410/ 115203 | consumed samples: 29288960 | consumed tokens: 59983790080 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.227883E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.899 | TFLOPs: 31.27 | 7: iteration 114420/ 115203 | consumed samples: 29291520 | consumed tokens: 59989032960 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.214837E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.917 | TFLOPs: 31.32 | 7: iteration 114430/ 115203 | consumed samples: 29294080 | consumed tokens: 59994275840 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 
2.192164E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.353 | TFLOPs: 30.97 | 7: iteration 114440/ 115203 | consumed samples: 29296640 | consumed tokens: 59999518720 | elapsed time per iteration (s): 0.44 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.199972E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.952 | TFLOPs: 30.85 | 7: iteration 114450/ 115203 | consumed samples: 29299200 | consumed tokens: 60004761600 | elapsed time per iteration (s): 0.44 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.220042E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.943 | TFLOPs: 30.80 | 7: iteration 114460/ 115203 | consumed samples: 29301760 | consumed tokens: 60010004480 | elapsed time per iteration (s): 0.42 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.199856E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.308 | TFLOPs: 31.65 | 7: iteration 114470/ 115203 | consumed samples: 29304320 | consumed tokens: 60015247360 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.215765E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.151 | TFLOPs: 31.49 | 7: iteration 114480/ 115203 | consumed samples: 29306880 | consumed tokens: 60020490240 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.210739E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.266 | TFLOPs: 31.08 | 7: iteration 114490/ 115203 | consumed samples: 29309440 | consumed tokens: 
60025733120 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.237411E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.786 | TFLOPs: 31.10 | 7: iteration 114500/ 115203 | consumed samples: 29312000 | consumed tokens: 60030976000 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.201981E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.695 | TFLOPs: 31.31 | 7: iteration 114510/ 115203 | consumed samples: 29314560 | consumed tokens: 60036218880 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.185784E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.270 | TFLOPs: 31.08 | 7: iteration 114520/ 115203 | consumed samples: 29317120 | consumed tokens: 60041461760 | elapsed time per iteration (s): 0.44 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.242945E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.305 | TFLOPs: 30.81 | 7: iteration 114530/ 115203 | consumed samples: 29319680 | consumed tokens: 60046704640 | elapsed time per iteration (s): 0.42 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.224035E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 604.403 | TFLOPs: 31.71 | 7: iteration 114540/ 115203 | consumed samples: 29322240 | consumed tokens: 60051947520 | elapsed time per iteration (s): 0.43 | learning rate: 2.002E-05 | global batch size: 256 | lm loss: 2.205192E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per 
second: 597.628 | TFLOPs: 31.36 | 7: iteration 114550/ 115203 | consumed samples: 29324800 | consumed tokens: 60057190400 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.260954E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.975 | TFLOPs: 31.48 | 7: iteration 114560/ 115203 | consumed samples: 29327360 | consumed tokens: 60062433280 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.184646E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.729 | TFLOPs: 31.10 | 7: iteration 114570/ 115203 | consumed samples: 29329920 | consumed tokens: 60067676160 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.240961E+00 | grad norm: 0.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.039 | TFLOPs: 31.17 | 7: iteration 114580/ 115203 | consumed samples: 29332480 | consumed tokens: 60072919040 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.238133E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.831 | TFLOPs: 31.00 | 7: iteration 114590/ 115203 | consumed samples: 29335040 | consumed tokens: 60078161920 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.208375E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.233 | TFLOPs: 31.18 | 7: iteration 114600/ 115203 | consumed samples: 29337600 | consumed tokens: 60083404800 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.243464E+00 | grad 
norm: 0.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.800 | TFLOPs: 31.47 | 7: iteration 114610/ 115203 | consumed samples: 29340160 | consumed tokens: 60088647680 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.221263E+00 | grad norm: 0.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.944 | TFLOPs: 31.01 | 7: iteration 114620/ 115203 | consumed samples: 29342720 | consumed tokens: 60093890560 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.225514E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.389 | TFLOPs: 31.45 | 7: iteration 114630/ 115203 | consumed samples: 29345280 | consumed tokens: 60099133440 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.220306E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 600.121 | TFLOPs: 31.49 | 7: iteration 114640/ 115203 | consumed samples: 29347840 | consumed tokens: 60104376320 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.188330E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.076 | TFLOPs: 31.43 | 7: iteration 114650/ 115203 | consumed samples: 29350400 | consumed tokens: 60109619200 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.229199E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.518 | TFLOPs: 31.56 | 7: iteration 114660/ 115203 | consumed samples: 29352960 | consumed tokens: 60114862080 | elapsed time 
per iteration (s): 0.44 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.197706E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.514 | TFLOPs: 30.35 | 7: iteration 114670/ 115203 | consumed samples: 29355520 | consumed tokens: 60120104960 | elapsed time per iteration (s): 0.42 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.192140E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 603.968 | TFLOPs: 31.69 | 7: iteration 114680/ 115203 | consumed samples: 29358080 | consumed tokens: 60125347840 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.269212E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.532 | TFLOPs: 31.19 | 7: iteration 114690/ 115203 | consumed samples: 29360640 | consumed tokens: 60130590720 | elapsed time per iteration (s): 0.42 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.211830E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.735 | TFLOPs: 31.62 | 7: iteration 114700/ 115203 | consumed samples: 29363200 | consumed tokens: 60135833600 | elapsed time per iteration (s): 0.42 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.235482E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.845 | TFLOPs: 32.00 | 7: iteration 114710/ 115203 | consumed samples: 29365760 | consumed tokens: 60141076480 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.234753E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.091 | TFLOPs: 
30.91 | 7: iteration 114720/ 115203 | consumed samples: 29368320 | consumed tokens: 60146319360 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.219705E+00 | grad norm: 0.271 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.059 | TFLOPs: 30.96 | 7: iteration 114730/ 115203 | consumed samples: 29370880 | consumed tokens: 60151562240 | elapsed time per iteration (s): 0.42 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.239630E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 609.760 | TFLOPs: 31.99 | 7: iteration 114740/ 115203 | consumed samples: 29373440 | consumed tokens: 60156805120 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.211079E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.362 | TFLOPs: 31.55 | 7: iteration 114750/ 115203 | consumed samples: 29376000 | consumed tokens: 60162048000 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.221425E+00 | grad norm: 0.254 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 599.570 | TFLOPs: 31.46 | 7: iteration 114760/ 115203 | consumed samples: 29378560 | consumed tokens: 60167290880 | elapsed time per iteration (s): 0.45 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.188595E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 565.583 | TFLOPs: 29.68 | 7: iteration 114770/ 115203 | consumed samples: 29381120 | consumed tokens: 60172533760 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.199609E+00 | grad norm: 0.260 | num zeros: 
0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.196 | TFLOPs: 31.12 | 7: iteration 114780/ 115203 | consumed samples: 29383680 | consumed tokens: 60177776640 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.216828E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 597.345 | TFLOPs: 31.34 | 7: iteration 114790/ 115203 | consumed samples: 29386240 | consumed tokens: 60183019520 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.214564E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.318 | TFLOPs: 31.39 | 7: iteration 114800/ 115203 | consumed samples: 29388800 | consumed tokens: 60188262400 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.240055E+00 | grad norm: 0.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.872 | TFLOPs: 30.95 | 7: iteration 114810/ 115203 | consumed samples: 29391360 | consumed tokens: 60193505280 | elapsed time per iteration (s): 0.42 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.238136E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.989 | TFLOPs: 31.80 | 7: iteration 114820/ 115203 | consumed samples: 29393920 | consumed tokens: 60198748160 | elapsed time per iteration (s): 0.43 | learning rate: 2.001E-05 | global batch size: 256 | lm loss: 2.206452E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.426 | TFLOPs: 31.14 | 7: iteration 114830/ 115203 | consumed samples: 29396480 | consumed tokens: 60203991040 | elapsed time per iteration (s): 0.43 | 
learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.220769E+00 | grad norm: 0.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.513 | TFLOPs: 31.14 | 7: iteration 114840/ 115203 | consumed samples: 29399040 | consumed tokens: 60209233920 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.213722E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.126 | TFLOPs: 31.17 | 7: iteration 114850/ 115203 | consumed samples: 29401600 | consumed tokens: 60214476800 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.223717E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.399 | TFLOPs: 31.61 | 7: iteration 114860/ 115203 | consumed samples: 29404160 | consumed tokens: 60219719680 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.205913E+00 | grad norm: 0.262 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.522 | TFLOPs: 31.40 | 7: iteration 114870/ 115203 | consumed samples: 29406720 | consumed tokens: 60224962560 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.212892E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.998 | TFLOPs: 30.54 | 7: iteration 114880/ 115203 | consumed samples: 29409280 | consumed tokens: 60230205440 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.220710E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 605.145 | TFLOPs: 31.75 | 7: iteration 114890/ 
115203 | consumed samples: 29411840 | consumed tokens: 60235448320 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.217112E+00 | grad norm: 0.256 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.777 | TFLOPs: 31.31 | 7: iteration 114900/ 115203 | consumed samples: 29414400 | consumed tokens: 60240691200 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.209878E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.110 | TFLOPs: 31.01 | 7: iteration 114910/ 115203 | consumed samples: 29416960 | consumed tokens: 60245934080 | elapsed time per iteration (s): 0.45 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.205916E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 567.432 | TFLOPs: 29.77 | 7: iteration 114920/ 115203 | consumed samples: 29419520 | consumed tokens: 60251176960 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.224407E+00 | grad norm: 0.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 596.580 | TFLOPs: 31.30 | 7: iteration 114930/ 115203 | consumed samples: 29422080 | consumed tokens: 60256419840 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.227871E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.869 | TFLOPs: 31.00 | 7: iteration 114940/ 115203 | consumed samples: 29424640 | consumed tokens: 60261662720 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.212495E+00 | grad norm: 0.284 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | samples per second: 605.543 | TFLOPs: 31.77 | 7: iteration 114950/ 115203 | consumed samples: 29427200 | consumed tokens: 60266905600 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.197898E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 585.272 | TFLOPs: 30.71 | 7: iteration 114960/ 115203 | consumed samples: 29429760 | consumed tokens: 60272148480 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.222827E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 601.818 | TFLOPs: 31.58 | 7: iteration 114970/ 115203 | consumed samples: 29432320 | consumed tokens: 60277391360 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.229889E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.137 | TFLOPs: 30.81 | 7: iteration 114980/ 115203 | consumed samples: 29434880 | consumed tokens: 60282634240 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.198982E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 578.958 | TFLOPs: 30.38 | 7: iteration 114990/ 115203 | consumed samples: 29437440 | consumed tokens: 60287877120 | elapsed time per iteration (s): 0.45 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.185206E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 568.679 | TFLOPs: 29.84 | 7: iteration 115000/ 115203 | consumed samples: 29440000 | consumed tokens: 60293120000 | elapsed time per iteration (s): 0.45 | learning rate: 
2.000E-05 | global batch size: 256 | lm loss: 2.204234E+00 | grad norm: 0.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 563.178 | TFLOPs: 29.55 | 7: -------------------------------------------------------------------------------------------- 7: valid loss at iteration 115000 | lm loss value: 2.299996E+00 | lm loss PPL: 9.974144E+00 | 7: -------------------------------------------------------------------------------------------- 0: saving checkpoint at iteration 115000 to checkpoints_221m 0: [2022-11-29 02:50:17,911] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step115000 is begin to save! 0: [2022-11-29 02:50:17,917] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_01-model_00-model_states.pt... 0: [2022-11-29 02:50:18,028] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_01-model_00-model_states.pt. 0: [2022-11-29 02:50:18,029] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_03-model_00-model_states.pt... 0: [2022-11-29 02:50:18,050] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_03-model_00-model_states.pt. 0: [2022-11-29 02:50:18,051] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_04-model_00-model_states.pt... 0: [2022-11-29 02:50:18,077] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_04-model_00-model_states.pt. 0: [2022-11-29 02:50:18,077] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_05-model_00-model_states.pt... 0: [2022-11-29 02:50:18,105] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_05-model_00-model_states.pt. 
0: [2022-11-29 02:50:18,105] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_06-model_00-model_states.pt... 0: [2022-11-29 02:50:18,130] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_06-model_00-model_states.pt. 0: [2022-11-29 02:50:18,130] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_07-model_00-model_states.pt... 0: [2022-11-29 02:50:18,159] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_07-model_00-model_states.pt. 0: [2022-11-29 02:50:18,159] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_08-model_00-model_states.pt... 0: [2022-11-29 02:50:18,187] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_08-model_00-model_states.pt. 0: [2022-11-29 02:50:18,187] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_09-model_00-model_states.pt... 0: [2022-11-29 02:50:18,211] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_09-model_00-model_states.pt. 0: [2022-11-29 02:50:18,212] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_10-model_00-model_states.pt... 0: [2022-11-29 02:50:18,236] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_10-model_00-model_states.pt. 0: [2022-11-29 02:50:18,236] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_11-model_00-model_states.pt... 0: [2022-11-29 02:50:18,259] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_11-model_00-model_states.pt. 
0: [2022-11-29 02:50:18,259] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_12-model_00-model_states.pt... 0: [2022-11-29 02:50:18,283] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_12-model_00-model_states.pt. 0: [2022-11-29 02:50:18,283] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_13-model_00-model_states.pt... 0: [2022-11-29 02:50:18,307] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_13-model_00-model_states.pt. 0: [2022-11-29 02:50:18,308] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_14-model_00-model_states.pt... 0: [2022-11-29 02:50:18,331] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_14-model_00-model_states.pt. 0: [2022-11-29 02:50:18,331] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_15-model_00-model_states.pt... 0: [2022-11-29 02:50:18,355] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_15-model_00-model_states.pt. 0: [2022-11-29 02:50:18,355] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_16-model_00-model_states.pt... 0: [2022-11-29 02:50:18,380] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_16-model_00-model_states.pt. 0: [2022-11-29 02:50:18,380] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_17-model_00-model_states.pt... 0: [2022-11-29 02:50:18,405] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_17-model_00-model_states.pt. 
0: [2022-11-29 02:50:18,405] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_18-model_00-model_states.pt... 0: [2022-11-29 02:50:18,428] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_18-model_00-model_states.pt. 0: [2022-11-29 02:50:18,429] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_19-model_00-model_states.pt... 0: [2022-11-29 02:50:18,452] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_19-model_00-model_states.pt. 0: [2022-11-29 02:50:18,452] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_20-model_00-model_states.pt... 0: [2022-11-29 02:50:18,476] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_20-model_00-model_states.pt. 0: [2022-11-29 02:50:18,476] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/layer_22-model_00-model_states.pt... 0: [2022-11-29 02:50:18,480] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/layer_22-model_00-model_states.pt. 0: [2022-11-29 02:50:18,481] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step115000/mp_rank_00_model_states.pt 0: [2022-11-29 02:50:18,481] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/mp_rank_00_model_states.pt... 0: [2022-11-29 02:50:18,484] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/mp_rank_00_model_states.pt. 0: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 
0: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 
5: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 
1: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 
2: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 
6: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
6: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:50:18,503] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:50:18,504] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:50:18,552] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:50:18,552] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 02:50:18,552] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
0: [2022-11-29 02:50:18,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:50:18,554] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 02:50:18,554] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2022-11-29 02:50:18,554] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:50:18,559] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:50:18,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:50:18,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:50:18,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:50:18,555] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:50:18,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 0: [2022-11-29 02:50:18,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 5: [2022-11-29 02:50:18,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 3: [2022-11-29 02:50:18,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2022-11-29 02:50:18,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 0: [2022-11-29 02:50:18,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 5: [2022-11-29 02:50:18,555] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 3: [2022-11-29 02:50:18,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:50:18,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:50:18,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2022-11-29 02:50:18,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
3: [2022-11-29 02:50:18,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2022-11-29 02:50:18,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2022-11-29 02:50:18,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2022-11-29 02:50:18,555] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 3: [2022-11-29 02:50:18,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2022-11-29 02:50:18,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 5: [2022-11-29 02:50:18,560] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:50:18,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2022-11-29 02:50:18,560] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 02:50:18,560] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2022-11-29 02:50:18,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:50:18,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:50:18,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 02:50:18,561] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 02:50:18,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2022-11-29 02:50:18,561] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2022-11-29 02:50:18,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:50:18,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 02:50:18,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2022-11-29 02:50:18,563] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:50:18,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:50:18,563] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 3: [2022-11-29 02:50:18,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:50:18,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:50:18,563] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 3: [2022-11-29 02:50:18,564] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:50:18,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 3: [2022-11-29 02:50:18,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 02:50:18,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 3: [2022-11-29 02:50:18,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 3: [2022-11-29 02:50:18,564] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 3: [2022-11-29 02:50:18,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 3: [2022-11-29 02:50:18,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 3: [2022-11-29 02:50:18,564] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 3: [2022-11-29 02:50:18,565] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
3: [2022-11-29 02:50:18,565] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 3: [2022-11-29 02:50:18,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2022-11-29 02:50:18,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:50:18,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 7: [2022-11-29 02:50:18,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2022-11-29 02:50:18,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:50:18,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 7: [2022-11-29 02:50:18,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2022-11-29 02:50:18,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:50:18,566] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 7: [2022-11-29 02:50:18,566] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2022-11-29 02:50:18,566] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:50:18,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 02:50:18,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 3: [2022-11-29 02:50:18,567] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:50:18,567] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 3: [2022-11-29 02:50:18,567] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2022-11-29 02:50:18,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:50:18,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:50:18,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:50:18,568] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:50:18,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 02:50:18,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 6: [2022-11-29 02:50:18,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 02:50:18,568] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 6: [2022-11-29 02:50:18,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2022-11-29 02:50:18,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2022-11-29 02:50:18,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 6: [2022-11-29 02:50:18,568] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2022-11-29 02:50:18,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:50:18,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 02:50:18,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2022-11-29 02:50:18,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 
1: [2022-11-29 02:50:18,578] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:50:18,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 02:50:18,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 1: [2022-11-29 02:50:18,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:50:18,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 02:50:18,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2022-11-29 02:50:18,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:50:18,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 02:50:18,579] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2022-11-29 02:50:18,579] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:50:18,579] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 02:50:18,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
2: [2022-11-29 02:50:18,580] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:50:18,580] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 02:50:18,580] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2022-11-29 02:50:18,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:50:18,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 1: [2022-11-29 02:50:18,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:50:18,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 1: [2022-11-29 02:50:18,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 1: [2022-11-29 02:50:18,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2022-11-29 02:50:18,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:50:18,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 02:50:18,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
1: [2022-11-29 02:50:18,581] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:50:18,581] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 1: [2022-11-29 02:50:18,581] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 1: [2022-11-29 02:50:18,582] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:50:18,582] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 02:50:18,582] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 1: [2022-11-29 02:50:18,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:50:18,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:50:18,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 02:50:18,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 02:50:18,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 1: [2022-11-29 02:50:18,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
1: [2022-11-29 02:50:18,584] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:50:18,584] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 02:50:18,584] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2022-11-29 02:50:18,590] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:50:18,590] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 02:50:18,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2022-11-29 02:50:18,591] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:50:18,591] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 7: [2022-11-29 02:50:18,591] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 7: [2022-11-29 02:50:18,592] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:50:18,592] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 02:50:18,592] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
7: [2022-11-29 02:50:18,593] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:50:18,593] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 7: [2022-11-29 02:50:18,593] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2022-11-29 02:50:18,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:50:18,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:50:18,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 2: [2022-11-29 02:50:18,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 02:50:18,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2022-11-29 02:50:18,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 2: [2022-11-29 02:50:18,595] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:50:18,595] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 2: [2022-11-29 02:50:18,595] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
5: [2022-11-29 02:50:18,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:50:18,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 02:50:18,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2022-11-29 02:50:18,574] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:50:18,574] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt 5: [2022-11-29 02:50:18,574] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2022-11-29 02:50:18,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:50:18,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 02:50:18,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 5: [2022-11-29 02:50:18,575] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:50:18,575] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt 5: [2022-11-29 02:50:18,575] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
0: [2022-11-29 02:50:18,611] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt 0: [2022-11-29 02:50:18,611] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:50:18,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 
4: [2022-11-29 02:50:18,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 4: [2022-11-29 02:50:18,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 4: [2022-11-29 02:50:18,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 02:50:18,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt 4: [2022-11-29 02:50:18,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 4: [2022-11-29 02:50:18,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
4: [2022-11-29 02:50:18,617] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 4: [2022-11-29 02:50:18,617] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 0: [2022-11-29 02:50:18,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:50:18,649] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115000/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 02:50:18,649] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115000 is ready now! 
0: successfully saved checkpoint at iteration 115000 to checkpoints_221m
7: time (ms) | save-checkpoint: 760.65
7: iteration 115010/ 115203 | consumed samples: 29442560 | consumed tokens: 60298362880 | elapsed time per iteration (s): 0.53 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.209840E+00 | grad norm: 0.255 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 481.360 | TFLOPs: 25.26 |
7: iteration 115020/ 115203 | consumed samples: 29445120 | consumed tokens: 60303605760 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.183171E+00 | grad norm: 0.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.296 | TFLOPs: 31.60 |
7: iteration 115030/ 115203 | consumed samples: 29447680 | consumed tokens: 60308848640 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.189654E+00 | grad norm: 0.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.316 | TFLOPs: 30.97 |
7: iteration 115040/ 115203 | consumed samples: 29450240 | consumed tokens: 60314091520 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.229385E+00 | grad norm: 0.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 586.926 | TFLOPs: 30.80 |
7: iteration 115050/ 115203 | consumed samples: 29452800 | consumed tokens: 60319334400 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.235461E+00 | grad norm: 0.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 589.090 | TFLOPs: 30.91 |
7: iteration 115060/ 115203 | consumed samples: 29455360 | consumed tokens: 60324577280 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.226165E+00 | grad norm: 0.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.258 | TFLOPs: 31.23 |
7: iteration 115070/ 115203 | consumed samples: 29457920 | consumed tokens: 60329820160 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.197649E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 587.757 | TFLOPs: 30.84 |
7: iteration 115080/ 115203 | consumed samples: 29460480 | consumed tokens: 60335063040 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.199105E+00 | grad norm: 0.270 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 594.140 | TFLOPs: 31.17 |
7: iteration 115090/ 115203 | consumed samples: 29463040 | consumed tokens: 60340305920 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.232110E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 590.336 | TFLOPs: 30.97 |
7: iteration 115100/ 115203 | consumed samples: 29465600 | consumed tokens: 60345548800 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.199193E+00 | grad norm: 0.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.939 | TFLOPs: 31.06 |
7: iteration 115110/ 115203 | consumed samples: 29468160 | consumed tokens: 60350791680 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.220947E+00 | grad norm: 0.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 591.111 | TFLOPs: 31.01 |
7: iteration 115120/ 115203 | consumed samples: 29470720 | consumed tokens: 60356034560 | elapsed time per iteration (s): 0.44 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.216999E+00 | grad norm: 0.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 581.246 | TFLOPs: 30.50 |
7: iteration 115130/ 115203 | consumed samples: 29473280 | consumed tokens: 60361277440 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.211322E+00 | grad norm: 0.251 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 598.284 | TFLOPs: 31.39 |
7: iteration 115140/ 115203 | consumed samples: 29475840 | consumed tokens: 60366520320 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.189643E+00 | grad norm: 0.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.804 | TFLOPs: 31.26 |
7: iteration 115150/ 115203 | consumed samples: 29478400 | consumed tokens: 60371763200 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.217063E+00 | grad norm: 0.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 606.902 | TFLOPs: 31.84 |
7: iteration 115160/ 115203 | consumed samples: 29480960 | consumed tokens: 60377006080 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.197178E+00 | grad norm: 0.267 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 610.469 | TFLOPs: 32.03 |
7: iteration 115170/ 115203 | consumed samples: 29483520 | consumed tokens: 60382248960 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.200606E+00 | grad norm: 0.259 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 593.356 | TFLOPs: 31.13 |
7: iteration 115180/ 115203 | consumed samples: 29486080 | consumed tokens: 60387491840 | elapsed time per iteration (s): 0.42 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.247454E+00 | grad norm: 0.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 602.466 | TFLOPs: 31.61 |
7: iteration 115190/ 115203 | consumed samples: 29488640 | consumed tokens: 60392734720 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.241530E+00 | grad norm: 0.257 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 592.078 | TFLOPs: 31.07 |
7: iteration 115200/ 115203 | consumed samples: 29491200 | consumed tokens: 60397977600 | elapsed time per iteration (s): 0.43 | learning rate: 2.000E-05 | global batch size: 256 | lm loss: 2.218091E+00 | grad norm: 0.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 595.061 | TFLOPs: 31.22 |
0: [after training is done] datetime: 2022-11-29 02:51:46
0: saving checkpoint at iteration 115203 to checkpoints_221m
7: ------------------------------------------------------------------------------------------------------------
7: valid loss at the end of training for val data | lm loss value: 2.202617E+00 | lm loss PPL: 9.048660E+00 |
7: ------------------------------------------------------------------------------------------------------------
0: [2022-11-29 02:51:46,293] [INFO] [logging.py:68:log_dist] [Rank 0] [Torch] Checkpoint global_step115203 is begin to save!
0: [2022-11-29 02:51:46,298] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_01-model_00-model_states.pt...
0: [2022-11-29 02:51:46,402] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_01-model_00-model_states.pt. 0: [2022-11-29 02:51:46,402] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_03-model_00-model_states.pt... 0: [2022-11-29 02:51:46,423] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_03-model_00-model_states.pt. 0: [2022-11-29 02:51:46,423] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_04-model_00-model_states.pt... 0: [2022-11-29 02:51:46,446] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_04-model_00-model_states.pt. 0: [2022-11-29 02:51:46,447] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_05-model_00-model_states.pt... 0: [2022-11-29 02:51:46,469] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_05-model_00-model_states.pt. 0: [2022-11-29 02:51:46,469] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_06-model_00-model_states.pt... 0: [2022-11-29 02:51:46,491] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_06-model_00-model_states.pt. 0: [2022-11-29 02:51:46,491] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_07-model_00-model_states.pt... 0: [2022-11-29 02:51:46,515] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_07-model_00-model_states.pt. 0: [2022-11-29 02:51:46,516] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_08-model_00-model_states.pt... 
0: [2022-11-29 02:51:46,538] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_08-model_00-model_states.pt. 0: [2022-11-29 02:51:46,538] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_09-model_00-model_states.pt... 0: [2022-11-29 02:51:46,561] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_09-model_00-model_states.pt. 0: [2022-11-29 02:51:46,561] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_10-model_00-model_states.pt... 0: [2022-11-29 02:51:46,583] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_10-model_00-model_states.pt. 0: [2022-11-29 02:51:46,583] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_11-model_00-model_states.pt... 0: [2022-11-29 02:51:46,604] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_11-model_00-model_states.pt. 0: [2022-11-29 02:51:46,604] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_12-model_00-model_states.pt... 0: [2022-11-29 02:51:46,624] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_12-model_00-model_states.pt. 0: [2022-11-29 02:51:46,624] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_13-model_00-model_states.pt... 0: [2022-11-29 02:51:46,649] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_13-model_00-model_states.pt. 0: [2022-11-29 02:51:46,649] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_14-model_00-model_states.pt... 
0: [2022-11-29 02:51:46,672] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_14-model_00-model_states.pt. 0: [2022-11-29 02:51:46,672] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_15-model_00-model_states.pt... 0: [2022-11-29 02:51:46,695] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_15-model_00-model_states.pt. 0: [2022-11-29 02:51:46,695] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_16-model_00-model_states.pt... 0: [2022-11-29 02:51:46,718] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_16-model_00-model_states.pt. 0: [2022-11-29 02:51:46,718] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_17-model_00-model_states.pt... 0: [2022-11-29 02:51:46,740] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_17-model_00-model_states.pt. 0: [2022-11-29 02:51:46,741] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_18-model_00-model_states.pt... 0: [2022-11-29 02:51:46,762] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_18-model_00-model_states.pt. 0: [2022-11-29 02:51:46,762] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_19-model_00-model_states.pt... 0: [2022-11-29 02:51:46,786] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_19-model_00-model_states.pt. 0: [2022-11-29 02:51:46,786] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_20-model_00-model_states.pt... 
0: [2022-11-29 02:51:46,810] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_20-model_00-model_states.pt. 0: [2022-11-29 02:51:46,810] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/layer_22-model_00-model_states.pt... 0: [2022-11-29 02:51:46,814] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/layer_22-model_00-model_states.pt. 0: [2022-11-29 02:51:46,815] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: checkpoints_221m/global_step115203/mp_rank_00_model_states.pt 0: [2022-11-29 02:51:46,815] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/mp_rank_00_model_states.pt... 0: [2022-11-29 02:51:46,817] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/mp_rank_00_model_states.pt. 0: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt... 
0: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt... 0: [2022-11-29 02:51:46,836] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt... 
5: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt... 
6: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt... 
5: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt... 5: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt... 
6: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt... 4: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt... 3: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt... 6: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt... 7: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt... 
1: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt... 2: [2022-11-29 02:51:46,835] [INFO] [torch_checkpoint_engine.py:15:save] [Torch] Saving checkpoints_221m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt... 1: [2022-11-29 02:51:46,882] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:51:46,882] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_11_mp_rank_00_optim_states.pt 1: [2022-11-29 02:51:46,882] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2022-11-29 02:51:46,883] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:51:46,883] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_38_mp_rank_00_optim_states.pt 4: [2022-11-29 02:51:46,883] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2022-11-29 02:51:46,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:51:46,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_32_mp_rank_00_optim_states.pt 4: [2022-11-29 02:51:46,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
0: [2022-11-29 02:51:46,884] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:51:46,884] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt 0: [2022-11-29 02:51:46,884] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2022-11-29 02:51:46,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:51:46,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_33_mp_rank_00_optim_states.pt 1: [2022-11-29 02:51:46,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:51:46,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2022-11-29 02:51:46,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_13_mp_rank_00_optim_states.pt 1: [2022-11-29 02:51:46,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2022-11-29 02:51:46,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:51:46,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:51:46,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_57_mp_rank_00_optim_states.pt 7: [2022-11-29 02:51:46,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2022-11-29 02:51:46,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_60_mp_rank_00_optim_states.pt 7: [2022-11-29 02:51:46,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2022-11-29 02:51:46,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:51:46,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_22_mp_rank_00_optim_states.pt 2: [2022-11-29 02:51:46,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2022-11-29 02:51:46,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:51:46,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_56_mp_rank_00_optim_states.pt 7: [2022-11-29 02:51:46,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2022-11-29 02:51:46,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt. 
2: [2022-11-29 02:51:46,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_20_mp_rank_00_optim_states.pt 2: [2022-11-29 02:51:46,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2022-11-29 02:51:46,887] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:51:46,887] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_37_mp_rank_00_optim_states.pt 4: [2022-11-29 02:51:46,887] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2022-11-29 02:51:46,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:51:46,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_39_mp_rank_00_optim_states.pt 4: [2022-11-29 02:51:46,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2022-11-29 02:51:46,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:51:46,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_50_mp_rank_00_optim_states.pt 6: [2022-11-29 02:51:46,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2022-11-29 02:51:46,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt. 
4: [2022-11-29 02:51:46,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_36_mp_rank_00_optim_states.pt 6: [2022-11-29 02:51:46,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:51:46,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2022-11-29 02:51:46,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_53_mp_rank_00_optim_states.pt 6: [2022-11-29 02:51:46,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2022-11-29 02:51:46,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:51:46,888] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:51:46,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_15_mp_rank_00_optim_states.pt 1: [2022-11-29 02:51:46,888] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_10_mp_rank_00_optim_states.pt 1: [2022-11-29 02:51:46,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2022-11-29 02:51:46,888] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2022-11-29 02:51:46,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:51:46,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_48_mp_rank_00_optim_states.pt 6: [2022-11-29 02:51:46,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2022-11-29 02:51:46,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:51:46,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:51:46,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt 0: [2022-11-29 02:51:46,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_6_mp_rank_00_optim_states.pt 0: [2022-11-29 02:51:46,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2022-11-29 02:51:46,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2022-11-29 02:51:46,889] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:51:46,889] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_54_mp_rank_00_optim_states.pt 6: [2022-11-29 02:51:46,889] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2022-11-29 02:51:46,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:51:46,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_51_mp_rank_00_optim_states.pt 6: [2022-11-29 02:51:46,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2022-11-29 02:51:46,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:51:46,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:51:46,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_18_mp_rank_00_optim_states.pt 2: [2022-11-29 02:51:46,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_17_mp_rank_00_optim_states.pt 2: [2022-11-29 02:51:46,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2022-11-29 02:51:46,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2022-11-29 02:51:46,890] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:51:46,890] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_21_mp_rank_00_optim_states.pt 2: [2022-11-29 02:51:46,890] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2022-11-29 02:51:46,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt. 
0: [2022-11-29 02:51:46,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_7_mp_rank_00_optim_states.pt 0: [2022-11-29 02:51:46,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2022-11-29 02:51:46,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:51:46,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:51:46,885] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:51:46,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_5_mp_rank_00_optim_states.pt 0: [2022-11-29 02:51:46,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_4_mp_rank_00_optim_states.pt 0: [2022-11-29 02:51:46,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2022-11-29 02:51:46,891] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2022-11-29 02:51:46,885] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_41_mp_rank_00_optim_states.pt 5: [2022-11-29 02:51:46,885] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2022-11-29 02:51:46,886] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:51:46,886] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_47_mp_rank_00_optim_states.pt 5: [2022-11-29 02:51:46,886] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 0: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:51:46,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:51:46,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:51:46,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:51:46,893] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt. 4: [2022-11-29 02:51:46,901] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:51:46,891] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:51:46,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:51:46,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt. 
6: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt. 0: [2022-11-29 02:51:46,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt 7: [2022-11-29 02:51:46,893] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_61_mp_rank_00_optim_states.pt 1: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_14_mp_rank_00_optim_states.pt 1: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_8_mp_rank_00_optim_states.pt 1: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_12_mp_rank_00_optim_states.pt 2: [2022-11-29 02:51:46,891] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_16_mp_rank_00_optim_states.pt 3: [2022-11-29 02:51:46,892] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:51:46,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_26_mp_rank_00_optim_states.pt 0: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2022-11-29 02:51:46,893] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
1: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2022-11-29 02:51:46,902] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_35_mp_rank_00_optim_states.pt 3: [2022-11-29 02:51:46,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_25_mp_rank_00_optim_states.pt 0: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt. 1: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 4: [2022-11-29 02:51:46,902] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2022-11-29 02:51:46,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_59_mp_rank_00_optim_states.pt 1: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:51:46,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2022-11-29 02:51:46,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
6: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_49_mp_rank_00_optim_states.pt 7: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 1: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_9_mp_rank_00_optim_states.pt 2: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:51:46,892] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_28_mp_rank_00_optim_states.pt 6: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt. 2: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_23_mp_rank_00_optim_states.pt 3: [2022-11-29 02:51:46,892] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2022-11-29 02:51:46,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:51:46,896] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt. 
7: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_63_mp_rank_00_optim_states.pt 2: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt. 6: [2022-11-29 02:51:46,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_55_mp_rank_00_optim_states.pt 6: [2022-11-29 02:51:46,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_52_mp_rank_00_optim_states.pt 1: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2022-11-29 02:51:46,895] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_30_mp_rank_00_optim_states.pt 6: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 6: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
2: [2022-11-29 02:51:46,896] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_19_mp_rank_00_optim_states.pt 3: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2022-11-29 02:51:46,895] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 2: [2022-11-29 02:51:46,896] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_31_mp_rank_00_optim_states.pt 7: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2022-11-29 02:51:46,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_58_mp_rank_00_optim_states.pt 3: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt. 7: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 
3: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_24_mp_rank_00_optim_states.pt 3: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_29_mp_rank_00_optim_states.pt 7: [2022-11-29 02:51:46,899] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt. 3: [2022-11-29 02:51:46,894] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_27_mp_rank_00_optim_states.pt 7: [2022-11-29 02:51:46,899] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_62_mp_rank_00_optim_states.pt 3: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 7: [2022-11-29 02:51:46,899] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 3: [2022-11-29 02:51:46,894] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:51:46,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_44_mp_rank_00_optim_states.pt 5: [2022-11-29 02:51:46,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_45_mp_rank_00_optim_states.pt 5: [2022-11-29 02:51:46,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_46_mp_rank_00_optim_states.pt 5: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2022-11-29 02:51:46,897] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_40_mp_rank_00_optim_states.pt 5: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2022-11-29 02:51:46,897] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now! 5: [2022-11-29 02:51:46,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt. 5: [2022-11-29 02:51:46,898] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt. 
5: [2022-11-29 02:51:46,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_42_mp_rank_00_optim_states.pt
5: [2022-11-29 02:51:46,898] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_43_mp_rank_00_optim_states.pt
5: [2022-11-29 02:51:46,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now!
5: [2022-11-29 02:51:46,898] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now!
4: [2022-11-29 02:51:46,904] [INFO] [torch_checkpoint_engine.py:17:save] [Torch] Saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt.
4: [2022-11-29 02:51:46,904] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_34_mp_rank_00_optim_states.pt
4: [2022-11-29 02:51:46,904] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now!
0: [2022-11-29 02:51:46,953] [INFO] [engine.py:3213:_save_zero_checkpoint] bf16_zero checkpoint saved checkpoints_221m/global_step115203/bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
0: [2022-11-29 02:51:46,953] [INFO] [torch_checkpoint_engine.py:27:commit] [Torch] Checkpoint global_step115203 is ready now!
0: successfully saved checkpoint at iteration 115203 to checkpoints_221m
7: ------------------------------------------------------------------------------------------------------------
7: test loss at the end of training for test data | lm loss value: 2.145278E+00 | lm loss PPL: 8.544420E+00 |
7: ------------------------------------------------------------------------------------------------------------
END 2077145: Tue Nov 29 02:52:06 EET 2022